From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ivan Boule <ivan.boule-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
Subject: Re: Making space in mbuf data-structure
Date: Tue, 08 Jul 2014 09:16:54 +0200
Message-ID: <53BB9AE6.4030503@6wind.com>
References: <59AF69C657FD0841A61C55336867B5B02CF13E32@IRSMSX103.ger.corp.intel.com>
 <53BA7422.9080706@6wind.com>
 <F35DEAC7BCE34641BA9FAC6BCA4A12E70A74B3F6@SHSMSX104.ccr.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "dev-VfR2kkLFssw@public.gmane.org" <dev-VfR2kkLFssw@public.gmane.org>
To: "Zhang, Helin" <helin.zhang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
 Olivier MATZ <olivier.matz-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>,
 "Richardson, Bruce" <bruce.richardson-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Return-path: <dev-bounces-VfR2kkLFssw@public.gmane.org>
In-Reply-To: <F35DEAC7BCE34641BA9FAC6BCA4A12E70A74B3F6-0J0gbvR4kTg/UvCtAeCM4rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request-VfR2kkLFssw@public.gmane.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev-VfR2kkLFssw@public.gmane.org>
List-Help: <mailto:dev-request-VfR2kkLFssw@public.gmane.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request-VfR2kkLFssw@public.gmane.org?subject=subscribe>
Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org
Sender: "dev" <dev-bounces-VfR2kkLFssw@public.gmane.org>

On 07/08/2014 09:04 AM, Zhang, Helin wrote:
>
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces-VfR2kkLFssw@public.gmane.org] On Behalf Of Olivier MATZ
>> Sent: Monday, July 7, 2014 6:19 PM
>> To: Richardson, Bruce; dev-VfR2kkLFssw@public.gmane.org
>> Subject: Re: [dpdk-dev] Making space in mbuf data-structure
>>
>> Hello Bruce,
>>
>> Thank you to revive this discussion now that the 1.7 is released.
>>
>> First, I would like to reference my previous patch series that first reworks the
>> mbuf to gain 9 bytes [1]. The v1 of the patch was discussed at [2].
>>
>> Now, let's list what I would find useful to have in this mbuf rework:
>>
>> - larger size for ol_flags: this is at least needed for TSO, but as it
>>     is completely full today, I expect to have this need for other
>>     features.
>> - add other offload fields: l4_len and mss, required for TSO
>> - remove ctrl_mbuf: they could be replaced by a packet mbuf. It will
>>     simplify the mbuf structure. Moreover, it would allow to save room
>>     in the mbuf.
>> - a new vlan tag, I suppose this could be useful in some use-cases
>>     where vlans are stacked.
>> - splitting out fields that are superimposed: if 2 features can be used
>>     at the same time
>>
>> On the other hand, I'm not convinced by this:
>>
>> - new filters in the i40e driver: I don't think the mbuf is the
>>     right place for driver-specific flags. If a feature is brought
>>     by a new driver requiring a flag in mbuf, we should take care that
>>     the flag is not bound to this particular driver and would match
>>     the same feature in another driver.
>> - sequence number: I'm not sure I understand the use-case, maybe this
>>     could stay in a mbuf meta data in the reordering module.
>>
>>> Firstly, we believe that there is no possible way that we can ever fit
>>> all the fields we need to fit into a 64-byte mbuf, and so we need to
>>> start looking at a 128-byte mbuf instead.
>>
>> The TSO patches show that it is possible to keep a 64 bytes mbuf (of course, it
>> depends on what we want to add in the mbuf). I'm not fundamentally against
>> having 128 bytes mbuf. But:
>>
>> - it should not be a reason for just adding things and not reworking
>>     things that could be enhanced
>> - it should not be a reason for not optimizing the current mbuf
>>     structure
>> - if we can do the same with a 64 bytes mbuf, we need to carefuly
>>     compare the solutions as fetching a second cache line is not
>>     costless in all situations. The 64 bytes solution I'm proposing
>>     in [1] may cost a bit more in CPU cycles but avoids an additional
>>     cache prefetch (or miss). In some situations (I'm thinking about
>>     use-cases where we are memory-bound, e.g. an application processing
>>     a lot of data), it is better to loose a few CPU cycles.
>>
>>> First off the blocks is to look at moving the mempool pointer into the
>>> second cache line [...] Beyond this change, I'm also investigating
>>> potentially moving the "next"
>>> pointer to the second cache line, but it's looking harder to move
>>> without serious impact
>>
>> I think we can easily find DPDK applications that would use the "next"
>> field of the mbuf on rx side, as it is the standard way of chaining packets. For
>> instance: IP reassembly, TCP/UDP socket queues, or any other protocol that
>> needs a reassembly queue. This is at least what we do in 6WINDGate fast path
>> stack, and I suppose other network stack implementations would do something
>> similar, so we should probably avoid moving this field to the 2nd cache line.
>>
>> One more issue I do foresee, with slower CPUs like Atom, having 2 cache lines
>> will add more cost than on Xeon. I'm wondering if it make sense to have a
>> compilation time option to select either limited features with one cache line or
>> full features 2 line caches. I don't know if it's a good idea because it would make
>> the code more complex, but we could consider it. I think we don't target binary
>> compatibility today?
>>
>>   From a functional point of view, we could check that my TSO patch can be
>> adapted to your proposal so we can challenge and merge both approaches.
>>
>> As this change would impact the core of DPDK, I think it would be interesting to
>> list some representative use-cases in order to evaluate the cost of each
>> solution. This will also help for future modifications, and could be included in a
>> sort of non-regression test?
>>
>> Regards,
>> Olivier
>>
>> [1] http://dpdk.org/ml/archives/dev/2014-May/002537.html
>> [2] http://dpdk.org/ml/archives/dev/2014-May/002322.html
>
> Hi Olivier
>
> I am trying to convince you on the new field of "filter status".
> It is for matched Flow Director Filter ID, and might be reused for HASH signature if it matches hash filter, or others.
> It is quite useful for Flow Director, and not a flag. I guess there should have the similar feature even in non-Intel NICs.
>
By construction, since a packet cannot match more than 1 filter with an 
associated identifier, this is typically the kind of field that should 
be put in an union with the standard 32-bit RSS id.

Regards,
Ivan

> Regards,
> Helin
>


-- 
Ivan Boule
6WIND Development Engineer