From: Sergey Ryazanov <ryazanov.s.a@gmail.com>
To: "Yanchao Yang (杨彦超)" <Yanchao.Yang@mediatek.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"loic.poulain@linaro.org" <loic.poulain@linaro.org>,
	"pabeni@redhat.com" <pabeni@redhat.com>,
	"edumazet@google.com" <edumazet@google.com>,
	"johannes@sipsolutions.net" <johannes@sipsolutions.net>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"Chris Feng (冯保林)" <Chris.Feng@mediatek.com>,
	"linux-mediatek@lists.infradead.org"
	<linux-mediatek@lists.infradead.org>,
	"Mingliang Xu (徐明亮)" <mingliang.xu@mediatek.com>,
	"linuxwwan@mediatek.com" <linuxwwan@mediatek.com>,
	"Min Dong (董敏)" <min.dong@mediatek.com>,
	"m.chetan.kumar@intel.com" <m.chetan.kumar@intel.com>,
	"linuxwwan@intel.com" <linuxwwan@intel.com>,
	"Liang Lu (吕亮)" <liang.lu@mediatek.com>,
	"Haijun Liu (刘海军)" <haijun.liu@mediatek.com>,
	"Haozhe Chang (常浩哲)" <Haozhe.Chang@mediatek.com>,
	"Hua Yang (杨华)" <Hua.Yang@mediatek.com>,
	"Xiayu Zhang (张夏宇)" <Xiayu.Zhang@mediatek.com>,
	"Aiden Wang (王咏麒)" <Aiden.Wang@mediatek.com>,
	"Felix Chen (陈非)" <Felix.Chen@mediatek.com>,
	"Ting Wang (王挺)" <ting.wang@mediatek.com>,
	"Guohao Zhang (张国豪)" <Guohao.Zhang@mediatek.com>,
	"Mingchuang Qiao (乔明闯)" <Mingchuang.Qiao@mediatek.com>,
	"Lambert Wang (王伟)" <Lambert.Wang@mediatek.com>
Subject: Re: [PATCH net-next v1 02/13] net: wwan: tmi: Add buffer management
Date: Sat, 17 Dec 2022 00:17:30 +0400
Message-ID: <f6b1a1d6-699b-5740-6aa1-6285d673286a@gmail.com>
In-Reply-To: <a9d4eebb99d53224366c387ae314173b870d134c.camel@mediatek.com>

Hello Yanchao,

Sorry for the late response; please find some thoughts below.

On 09.12.2022 14:26, Yanchao Yang (杨彦超) wrote:
> On Sun, 2022-12-04 at 22:58 +0400, Sergey Ryazanov wrote:
>> On 22.11.2022 15:11, Yanchao Yang wrote:
>>> From: MediaTek Corporation <linuxwwan@mediatek.com>
>>>
>>> To allocate I/O memory as early as possible, buffer management is
>>> introduced. It creates buffer pools that reserve some buffers
>>> through deferred work while the driver isn't busy.
>>>
>>> The buffer management provides unified memory allocation and
>>> de-allocation interfaces for other modules. It supports two buffer
>>> types: SKB and page. Two reload work queues with different priority
>>> values are provided to meet the various requirements of the control
>>> plane and the data plane.
>>>
>>> When the reserved buffer count of a pool drops below a threshold
>>> (by default 2/3 of the pool size), the reload work restarts and
>>> allocates buffers from the OS until the pool becomes full. Once the
>>> pool is full, buffers freed by the user are recycled by the OS.
>>>
>>> Signed-off-by: Mingliang Xu <mingliang.xu@mediatek.com>
>>> Signed-off-by: MediaTek Corporation <linuxwwan@mediatek.com>
>>> ---
>>>    drivers/net/wwan/mediatek/Makefile  |   3 +-
>>>    drivers/net/wwan/mediatek/mtk_bm.c  | 369 ++++++++++++++++++++++++++++
>>>    drivers/net/wwan/mediatek/mtk_bm.h  |  79 ++++++
>>>    drivers/net/wwan/mediatek/mtk_dev.c |  11 +-
>>>    drivers/net/wwan/mediatek/mtk_dev.h |   1 +
>>>    5 files changed, 461 insertions(+), 2 deletions(-)
>>>    create mode 100644 drivers/net/wwan/mediatek/mtk_bm.c
>>>    create mode 100644 drivers/net/wwan/mediatek/mtk_bm.h
>>
>> Yanchao, can you share some numbers on how this custom pool
>> outperforms the regular kernel allocator?
> We prepared two driver .ko builds for comparison.
> Driver A (hereafter A): the pre-allocated buffer pool is enabled.
> Driver B (hereafter B): the pre-allocated buffer pool is disabled; it
> uses the kernel API directly (__dev_alloc_skb and netdev_alloc_frag).
> 
> Test Instrument: Keysight UXM TA
> Iperf command:
> Server Command: iperf3 -s -p 5002 -i 1
> Client Command: iperf3 -c 192.168.2.1 -p 5002 -i 1 -w 8M -t 30 -R -P 5
> 
> Test results: Fig. 1 shows A's TCP DL throughput and Fig. 2 shows B's
> TCP DL throughput (see the attachment).
> 
> The results show that A's IP packet throughput reaches 7 Gbit/s,
> while B's is 4.7 Gbit/s, i.e. A's throughput is about 50% higher
> than B's.
> 
> In addition, ftrace shows the following results.
> A: allocating 33211099 buffers takes 14.241828 s, about 0.4 us per
> buffer on average.
> B: allocating 10890789 buffers takes 7.677069 s, about 0.7 us per
> buffer on average.

Thank you for this impressive comparison test. There is something to 
think about here.

In the common case, the kernel memory API is fast enough to sustain 
multi-gigabit throughput. So if some custom code outperforms it, then 
either (a) you have found a corner case where the kernel memory API is 
deadly slow and should be improved, or (b) there is something wrong with 
the driver code. My point is that a driver should not implement custom 
memory management, since that adds driver complexity without any real 
performance improvement.

The test shows a really significant difference between the custom memory 
pool and direct kernel API calls. So let's try to figure out what is 
going on.

I assume that the control path (CLDMA) could not cause that much 
performance degradation due to the low control message traffic. So most 
probably the root cause is somewhere in the data path (DPMAIF). Correct 
me if my assumption is wrong.

Digging deeper into the driver code, I noticed that there are actually 
two types of pools (buffers). One pool type contains ready-made skbs, 
and the other contains bare page fragments. Both types of pools are 
utilized in the data Rx path. Have you tried measuring which type of 
pool improves performance more significantly?
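
For reference, the direct-allocation path of driver B presumably boils 
down to something like the sketch below. Only __dev_alloc_skb() and 
netdev_alloc_frag() come from your description; the buffer sizes and 
the helper name are hypothetical:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

#define MTK_RX_SKB_LEN  1536	/* assumed skb buffer size */
#define MTK_RX_FRAG_LEN 2048	/* assumed page fragment size */

/* Allocate one buffer of each kind straight from the kernel,
 * with no intermediate pool, just to show both call sites.
 */
static int mtk_rx_alloc_one(struct sk_buff **skbp, void **fragp)
{
	struct sk_buff *skb;
	void *frag;

	skb = __dev_alloc_skb(MTK_RX_SKB_LEN, GFP_ATOMIC);
	if (!skb)
		return -ENOMEM;

	frag = netdev_alloc_frag(MTK_RX_FRAG_LEN);
	if (!frag) {
		dev_kfree_skb_any(skb);
		return -ENOMEM;
	}

	*skbp = skb;
	*fragp = frag;
	return 0;	/* caller maps both for DMA and posts to the ring */
}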

I also noticed that neither allocated skbs nor allocated page fragments 
are freed in the DPMAIF code. So the improvement is not connected to 
optimal caching (i.e. memory reuse). The memory allocation improvement 
is thus most likely caused by avoiding some contention.

The pool reload is performed in a work context. And if I am not 
mistaken, skbs and fragments are also taken from the preallocated pools 
in a work context to reinitialize the BAT (Rx) ring buffer. There is no 
difference in terms of priority: both the pool reload and the Rx ring 
buffer reload functions are called with the same priority on an 
arbitrary CPU in the absence of other high-priority tasks (e.g. 
tasklets, irqs). The only obvious difference is the invocation rate. The 
pool reload operation is triggered as soon as the pool level falls below 
a predefined threshold (currently 67%), while the Rx ring reload 
operation is called on each NAPI poll. Have you considered introducing a 
threshold similar to the pool reload threshold and calling the Rx ring 
reload less frequently?
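
For illustration, here is a minimal sketch of what I mean. All 
structure and function names below are hypothetical, and the 2/3 figure 
just mirrors the existing pool threshold:

#include <linux/netdevice.h>

/* Hypothetical ring state; only the fields needed for the check. */
struct mtk_rx_ring {
	struct napi_struct napi;
	unsigned int size;		/* total BAT entries */
	unsigned int free_count;	/* entries without a buffer */
};

int mtk_rx_process(struct mtk_rx_ring *ring, int budget);  /* hypothetical */
void mtk_rx_ring_reload(struct mtk_rx_ring *ring);	    /* hypothetical */

static int mtk_napi_poll(struct napi_struct *napi, int budget)
{
	struct mtk_rx_ring *ring = container_of(napi, struct mtk_rx_ring,
						napi);
	int done = mtk_rx_process(ring, budget);

	/* Reload only once the filled part of the ring drops below
	 * 2/3, i.e. more than 1/3 of the entries are empty, instead
	 * of reloading unconditionally on every poll.
	 */
	if (ring->free_count > ring->size / 3)
		mtk_rx_ring_reload(ring);

	if (done < budget)
		napi_complete_done(napi, done);
	return done;
}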

--
Sergey
