From: Catalin Marinas <catalin.marinas@arm.com>
To: Zhangshaokun <zhangshaokun@hisilicon.com>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>,
	John Garry <john.garry@huawei.com>,
	Will Deacon <will.deacon@arm.com>,
	Zhenfa Qiu <qiuzhenfa@hisilicon.com>,
	Hanjun Guo <guohanjun@huawei.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64: cache: Update cache_line_size for HiSilicon certain platform
Date: Wed, 3 Apr 2019 13:57:28 +0100
Message-ID: <20190403125727.GD34351@arrakis.emea.arm.com>
In-Reply-To: <c1cecba6-22e7-369a-dd9b-51018ac9cc08@hisilicon.com>

On Tue, Apr 02, 2019 at 03:51:33PM +0800, Zhangshaokun wrote:
> On 2019/3/30 2:52, Catalin Marinas wrote:
> > On Wed, Mar 27, 2019 at 03:16:34PM +0800, Zhangshaokun wrote:
> >> On 2019/3/26 22:55, Catalin Marinas wrote:
> >>> On Tue, Mar 26, 2019 at 02:28:10PM +0800, Shaokun Zhang wrote:
> >>>> When testing mlx5 on the Kunpeng920 SoC, ib_send_bw was run with 4-byte
> >>>> packets, a single queue, and a single CPU core:
> >>>> Without this patch: 1.67 Mpps
> >>>> with this patch   : 2.40 Mpps
> >>>
> >>> This needs a better explanation. How does cache_line_size() affect the
> >>> 4-byte packet? Does it send more packets at once?
> >>>
> >>> I've seen in the mlx5 code assumptions about cache_line_size() being
> >>> 128. It looks to me more like some driver hand-tuning for specific
> >>> system configuration. Can the driver be changed to be more generic
> >>
> >> I'm not sure, but mlx5 may take different actions for different cache line
> >> sizes on different architectures or platforms, so the driver needs to read
> >> the right cache_line_size.
> > 
> > We need to better understand the reason for the performance hit, but from a
> > quick grep for "128" in the mlx5 code, I can see different code paths
> > executed when cache_line_size() returns 128 (saved in cqe_size). IOW,
> > presuming you can somehow disable the L3C, do you still see the same
> > performance difference?
> 
> Unfortunately, we can't disable the L3C separately to check the performance.
> On this platform, a cacheable I/O operation can look up the L3C. If we
> return 128 bytes, the performance is better.

I presume the device affected (mlx5) is cache coherent (can look up the
L3C).

> One more question about CWG: if the cache line size of L1$ and L2$ on A76 is
> 64 bytes and it is used on our platform (where the L3C cache line is 128
> bytes), should CWG be 4'b0100 (64-byte) or 4'b0101 (128-byte)?

The ARM ARM description of CWG states:

  Log 2 of the number of words of the maximum size of memory that can be
  overwritten as a result of the eviction of a cache entry that has had
  a memory location in it modified.

If you have DMA-capable devices on your platform that do not snoop the
L3C, I would say CWG should report 128-byte. Otherwise it probably
doesn't matter from a correctness perspective.
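
To make the encoding concrete, here is a minimal sketch of how a CWG value
maps to bytes, assuming 4-byte words and the CTR_EL0.CWG field at bits
[27:24]. The helper names and the 128-byte fallback for CWG == 0 (meaning
"not provided") are assumptions for illustration, not the kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* CTR_EL0.CWG field position: bits [27:24]. */
#define CTR_CWG_SHIFT	24
#define CTR_CWG_MASK	0xfu

/* Assumed fallback when CWG reads as 0 (no information provided). */
#define FALLBACK_GRANULE_BYTES	128u

/* CWG = 0b0101 is log2(32 words), i.e. 32 * 4 = 128 bytes. */
static_assert((4u << 5) == 128u, "CWG 0b0101 must decode to 128 bytes");

/* Extract the CWG field from a raw CTR_EL0 value (hypothetical helper). */
static inline unsigned int ctr_to_cwg(uint64_t ctr)
{
	return (unsigned int)((ctr >> CTR_CWG_SHIFT) & CTR_CWG_MASK);
}

/* Convert CWG (log2 of the number of 4-byte words) to a size in bytes. */
static inline unsigned int cwg_to_bytes(unsigned int cwg)
{
	return cwg ? 4u << cwg : FALLBACK_GRANULE_BYTES;
}
```

With this, 4'b0100 decodes to 64 bytes and 4'b0101 to 128 bytes, matching
the two options in the question above.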

> >> Originally, I thought this interface was used mainly by I/O drivers and
> >> did no harm anywhere else.
> > 
> > Looking through the slab code, cache_line_size() is used when
> > SLAB_HWCACHE_ALIGN is passed and IIUC this is for performance reasons
> > rather than I/O (the DMA alignment is given by ARCH_DMA_MINALIGN which
> > is 128 on arm64).
> 
> Yeah, I missed it.
> ARCH_DMA_MINALIGN is only used for non-coherent DMA, right? It's only defined
> and not used in the drivers.

It's not used by drivers directly but this sets the minimum slab
allocation so that a device can do non-coherent DMA to a kmalloc'ed
buffer for example without the risk of corrupting adjacent objects.
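
As a rough sketch of why the padding matters: if every small allocation is
rounded up to the DMA alignment, no two objects can share a granule that a
non-coherent device (or cache maintenance on its behalf) might overwrite.
DMA_MINALIGN below stands in for ARCH_DMA_MINALIGN, and both helpers are
hypothetical, not the slab implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for ARCH_DMA_MINALIGN (128 on arm64, as noted earlier). */
#define DMA_MINALIGN	128u

static_assert((DMA_MINALIGN & (DMA_MINALIGN - 1)) == 0,
	      "alignment must be a power of two");

/* Round an allocation size up to a multiple of DMA_MINALIGN. */
static inline size_t dma_padded_size(size_t size)
{
	return (size + DMA_MINALIGN - 1) & ~((size_t)DMA_MINALIGN - 1);
}

/* Return non-zero if two byte offsets fall within the same granule. */
static inline int shares_granule(size_t off_a, size_t off_b)
{
	return off_a / DMA_MINALIGN == off_b / DMA_MINALIGN;
}
```

Two 8-byte objects packed back to back (offsets 0 and 8) share a granule,
whereas with padding the second object starts at dma_padded_size(8) and
does not.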

-- 
Catalin
