From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Gregory CLEMENT <gregory.clement@free-electrons.com>
Cc: brouer@redhat.com, "David S. Miller" <davem@davemloft.net>,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
Thomas Petazzoni <thomas.petazzoni@free-electrons.com>,
Florian Fainelli <f.fainelli@gmail.com>,
Jason Cooper <jason@lakedaemon.net>, Andrew Lunn <andrew@lunn.ch>,
Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>,
linux-arm-kernel@lists.infradead.org,
Lior Amsalem <alior@marvell.com>,
Nadav Haklai <nadavh@marvell.com>,
Marcin Wojtas <mw@semihalf.com>,
Simon Guinot <simon.guinot@sequanux.org>,
Russell King - ARM Linux <linux@arm.linux.org.uk>,
Willy Tarreau <w@1wt.eu>, Timor Kardashov <timork@marvell.com>,
Dmitri Epshtein <dima@marvell.com>,
Sebastian Careba <nitroshift@yahoo.com>
Subject: Re: [PATCH v6 net-next 09/10] net: add a hardware buffer management helper API
Date: Mon, 14 Mar 2016 11:33:18 +0100 [thread overview]
Message-ID: <20160314113318.1a10a6b5@redhat.com> (raw)
In-Reply-To: <1457944745-7634-10-git-send-email-gregory.clement@free-electrons.com>
I've not fully understood the hardware support part.
But I do think this generalization is very interesting work, and would
like to cooperate. If my use-case can fit into this, where my use-case
is in the extreme 100Gbit/s area.
There is some potential for performance improvements, if the API from
start is designed distinguish between being called from NAPI-context
(BH-disabled) and outside NAPI context.
See: netdev_alloc_frag() vs napi_alloc_frag().
Nitpicks inlined below...
On Mon, 14 Mar 2016 09:39:04 +0100
Gregory CLEMENT <gregory.clement@free-electrons.com> wrote:
> This basic implementation allows to share code between driver using
> hardware buffer management. As the code is hardware agnostic, there is
> few helpers, most of the optimization brought by the an HW BM has to be
> done at driver level.
>
> Tested-by: Sebastian Careba <nitroshift@yahoo.com>
> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
> ---
> include/net/hwbm.h | 28 ++++++++++++++++++
> net/Kconfig | 3 ++
> net/core/Makefile | 1 +
> net/core/hwbm.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 119 insertions(+)
> create mode 100644 include/net/hwbm.h
> create mode 100644 net/core/hwbm.c
>
> diff --git a/include/net/hwbm.h b/include/net/hwbm.h
> new file mode 100644
> index 000000000000..47d08662501b
> --- /dev/null
> +++ b/include/net/hwbm.h
> @@ -0,0 +1,28 @@
> +#ifndef _HWBM_H
> +#define _HWBM_H
> +
> +struct hwbm_pool {
> + /* Capacity of the pool */
> + int size;
> + /* Size of the buffers managed */
> + int frag_size;
> + /* Number of buffers currently used by this pool */
> + int buf_num;
> + /* constructor called during alocation */
alocation -> allocation
> + int (*construct)(struct hwbm_pool *bm_pool, void *buf);
> + /* protect acces to the buffer counter*/
acces -> access
Space after "counter"
> + spinlock_t lock;
> + /* private data */
> + void *priv;
> +};
> +#ifdef CONFIG_HWBM
> +void hwbm_buf_free(struct hwbm_pool *bm_pool, void *buf);
> +int hwbm_pool_refill(struct hwbm_pool *bm_pool, gfp_t gfp);
> +int hwbm_pool_add(struct hwbm_pool *bm_pool, unsigned int buf_num, gfp_t gfp);
> +#else
> +void hwbm_buf_free(struct hwbm_pool *bm_pool, void *buf) {}
> +int hwbm_pool_refill(struct hwbm_pool *bm_pool, gfp_t gfp) { return 0; }
> +int hwbm_pool_add(struct hwbm_pool *bm_pool, unsigned int buf_num, gfp_t gfp)
> +{ return 0; }
> +#endif /* CONFIG_HWBM */
> +#endif /* _HWBM_H */
> diff --git a/net/Kconfig b/net/Kconfig
> index 10640d5f8bee..e13449870d06 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -253,6 +253,9 @@ config XPS
> depends on SMP
> default y
>
> +config HWBM
> + bool
> +
> config SOCK_CGROUP_DATA
> bool
> default n
> diff --git a/net/core/Makefile b/net/core/Makefile
> index 014422e2561f..d6508c2ddca5 100644
> --- a/net/core/Makefile
> +++ b/net/core/Makefile
> @@ -25,4 +25,5 @@ obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
> obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
> obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
> obj-$(CONFIG_DST_CACHE) += dst_cache.o
> +obj-$(CONFIG_HWBM) += hwbm.o
> obj-$(CONFIG_NET_DEVLINK) += devlink.o
> diff --git a/net/core/hwbm.c b/net/core/hwbm.c
> new file mode 100644
> index 000000000000..941c28486896
> --- /dev/null
> +++ b/net/core/hwbm.c
> @@ -0,0 +1,87 @@
> +/* Support for hardware buffer manager.
> + *
> + * Copyright (C) 2016 Marvell
> + *
> + * Gregory CLEMENT <gregory.clement@free-electrons.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/skbuff.h>
> +#include <net/hwbm.h>
> +
> +void hwbm_buf_free(struct hwbm_pool *bm_pool, void *buf)
> +{
> + if (likely(bm_pool->frag_size <= PAGE_SIZE))
> + skb_free_frag(buf);
> + else
> + kfree(buf);
> +}
> +EXPORT_SYMBOL_GPL(hwbm_buf_free);
> +
> +/* Refill processing for HW buffer management */
> +int hwbm_pool_refill(struct hwbm_pool *bm_pool, gfp_t gfp)
> +{
> + int frag_size = bm_pool->frag_size;
> + void *buf;
> +
> + if (likely(frag_size <= PAGE_SIZE))
> + buf = netdev_alloc_frag(frag_size);
If we know the NAPI-context, there is a performance potential in
netdev_alloc_frag() vs napi_alloc_frag().
> + else
> + buf = kmalloc(frag_size, gfp);
> +
> + if (!buf)
> + return -ENOMEM;
> +
> + if (bm_pool->construct)
> + if (bm_pool->construct(bm_pool, buf)) {
> + hwbm_buf_free(bm_pool, buf);
> + return -ENOMEM;
> + }
Why don't we refill more objects? and do so with a bulk of memory
objects? The "refill" name just lead me to believe that the function
might refill several objects...
Maybe that is the role of hwbm_pool_add() ?
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(hwbm_pool_refill);
> +
> +int hwbm_pool_add(struct hwbm_pool *bm_pool, unsigned int buf_num, gfp_t gfp)
> +{
> + int err, i;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&bm_pool->lock, flags);
This might be needed, and you take the lock for several objects. But
the save/restore variant is the most expensive lock we have (at-least
based on my measurements[1] for Intel).
[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c
> + if (bm_pool->buf_num == bm_pool->size) {
> + pr_warn("pool already filled\n");
> + return bm_pool->buf_num;
> + }
> +
> + if (buf_num + bm_pool->buf_num > bm_pool->size) {
> + pr_warn("cannot allocate %d buffers for pool\n",
> + buf_num);
> + return 0;
> + }
> +
> + if ((buf_num + bm_pool->buf_num) < bm_pool->buf_num) {
> + pr_warn("Adding %d buffers to the %d current buffers will overflow\n",
> + buf_num, bm_pool->buf_num);
> + return 0;
> + }
> +
> + for (i = 0; i < buf_num; i++) {
> + err = hwbm_pool_refill(bm_pool, gfp);
I'm thinking why not use some bulk allocation API here...
> + if (err < 0)
> + break;
> + }
> +
> + /* Update BM driver with number of buffers added to pool */
> + bm_pool->buf_num += i;
> +
> + pr_debug("hwpm pool: %d of %d buffers added\n", i, buf_num);
> + spin_unlock_irqrestore(&bm_pool->lock, flags);
> +
> + return i;
> +}
> +EXPORT_SYMBOL_GPL(hwbm_pool_add);
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
next prev parent reply other threads:[~2016-03-14 10:33 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-14 8:38 [PATCH v6 net-next 00/10] API set for HW Buffer management Gregory CLEMENT
2016-03-14 8:38 ` [PATCH v6 net-next 01/10] misc: sram: add optional ioremap without write combining Gregory CLEMENT
2016-03-14 8:44 ` Gregory CLEMENT
2016-03-14 8:38 ` [PATCH v6 net-next 02/10] ARM: dts: armada-38x: add buffer manager nodes Gregory CLEMENT
2016-03-14 8:38 ` [PATCH v6 net-next 03/10] ARM: dts: armada-38x: enable buffer manager support on Armada 38x boards Gregory CLEMENT
2016-03-14 8:38 ` [PATCH v6 net-next 04/10] ARM: dts: armada-xp: add buffer manager nodes Gregory CLEMENT
2016-03-14 8:39 ` [PATCH v6 net-next 05/10] ARM: dts: armada-xp: enable buffer manager support on Armada XP boards Gregory CLEMENT
2016-03-14 8:39 ` [PATCH v6 net-next 06/10] ARM: dts: armada-xp-openblocks-ax3-4: Add BM support Gregory CLEMENT
2016-03-14 8:39 ` [PATCH v6 net-next 07/10] bus: mvebu-mbus: provide api for obtaining IO and DRAM window information Gregory CLEMENT
2016-03-14 8:39 ` [PATCH v6 net-next 08/10] net: mvneta: bm: add support for hardware buffer management Gregory CLEMENT
2016-03-14 9:57 ` Jesper Dangaard Brouer
2016-03-14 8:39 ` [PATCH v6 net-next 09/10] net: add a hardware buffer management helper API Gregory CLEMENT
2016-03-14 10:33 ` Jesper Dangaard Brouer [this message]
2016-03-14 8:39 ` [PATCH v6 net-next 10/10] net: mvneta: Use the new hwbm framework Gregory CLEMENT
2016-03-14 16:21 ` [PATCH v6 net-next 00/10] API set for HW Buffer management David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160314113318.1a10a6b5@redhat.com \
--to=brouer@redhat.com \
--cc=alior@marvell.com \
--cc=andrew@lunn.ch \
--cc=davem@davemloft.net \
--cc=dima@marvell.com \
--cc=f.fainelli@gmail.com \
--cc=gregory.clement@free-electrons.com \
--cc=jason@lakedaemon.net \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@arm.linux.org.uk \
--cc=mw@semihalf.com \
--cc=nadavh@marvell.com \
--cc=netdev@vger.kernel.org \
--cc=nitroshift@yahoo.com \
--cc=sebastian.hesselbarth@gmail.com \
--cc=simon.guinot@sequanux.org \
--cc=thomas.petazzoni@free-electrons.com \
--cc=timork@marvell.com \
--cc=w@1wt.eu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).