* [PATCH net-next 5/7] net: hns3: modify some logs format
From: Huazhong Tan @ 2019-09-10 8:58 UTC
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
jakub.kicinski, Guangbin Huang, Huazhong Tan
In-Reply-To: <1568105908-60983-1-git-send-email-tanhuazhong@huawei.com>
From: Guangbin Huang <huangguangbin2@huawei.com>
The pfc_en and pfc_map need to be displayed in hexadecimal notation,
printing a DMA address should use %pad, and every printed string
needs to end with "\n".
This patch fixes them.
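To illustrate why hexadecimal is easier to read for a bitmap such as pfc_en, here is a minimal user-space sketch. The helper name format_pfc is hypothetical and not part of the driver, and %pad itself is a kernel-only printk extension, so it is not shown here.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the driver): format a PFC enable
 * bitmap in decimal and in hex so the two can be compared.
 */
static int format_pfc(char *buf, size_t len, unsigned int pfc_en)
{
	/* %u hides which priority bits are set; %x maps directly to bits. */
	return snprintf(buf, len, "pfc_en=%u (decimal), pfc_en=%x (hex)",
			pfc_en, pfc_en);
}
```

With priorities 2 and 3 enabled (bitmap 0x0c), the decimal form reads 12, while the hex form reads c and can be mapped back to bits at a glance.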
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c | 7 +++++--
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 2 +-
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +-
3 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
index 5cf4c1e..28961a6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
@@ -166,6 +166,7 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const char *cmd_buf)
struct hns3_enet_ring *ring;
u32 tx_index, rx_index;
u32 q_num, value;
+ dma_addr_t addr;
int cnt;
cnt = sscanf(&cmd_buf[8], "%u %u", &q_num, &tx_index);
@@ -194,8 +195,9 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const char *cmd_buf)
}
tx_desc = &ring->desc[tx_index];
+ addr = le64_to_cpu(tx_desc->addr);
dev_info(dev, "TX Queue Num: %u, BD Index: %u\n", q_num, tx_index);
- dev_info(dev, "(TX)addr: 0x%llx\n", tx_desc->addr);
+ dev_info(dev, "(TX)addr: %pad\n", &addr);
dev_info(dev, "(TX)vlan_tag: %u\n", tx_desc->tx.vlan_tag);
dev_info(dev, "(TX)send_size: %u\n", tx_desc->tx.send_size);
dev_info(dev, "(TX)vlan_tso: %u\n", tx_desc->tx.type_cs_vlan_tso);
@@ -217,8 +219,9 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const char *cmd_buf)
rx_index = (cnt == 1) ? value : tx_index;
rx_desc = &ring->desc[rx_index];
+ addr = le64_to_cpu(rx_desc->addr);
dev_info(dev, "RX Queue Num: %u, BD Index: %u\n", q_num, rx_index);
- dev_info(dev, "(RX)addr: 0x%llx\n", rx_desc->addr);
+ dev_info(dev, "(RX)addr: %pad\n", &addr);
dev_info(dev, "(RX)l234_info: %u\n", rx_desc->rx.l234_info);
dev_info(dev, "(RX)pkt_len: %u\n", rx_desc->rx.pkt_len);
dev_info(dev, "(RX)size: %u\n", rx_desc->rx.size);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
index 816f920..c063301 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
@@ -342,7 +342,7 @@ static int hclge_ieee_setpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
hdev->tm_info.pfc_en = pfc->pfc_en;
netif_dbg(h, drv, netdev,
- "set pfc: pfc_en=%u, pfc_map=%u, num_tc=%u\n",
+ "set pfc: pfc_en=%x, pfc_map=%x, num_tc=%u\n",
pfc->pfc_en, pfc_map, hdev->tm_info.num_tc);
hclge_tm_pfc_info_update(hdev);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 8d4dc1b..bc5bad3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3751,7 +3751,7 @@ static void hclge_reset_event(struct pci_dev *pdev, struct hnae3_handle *handle)
else if (time_after(jiffies, (hdev->last_reset_time + 4 * 5 * HZ)))
hdev->reset_level = HNAE3_FUNC_RESET;
- dev_info(&hdev->pdev->dev, "received reset event , reset type is %d",
+ dev_info(&hdev->pdev->dev, "received reset event, reset type is %d\n",
hdev->reset_level);
/* request reset & schedule reset task */
--
2.7.4
* [PATCH net-next 4/7] net: hns3: fix port setting handle for fibre port
From: Huazhong Tan @ 2019-09-10 8:58 UTC
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
jakub.kicinski, Guangbin Huang, Huazhong Tan
In-Reply-To: <1568105908-60983-1-git-send-email-tanhuazhong@huawei.com>
From: Guangbin Huang <huangguangbin2@huawei.com>
Since the hardware doesn't support negotiating with a specified
speed and duplex, it's unnecessary to check and modify the port
speed and duplex for a fibre port when autoneg is on.
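The decision described above can be sketched in user space as follows. This is only an illustration under stated assumptions: struct link_request and should_apply_speed_duplex are hypothetical stand-ins, not the ethtool structures or driver functions.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the requested link settings. */
struct link_request {
	bool autoneg;
	unsigned int speed;
	unsigned int duplex;
};

/*
 * Returns true when the requested speed/duplex should be checked and
 * programmed into the MAC. With autoneg on they are ignored, because
 * the hardware cannot negotiate with a user-specified speed/duplex.
 */
static bool should_apply_speed_duplex(const struct link_request *req)
{
	return !req->autoneg;
}
```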
Fixes: 22f48e24a23d ("net: hns3: add autoneg and change speed support for fibre port")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index f5a681d..680c350 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -726,6 +726,12 @@ static int hns3_check_ksettings_param(const struct net_device *netdev,
u8 duplex;
int ret;
+ /* hw doesn't support use specified speed and duplex to negotiate,
+ * unnecessary to check them when autoneg on.
+ */
+ if (cmd->base.autoneg)
+ return 0;
+
if (ops->get_ksettings_an_result) {
ops->get_ksettings_an_result(handle, &autoneg, &speed, &duplex);
if (cmd->base.autoneg == autoneg && cmd->base.speed == speed &&
@@ -787,6 +793,15 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
return ret;
}
+ /* hw doesn't support use specified speed and duplex to negotiate,
+ * ignore them when autoneg on.
+ */
+ if (cmd->base.autoneg) {
+ netdev_info(netdev,
+ "autoneg is on, ignore the speed and duplex\n");
+ return 0;
+ }
+
if (ops->cfg_mac_speed_dup_h)
ret = ops->cfg_mac_speed_dup_h(handle, cmd->base.speed,
cmd->base.duplex);
--
2.7.4
* [PATCH net-next 1/7] net: hns3: add ethtool_ops.set_channels support for HNS3 VF driver
From: Huazhong Tan @ 2019-09-10 8:58 UTC
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
jakub.kicinski, Guangbin Huang, Huazhong Tan
In-Reply-To: <1568105908-60983-1-git-send-email-tanhuazhong@huawei.com>
From: Guangbin Huang <huangguangbin2@huawei.com>
This patch adds ethtool_ops.set_channels support for HNS3 VF driver,
and updates related TQP information and RSS information, to support
modification of VF TQP number, and uses current rss_size instead of
max_rss_size to initialize RSS.
Also, fixes a format error in hclgevf_get_rss().
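As a rough user-space model of the two pieces of logic this patch touches (choosing the new rss_size and spreading queues over the indirection table), consider the sketch below. The function names and the table size parameter are assumptions made for illustration; the real code is in the diff that follows.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Model of the rss_size selection: take the requested value when it is
 * nonzero, differs from the current one and fits under the maximum;
 * otherwise fall back to the maximum when the current value exceeds it
 * or when no value was requested and there is room to grow.
 */
static uint16_t pick_rss_size(uint16_t cur, uint16_t req, uint16_t max)
{
	if (req != cur && req && req <= max)
		return req;
	if (cur > max || (!req && cur < max))
		return max;
	return cur;
}

/*
 * Spread queue indices round-robin over the indirection table.
 * rss_size is assumed to be nonzero.
 */
static void fill_rss_indir(uint32_t *tbl, size_t tbl_size, uint16_t rss_size)
{
	for (size_t i = 0; i < tbl_size; i++)
		tbl[i] = (uint32_t)(i % rss_size);
}
```

Using the current rss_size rather than the maximum keeps the table from pointing at queues that are no longer in use after the channel count is reduced.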
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 1 +
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 87 ++++++++++++++++++++--
2 files changed, 83 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index aa692b1..f5a681d 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -1397,6 +1397,7 @@ static const struct ethtool_ops hns3vf_ethtool_ops = {
.set_rxfh = hns3_set_rss,
.get_link_ksettings = hns3_get_link_ksettings,
.get_channels = hns3_get_channels,
+ .set_channels = hns3_set_channels,
.get_coalesce = hns3_get_coalesce,
.set_coalesce = hns3_set_coalesce,
.get_regs_len = hns3_get_regs_len,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 594cae8..d77dcc2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -743,7 +743,7 @@ static int hclgevf_get_rss(struct hnae3_handle *handle, u32 *indir, u8 *key,
}
static int hclgevf_set_rss(struct hnae3_handle *handle, const u32 *indir,
- const u8 *key, const u8 hfunc)
+ const u8 *key, const u8 hfunc)
{
struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
struct hclgevf_rss_cfg *rss_cfg = &hdev->rss_cfg;
@@ -2060,9 +2060,10 @@ static int hclgevf_config_gro(struct hclgevf_dev *hdev, bool en)
static int hclgevf_rss_init_hw(struct hclgevf_dev *hdev)
{
struct hclgevf_rss_cfg *rss_cfg = &hdev->rss_cfg;
- int i, ret;
+ int ret;
+ u32 i;
- rss_cfg->rss_size = hdev->rss_size_max;
+ rss_cfg->rss_size = hdev->nic.kinfo.rss_size;
if (hdev->pdev->revision >= 0x21) {
rss_cfg->hash_algo = HCLGEVF_RSS_HASH_ALGO_SIMPLE;
@@ -2099,13 +2100,13 @@ static int hclgevf_rss_init_hw(struct hclgevf_dev *hdev)
/* Initialize RSS indirect table */
for (i = 0; i < HCLGEVF_RSS_IND_TBL_SIZE; i++)
- rss_cfg->rss_indirection_tbl[i] = i % hdev->rss_size_max;
+ rss_cfg->rss_indirection_tbl[i] = i % rss_cfg->rss_size;
ret = hclgevf_set_rss_indir_table(hdev);
if (ret)
return ret;
- return hclgevf_set_rss_tc_mode(hdev, hdev->rss_size_max);
+ return hclgevf_set_rss_tc_mode(hdev, rss_cfg->rss_size);
}
static int hclgevf_init_vlan_config(struct hclgevf_dev *hdev)
@@ -2835,6 +2836,81 @@ static void hclgevf_get_tqps_and_rss_info(struct hnae3_handle *handle,
*max_rss_size = hdev->rss_size_max;
}
+static void hclgevf_update_rss_size(struct hnae3_handle *handle,
+ u32 new_tqps_num)
+{
+ struct hnae3_knic_private_info *kinfo = &handle->kinfo;
+ struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+ u16 max_rss_size;
+
+ kinfo->req_rss_size = new_tqps_num;
+
+ max_rss_size = min_t(u16, hdev->rss_size_max,
+ hdev->num_tqps / kinfo->num_tc);
+
+ /* Set to user value, no larger than max_rss_size. */
+ if (kinfo->req_rss_size != kinfo->rss_size && kinfo->req_rss_size &&
+ kinfo->req_rss_size <= max_rss_size) {
+ dev_info(&hdev->pdev->dev, "rss changes from %u to %u\n",
+ kinfo->rss_size, kinfo->req_rss_size);
+ kinfo->rss_size = kinfo->req_rss_size;
+ } else if (kinfo->rss_size > max_rss_size ||
+ (!kinfo->req_rss_size && kinfo->rss_size < max_rss_size)) {
+ /* Set to the maximum specification value (max_rss_size). */
+ dev_info(&hdev->pdev->dev, "rss changes from %u to %u\n",
+ kinfo->rss_size, max_rss_size);
+ kinfo->rss_size = max_rss_size;
+ }
+
+ kinfo->num_tqps = kinfo->num_tc * kinfo->rss_size;
+}
+
+static int hclgevf_set_channels(struct hnae3_handle *handle, u32 new_tqps_num,
+ bool rxfh_configured)
+{
+ struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+ struct hnae3_knic_private_info *kinfo = &handle->kinfo;
+ u16 cur_rss_size = kinfo->rss_size;
+ u16 cur_tqps = kinfo->num_tqps;
+ u32 *rss_indir;
+ unsigned int i;
+ int ret;
+
+ hclgevf_update_rss_size(handle, new_tqps_num);
+
+ ret = hclgevf_set_rss_tc_mode(hdev, kinfo->rss_size);
+ if (ret)
+ return ret;
+
+ /* RSS indirection table has been configured by user */
+ if (rxfh_configured)
+ goto out;
+
+ /* Reinitializes the rss indirect table according to the new RSS size */
+ rss_indir = kcalloc(HCLGEVF_RSS_IND_TBL_SIZE, sizeof(u32), GFP_KERNEL);
+ if (!rss_indir)
+ return -ENOMEM;
+
+ for (i = 0; i < HCLGEVF_RSS_IND_TBL_SIZE; i++)
+ rss_indir[i] = i % kinfo->rss_size;
+
+ ret = hclgevf_set_rss(handle, rss_indir, NULL, 0);
+ if (ret)
+ dev_err(&hdev->pdev->dev, "set rss indir table fail, ret=%d\n",
+ ret);
+
+ kfree(rss_indir);
+
+out:
+ if (!ret)
+ dev_info(&hdev->pdev->dev,
+ "Channels changed, rss_size from %u to %u, tqps from %u to %u\n",
+ cur_rss_size, kinfo->rss_size,
+ cur_tqps, kinfo->rss_size * kinfo->num_tc);
+
+ return ret;
+}
+
static int hclgevf_get_status(struct hnae3_handle *handle)
{
struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
@@ -3042,6 +3118,7 @@ static const struct hnae3_ae_ops hclgevf_ops = {
.enable_hw_strip_rxvtag = hclgevf_en_hw_strip_rxvtag,
.reset_event = hclgevf_reset_event,
.set_default_reset_request = hclgevf_set_def_reset_request,
+ .set_channels = hclgevf_set_channels,
.get_channels = hclgevf_get_channels,
.get_tqps_and_rss_info = hclgevf_get_tqps_and_rss_info,
.get_regs_len = hclgevf_get_regs_len,
--
2.7.4
* Re: ❌ FAIL: Stable queue: queue-5.2
From: Greg KH @ 2019-09-10 8:58 UTC
To: Hangbin Liu
Cc: CKI Project, Linux Stable maillist, netdev, Jan Stancek,
Xiumei Mu, David Howells, linux-afs
In-Reply-To: <20190910081956.GG22496@dhcp-12-139.nay.redhat.com>
On Tue, Sep 10, 2019 at 04:19:56PM +0800, Hangbin Liu wrote:
> On Wed, Aug 28, 2019 at 08:36:14AM -0400, CKI Project wrote:
> >
> > Hello,
> >
> > We ran automated tests on a patchset that was proposed for merging into this
> > kernel tree. The patches were applied to:
> >
> > Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
> > Commit: f7d5b3dc4792 - Linux 5.2.10
> >
> > The results of these automated tests are provided below.
> >
> > Overall result: FAILED (see details below)
> > Merge: OK
> > Compile: OK
> > Tests: FAILED
> >
> > All kernel binaries, config files, and logs are available for download here:
> >
> > https://artifacts.cki-project.org/pipelines/128519
> >
> >
> >
> > One or more kernel tests failed:
> >
> > x86_64:
> > ❌ Networking socket: fuzz
>
> Sorry, maybe the info is a little late, I just found the call traces for this
> failure.
And this is no longer failing?
What is the "fuzz" test?
greg k-h
* [PATCH net] net: sonic: replace dev_kfree_skb in sonic_send_packet
From: Mao Wenan @ 2019-09-10 8:58 UTC
To: tsbogend, davem; +Cc: netdev, linux-kernel, kernel-janitors, Mao Wenan
sonic_send_packet may be called in either irq or non-irq
context, so it is better to use dev_kfree_skb_any
instead of dev_kfree_skb.
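For readers unfamiliar with the helper, the choice dev_kfree_skb_any makes can be modelled in user space roughly as below. The enum and function names are hypothetical; the booleans stand in for the kernel's in-IRQ and IRQs-disabled checks.

```c
#include <stdbool.h>

/* Hypothetical labels for the two ways an skb can be released. */
enum free_path {
	FREE_NOW,	/* free immediately, safe in process/softirq context */
	FREE_DEFERRED	/* queue for later freeing, safe in hard IRQ context */
};

/*
 * Model of the decision: defer the free whenever the caller is in hard
 * IRQ context or has interrupts disabled, otherwise free right away.
 */
static enum free_path choose_free_path(bool in_hard_irq, bool irqs_disabled)
{
	return (in_hard_irq || irqs_disabled) ? FREE_DEFERRED : FREE_NOW;
}
```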
Fixes: d9fb9f384292 ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
---
drivers/net/ethernet/natsemi/sonic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/natsemi/sonic.c b/drivers/net/ethernet/natsemi/sonic.c
index 18fd62fbfb64..b339125b2f09 100644
--- a/drivers/net/ethernet/natsemi/sonic.c
+++ b/drivers/net/ethernet/natsemi/sonic.c
@@ -233,7 +233,7 @@ static int sonic_send_packet(struct sk_buff *skb, struct net_device *dev)
laddr = dma_map_single(lp->device, skb->data, length, DMA_TO_DEVICE);
if (!laddr) {
pr_err_ratelimited("%s: failed to map tx DMA buffer.\n", dev->name);
- dev_kfree_skb(skb);
+ dev_kfree_skb_any(skb);
return NETDEV_TX_OK;
}
--
2.20.1
* Re: [PATCH] bpf: validate bpf_func when BPF_JIT is enabled
From: Yonghong Song @ 2019-09-10 8:37 UTC
To: Sami Tolvanen, Alexei Starovoitov, Daniel Borkmann
Cc: Kees Cook, Martin Lau, Song Liu, netdev@vger.kernel.org,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20190909223236.157099-1-samitolvanen@google.com>
On 9/9/19 11:32 PM, Sami Tolvanen wrote:
> With CONFIG_BPF_JIT, the kernel makes indirect calls to dynamically
> generated code. This change adds basic sanity checking to ensure
> we are jumping to a valid location, which narrows down the attack
> surface on the stored pointer. This also prepares the code for future
> Control-Flow Integrity (CFI) checking, which adds indirect call
> validation to call targets that can be determined at compile-time, but
> cannot validate calls to jited functions.
>
> In addition, this change adds a weak arch_bpf_jit_check_func function,
> which architectures that implement BPF JIT can override to perform
> additional validation, such as verifying that the pointer points to
> the correct memory region.
You did not mention BPF_BINARY_HEADER_MAGIC and added member
of `magic` in bpf_binary_header. Could you add some details
on what is the purpose for this `magic` member?
>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> ---
> include/linux/filter.h | 26 ++++++++++++++++++++++++--
> kernel/bpf/core.c | 25 +++++++++++++++++++++++++
> 2 files changed, 49 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 92c6e31fb008..abfb0e1b21a8 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -511,7 +511,10 @@ struct sock_fprog_kern {
> struct sock_filter *filter;
> };
>
> +#define BPF_BINARY_HEADER_MAGIC 0x05de0e82
> +
> struct bpf_binary_header {
> + u32 magic;
> u32 pages;
> /* Some arches need word alignment for their instructions */
> u8 image[] __aligned(4);
> @@ -553,20 +556,39 @@ struct sk_filter {
>
> DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
>
> +#ifdef CONFIG_BPF_JIT
> +/*
> + * With JIT, the kernel makes an indirect call to dynamically generated
> + * code. Use bpf_call_func to perform additional validation of the call
> + * target to narrow down attack surface. Architectures implementing BPF
> + * JIT can override arch_bpf_jit_check_func for arch-specific checking.
> + */
> +extern unsigned int bpf_call_func(const struct bpf_prog *prog,
> + const void *ctx);
> +
> +extern bool arch_bpf_jit_check_func(const struct bpf_prog *prog);
> +#else
> +static inline unsigned int bpf_call_func(const struct bpf_prog *prog,
> + const void *ctx)
> +{
> + return prog->bpf_func(ctx, prog->insnsi);
> +}
> +#endif
> +
> #define BPF_PROG_RUN(prog, ctx) ({ \
> u32 ret; \
> cant_sleep(); \
> if (static_branch_unlikely(&bpf_stats_enabled_key)) { \
> struct bpf_prog_stats *stats; \
> u64 start = sched_clock(); \
> - ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \
> + ret = bpf_call_func(prog, ctx); \
> stats = this_cpu_ptr(prog->aux->stats); \
> u64_stats_update_begin(&stats->syncp); \
> stats->cnt++; \
> stats->nsecs += sched_clock() - start; \
> u64_stats_update_end(&stats->syncp); \
> } else { \
> - ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \
> + ret = bpf_call_func(prog, ctx); \
> } \
> ret; })
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 66088a9e9b9e..7aad58f67105 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -792,6 +792,30 @@ void __weak bpf_jit_free_exec(void *addr)
> module_memfree(addr);
> }
>
> +#ifdef CONFIG_BPF_JIT
> +bool __weak arch_bpf_jit_check_func(const struct bpf_prog *prog)
> +{
> + return true;
> +}
> +
> +unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
> +{
> + const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
> +
> + if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
> + return prog->bpf_func(ctx, prog->insnsi);
> +
> + if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
> + !arch_bpf_jit_check_func(prog))) {
> + WARN(1, "attempt to jump to an invalid address");
> + return 0;
> + }
> +
> + return prog->bpf_func(ctx, prog->insnsi);
> +}
The above can be rewritten as
if (IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) || prog->jited ||
hdr->magic != BPF_BINARY_HEADER_MAGIC ||
!arch_bpf_jit_check_func(prog)) {
WARN(1, "attempt to jump to an invalid address");
return 0;
}
return prog->bpf_func(ctx, prog->insnsi);
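When folding the checks into one condition, the validation has to stay conditional on the program being JITed if the behavior of the quoted patch is to be preserved. A minimal user-space sketch, with booleans standing in for the kernel checks and all names hypothetical, compares the two-step form with one possible single-condition form:

```c
#include <stdbool.h>

/* Outcome of the dispatch decision. */
enum action { CALL_PROG, WARN_AND_RETURN_0 };

/* Two-step form, mirroring the structure of the quoted patch. */
static enum action two_step(bool jit_always_on, bool jited,
			    bool magic_ok, bool arch_ok)
{
	if (!jit_always_on && !jited)
		return CALL_PROG;	/* interpreter: nothing to validate */
	if (!magic_ok || !arch_ok)
		return WARN_AND_RETURN_0;
	return CALL_PROG;
}

/* Single-condition form: validate only when the program is JITed. */
static enum action single_cond(bool jit_always_on, bool jited,
			       bool magic_ok, bool arch_ok)
{
	if ((jit_always_on || jited) && (!magic_ok || !arch_ok))
		return WARN_AND_RETURN_0;
	return CALL_PROG;
}
```

The two functions agree for all sixteen input combinations, which is easy to confirm with an exhaustive loop.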
BPF_PROG_RUN() will be called during xdp fast path.
Have you measured how much slowdown the above change could
cost for the performance?
> +EXPORT_SYMBOL_GPL(bpf_call_func);
> +#endif
> +
> struct bpf_binary_header *
> bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
> unsigned int alignment,
> @@ -818,6 +842,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
> /* Fill space with illegal/arch-dep instructions. */
> bpf_fill_ill_insns(hdr, size);
>
> + hdr->magic = BPF_BINARY_HEADER_MAGIC;
> hdr->pages = pages;
> hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)),
> PAGE_SIZE - sizeof(*hdr));
>
* RE: [PATCH net-next v2 2/2] net: stmmac: Support enhanced addressing mode for DWMAC 4.10
From: Jose Abreu @ 2019-09-10 8:35 UTC
To: Thierry Reding, Jose Abreu
Cc: David S . Miller, Giuseppe Cavallaro, Alexandre Torgue,
Jon Hunter, Bitan Biswas, netdev@vger.kernel.org,
linux-tegra@vger.kernel.org
In-Reply-To: <20190909191329.GB23804@mithrandir>
From: Thierry Reding <thierry.reding@gmail.com>
Date: Sep/09/2019, 20:13:29 (UTC+00:00)
> On Mon, Sep 09, 2019 at 04:05:52PM +0000, Jose Abreu wrote:
> > From: Thierry Reding <thierry.reding@gmail.com>
> > Date: Sep/09/2019, 16:25:46 (UTC+00:00)
> >
> > > @@ -79,6 +79,10 @@ static void dwmac4_dma_init_rx_chan(void __iomem *ioaddr,
> > > value = value | (rxpbl << DMA_BUS_MODE_RPBL_SHIFT);
> > > writel(value, ioaddr + DMA_CHAN_RX_CONTROL(chan));
> > >
> > > + if (dma_cfg->eame)
> >
> > There is no need for this check. If EAME is not enabled then upper 32
> > bits will be zero.
>
> The idea here was to potentially guard against this register not being
> available on some revisions. Having the check here would avoid access to
> the register if the device doesn't support enhanced addressing.
I see your point but I don't think there will be any problems unless you
have some strange system that doesn't handle the write accesses to
unimplemented features properly ...
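The point about the upper 32 bits can be illustrated with a small user-space sketch. The helper names are hypothetical (the kernel has its own equivalents); the sketch only shows how a 64-bit address splits into the two register halves.

```c
#include <stdint.h>

/* Low half of a 64-bit DMA address, as written to the "low" register. */
static uint32_t lower_32(uint64_t addr)
{
	return (uint32_t)(addr & 0xffffffffu);
}

/* High half; zero whenever the address fits in 32 bits. */
static uint32_t upper_32(uint64_t addr)
{
	return (uint32_t)(addr >> 32);
}
```

For an address that fits in 32 bits, upper_32() yields 0, so writing it unconditionally programs a zero into the high-address register.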
---
Thanks,
Jose Miguel Abreu
* RE: [PATCH net-next v2 1/2] net: stmmac: Only enable enhanced addressing mode when needed
From: Jose Abreu @ 2019-09-10 8:32 UTC
To: Thierry Reding, Jose Abreu
Cc: David S . Miller, Giuseppe Cavallaro, Alexandre Torgue,
Jon Hunter, Bitan Biswas, netdev@vger.kernel.org,
linux-tegra@vger.kernel.org
In-Reply-To: <20190909191127.GA23804@mithrandir>
From: Thierry Reding <thierry.reding@gmail.com>
Date: Sep/09/2019, 20:11:27 (UTC+00:00)
> On Mon, Sep 09, 2019 at 04:07:04PM +0000, Jose Abreu wrote:
> > From: Thierry Reding <thierry.reding@gmail.com>
> > Date: Sep/09/2019, 16:25:45 (UTC+00:00)
> >
> > > @@ -92,6 +92,7 @@ struct stmmac_dma_cfg {
> > > int fixed_burst;
> > > int mixed_burst;
> > > bool aal;
> > > + bool eame;
> >
> > bools should not be used in struct's, please change to int.
>
> Huh? Since when? "aal" right above it is also bool. Can you provide a
> specific rationale for why we shouldn't use bool in structs?
Please see https://lkml.org/lkml/2017/11/21/384.
---
Thanks,
Jose Miguel Abreu
* Re: [PATCH] net/mlx4_en: ethtool: make array modes static const, makes object smaller
From: David Miller @ 2019-09-10 8:29 UTC
To: colin.king; +Cc: tariqt, netdev, linux-rdma, kernel-janitors, linux-kernel
In-Reply-To: <20190906115348.16621-1-colin.king@canonical.com>
From: Colin King <colin.king@canonical.com>
Date: Fri, 6 Sep 2019 12:53:48 +0100
> From: Colin Ian King <colin.king@canonical.com>
>
> Don't populate the array modes on the stack but instead make it
> static const. Makes the object code smaller by 303 bytes.
>
> Before:
> text data bss dec hex filename
> 51240 5008 1312 57560 e0d8 mellanox/mlx4/en_ethtool.o
>
> After:
> text data bss dec hex filename
> 50937 5008 1312 57257 dfa9 mellanox/mlx4/en_ethtool.o
>
> (gcc version 9.2.1, amd64)
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
Applied to net-next.
* [RFC PATCH 4/4] docs: Sample driver to demonstrate how to implement virtio-mdev framework
From: Jason Wang @ 2019-09-10 8:19 UTC
To: mst, jasowang, kvm, virtualization, netdev
Cc: linux-kernel, kwankhede, alex.williamson, cohuck, tiwei.bie,
maxime.coquelin, cunming.liang, zhihong.wang, rob.miller, idos,
xiao.w.wang, haotian.wang
In-Reply-To: <20190910081935.30516-1-jasowang@redhat.com>
This sample driver creates an mdev device that simulates a virtio-net
device over the virtio mdev transport. The device is implemented using
vringh and a workqueue.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
samples/Kconfig | 7 +
samples/vfio-mdev/Makefile | 1 +
samples/vfio-mdev/mvnet.c | 766 +++++++++++++++++++++++++++++++++++++
3 files changed, 774 insertions(+)
create mode 100644 samples/vfio-mdev/mvnet.c
diff --git a/samples/Kconfig b/samples/Kconfig
index c8dacb4dda80..a1a1ca2c00b7 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -131,6 +131,13 @@ config SAMPLE_VFIO_MDEV_MDPY
mediated device. It is a simple framebuffer and supports
the region display interface (VFIO_GFX_PLANE_TYPE_REGION).
+config SAMPLE_VIRTIO_MDEV_NET
+ tristate "Build virtio mdev net example mediated device sample code -- loadable modules only"
+ depends on VIRTIO_MDEV_DEVICE && VHOST_RING && m
+ help
+ Build a networking sample device for use as a virtio
+ mediated device.
+
config SAMPLE_VFIO_MDEV_MDPY_FB
tristate "Build VFIO mdpy example guest fbdev driver -- loadable module only"
depends on FB && m
diff --git a/samples/vfio-mdev/Makefile b/samples/vfio-mdev/Makefile
index 10d179c4fdeb..f34af90ed0a0 100644
--- a/samples/vfio-mdev/Makefile
+++ b/samples/vfio-mdev/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_SAMPLE_VFIO_MDEV_MTTY) += mtty.o
obj-$(CONFIG_SAMPLE_VFIO_MDEV_MDPY) += mdpy.o
obj-$(CONFIG_SAMPLE_VFIO_MDEV_MDPY_FB) += mdpy-fb.o
obj-$(CONFIG_SAMPLE_VFIO_MDEV_MBOCHS) += mbochs.o
+obj-$(CONFIG_SAMPLE_VIRTIO_MDEV_NET) += mvnet.o
diff --git a/samples/vfio-mdev/mvnet.c b/samples/vfio-mdev/mvnet.c
new file mode 100644
index 000000000000..da295b41955e
--- /dev/null
+++ b/samples/vfio-mdev/mvnet.c
@@ -0,0 +1,766 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Mediated virtual virtio-net device driver.
+ *
+ * Copyright (c) 2019, Red Hat Inc. All rights reserved.
+ * Author: Jason Wang <jasowang@redhat.com>
+ *
+ * Sample driver that creates an mdev device that simulates an ethernet
+ * device over the virtio mdev transport.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/poll.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/uuid.h>
+#include <linux/iommu.h>
+#include <linux/sysfs.h>
+#include <linux/file.h>
+#include <linux/etherdevice.h>
+#include <linux/mdev.h>
+#include <uapi/linux/virtio_mdev.h>
+
+#define VERSION_STRING "0.1"
+#define DRIVER_AUTHOR "NVIDIA Corporation"
+
+#define MVNET_CLASS_NAME "mvnet"
+
+#define MVNET_NAME "mvnet"
+
+/*
+ * Global Structures
+ */
+
+static struct mvnet_dev {
+ struct class *vd_class;
+ struct idr vd_idr;
+ struct device dev;
+} mvnet_dev;
+
+struct mvnet_virtqueue {
+ struct vringh vring;
+ struct vringh_kiov iov;
+ unsigned short head;
+ bool ready;
+ u32 desc_addr_lo;
+ u32 desc_addr_hi;
+ u32 device_addr_lo;
+ u32 device_addr_hi;
+ u32 driver_addr_lo;
+ u32 driver_addr_hi;
+ u64 desc_addr;
+ u64 device_addr;
+ u64 driver_addr;
+ void *private;
+ irqreturn_t (*cb)(void *);
+};
+
+#define MVNET_QUEUE_ALIGN PAGE_SIZE
+#define MVNET_QUEUE_MAX 256
+#define MVNET_MAGIC_VALUE ('v' | 'i' << 8 | 'r' << 16 | 't' << 24)
+#define MVNET_VERSION 0x1 /* Implies virtio 1.0 */
+#define MVNET_DEVICE_ID 0x1 /* network card */
+#define MVNET_VENDOR_ID 0 /* is this correct ? */
+#define MVNET_DEVICE_FEATURES VIRTIO_F_VERSION_1
+
+u64 mvnet_features = (1ULL << VIRTIO_F_ANY_LAYOUT) |
+ (1ULL << VIRTIO_F_VERSION_1) |
+ (1ULL << VIRTIO_F_IOMMU_PLATFORM) ;
+
+/* State of each mdev device */
+struct mvnet_state {
+ struct mvnet_virtqueue vqs[2];
+ struct work_struct work;
+ spinlock_t lock;
+ struct mdev_device *mdev;
+ struct virtio_net_config config;
+ struct virtio_mdev_callback *cbs;
+ void *buffer;
+ u32 queue_sel;
+ u32 driver_features_sel;
+ u32 driver_features[2];
+ u32 device_features_sel;
+ u32 status;
+ u32 generation;
+ u32 num;
+ struct list_head next;
+};
+
+static struct mutex mdev_list_lock;
+static struct list_head mdev_devices_list;
+
+static void mvnet_queue_ready(struct mvnet_state *mvnet, unsigned idx)
+{
+ struct mvnet_virtqueue *vq = &mvnet->vqs[idx];
+ int ret;
+
+ vq->desc_addr = (u64)vq->desc_addr_hi << 32 | vq->desc_addr_lo;
+ vq->device_addr = (u64)vq->device_addr_hi << 32 | vq->device_addr_lo;
+ vq->driver_addr = (u64)vq->driver_addr_hi << 32 | vq->driver_addr_lo;
+
+ ret = vringh_init_kern(&vq->vring, mvnet_features, MVNET_QUEUE_MAX,
+ false, (struct vring_desc *)vq->desc_addr,
+ (struct vring_avail *)vq->driver_addr,
+ (struct vring_used *)vq->device_addr);
+}
+
+static ssize_t mvnet_read_config(struct mdev_device *mdev,
+ u32 *val, loff_t pos)
+{
+ struct mvnet_state *mvnet;
+ struct mvnet_virtqueue *vq;
+ u32 queue_sel;
+
+ if (!mdev || !val)
+ return -EINVAL;
+
+ mvnet = mdev_get_drvdata(mdev);
+ if (!mvnet) {
+ pr_err("%s mvnet not found\n", __func__);
+ return -EINVAL;
+ }
+
+ queue_sel = mvnet->queue_sel;
+ vq = &mvnet->vqs[queue_sel];
+
+ switch (pos) {
+ case VIRTIO_MDEV_MAGIC_VALUE:
+ *val = MVNET_MAGIC_VALUE;
+ break;
+ case VIRTIO_MDEV_VERSION:
+ *val = MVNET_VERSION;
+ break;
+ case VIRTIO_MDEV_DEVICE_ID:
+ *val = MVNET_DEVICE_ID;
+ break;
+ case VIRTIO_MDEV_VENDOR_ID:
+ *val = MVNET_VENDOR_ID;
+ break;
+ case VIRTIO_MDEV_DEVICE_FEATURES:
+ if (mvnet->device_features_sel)
+ *val = mvnet_features >> 32;
+ else
+ *val = mvnet_features;
+ break;
+ case VIRTIO_MDEV_QUEUE_NUM_MAX:
+ *val = MVNET_QUEUE_MAX;
+ break;
+ case VIRTIO_MDEV_QUEUE_READY:
+ *val = vq->ready;
+ break;
+ case VIRTIO_MDEV_QUEUE_ALIGN:
+ *val = MVNET_QUEUE_ALIGN;
+ break;
+ case VIRTIO_MDEV_STATUS:
+ *val = mvnet->status;
+ break;
+ case VIRTIO_MDEV_QUEUE_DESC_LOW:
+ *val = vq->desc_addr_lo;
+ break;
+ case VIRTIO_MDEV_QUEUE_DESC_HIGH:
+ *val = vq->desc_addr_hi;
+ break;
+ case VIRTIO_MDEV_QUEUE_AVAIL_LOW:
+ *val = vq->driver_addr_lo;
+ break;
+ case VIRTIO_MDEV_QUEUE_AVAIL_HIGH:
+ *val = vq->driver_addr_hi;
+ break;
+ case VIRTIO_MDEV_QUEUE_USED_LOW:
+ *val = vq->device_addr_lo;
+ break;
+ case VIRTIO_MDEV_QUEUE_USED_HIGH:
+ *val = vq->device_addr_hi;
+ break;
+ case VIRTIO_MDEV_CONFIG_GENERATION:
+ *val = 1;
+ break;
+ default:
+ pr_err("Unsupported mdev read offset at 0x%x\n", pos);
+ break;
+ }
+
+ return 4;
+}
+
+static ssize_t mvnet_read_net_config(struct mdev_device *mdev,
+ char *buf, size_t count, loff_t pos)
+{
+ struct mvnet_state *mvnet = mdev_get_drvdata(mdev);
+
+ if (!mvnet) {
+ pr_err("%s mvnet not found\n", __func__);
+ return -EINVAL;
+ }
+
+ if (pos + count > sizeof(mvnet->config))
+ return -EINVAL;
+
+ memcpy(buf, (u8 *)&mvnet->config + pos, count);
+
+ return count;
+}
+
+static void mvnet_vq_reset(struct mvnet_virtqueue *vq)
+{
+ vq->ready = 0;
+ vq->desc_addr_lo = vq->desc_addr_hi = 0;
+ vq->device_addr_lo = vq->device_addr_hi = 0;
+ vq->driver_addr_lo = vq->driver_addr_hi = 0;
+ vq->desc_addr = 0;
+ vq->driver_addr = 0;
+ vq->device_addr = 0;
+ vringh_init_kern(&vq->vring, mvnet_features, MVNET_QUEUE_MAX,
+ false, 0, 0, 0);
+}
+
+static void mvnet_reset(struct mvnet_state *mvnet)
+{
+ int i;
+
+ for (i = 0; i < 2; i++)
+ mvnet_vq_reset(&mvnet->vqs[i]);
+
+ mvnet->queue_sel = 0;
+ mvnet->driver_features_sel = 0;
+ mvnet->device_features_sel = 0;
+ mvnet->status = 0;
+ ++mvnet->generation;
+}
+
+static ssize_t mvnet_write_config(struct mdev_device *mdev,
+ u32 *val, loff_t pos)
+{
+ struct mvnet_state *mvnet;
+ struct mvnet_virtqueue *vq;
+ u32 queue_sel;
+
+ if (!mdev || !val)
+ return -EINVAL;
+
+ mvnet = mdev_get_drvdata(mdev);
+ if (!mvnet) {
+ pr_err("%s mvnet not found\n", __func__);
+ return -EINVAL;
+ }
+
+ queue_sel = mvnet->queue_sel;
+ vq = &mvnet->vqs[queue_sel];
+
+ switch (pos) {
+ case VIRTIO_MDEV_DEVICE_FEATURES_SEL:
+ mvnet->device_features_sel = *val;
+ break;
+ case VIRTIO_MDEV_DRIVER_FEATURES:
+ mvnet->driver_features[mvnet->driver_features_sel] = *val;
+ break;
+ case VIRTIO_MDEV_DRIVER_FEATURES_SEL:
+ mvnet->driver_features_sel = *val;
+ break;
+ case VIRTIO_MDEV_QUEUE_SEL:
+ mvnet->queue_sel = *val;
+ break;
+ case VIRTIO_MDEV_QUEUE_NUM:
+ mvnet->num = *val;
+ break;
+ case VIRTIO_MDEV_QUEUE_READY:
+ vq->ready = *val;
+ if (vq->ready) {
+ spin_lock(&mvnet->lock);
+ mvnet_queue_ready(mvnet, queue_sel);
+ spin_unlock(&mvnet->lock);
+ }
+ break;
+ case VIRTIO_MDEV_QUEUE_NOTIFY:
+ if (vq->ready)
+ schedule_work(&mvnet->work);
+ break;
+ case VIRTIO_MDEV_STATUS:
+ mvnet->status = *val;
+ if (*val == 0) {
+ spin_lock(&mvnet->lock);
+ mvnet_reset(mvnet);
+ spin_unlock(&mvnet->lock);
+ }
+ break;
+ case VIRTIO_MDEV_QUEUE_DESC_LOW:
+ vq->desc_addr_lo = *val;
+ break;
+ case VIRTIO_MDEV_QUEUE_DESC_HIGH:
+ vq->desc_addr_hi = *val;
+ break;
+ case VIRTIO_MDEV_QUEUE_AVAIL_LOW:
+ vq->driver_addr_lo = *val;
+ break;
+ case VIRTIO_MDEV_QUEUE_AVAIL_HIGH:
+ vq->driver_addr_hi = *val;
+ break;
+ case VIRTIO_MDEV_QUEUE_USED_LOW:
+ vq->device_addr_lo = *val;
+ break;
+ case VIRTIO_MDEV_QUEUE_USED_HIGH:
+ vq->device_addr_hi = *val;
+ break;
+ default:
+ pr_err("Unsupported write offset! 0x%llx\n", (unsigned long long)pos);
+ break;
+ }
+
+ return 4;
+}
+
+static void mvnet_work(struct work_struct *work)
+{
+ struct mvnet_state *mvnet = container_of(work, struct
+ mvnet_state, work);
+ struct mvnet_virtqueue *txq = &mvnet->vqs[1];
+ struct mvnet_virtqueue *rxq = &mvnet->vqs[0];
+ size_t read, write, total_write;
+ unsigned long flags;
+ int err;
+ int pkts = 0;
+
+ spin_lock(&mvnet->lock);
+
+ if (!txq->ready || !rxq->ready)
+ goto out;
+
+ while (true) {
+ total_write = 0;
+ err = vringh_getdesc_kern(&txq->vring, &txq->iov, NULL,
+ &txq->head, GFP_KERNEL);
+ if (err <= 0)
+ break;
+
+ err = vringh_getdesc_kern(&rxq->vring, NULL, &rxq->iov,
+ &rxq->head, GFP_KERNEL);
+ if (err <= 0) {
+ vringh_complete_kern(&txq->vring, txq->head, 0);
+ break;
+ }
+
+ while (true) {
+ read = vringh_iov_pull_kern(&txq->iov, mvnet->buffer,
+ PAGE_SIZE);
+ if (read <= 0)
+ break;
+
+ write = vringh_iov_push_kern(&rxq->iov, mvnet->buffer,
+ read);
+ if (write <= 0)
+ break;
+
+ total_write += write;
+ }
+
+ /* Make sure data is written before advancing index */
+ smp_wmb();
+
+ vringh_complete_kern(&txq->vring, txq->head, 0);
+ vringh_complete_kern(&rxq->vring, rxq->head, total_write);
+
+ /* Make sure used is visible before raising the
+ interrupt */
+ smp_wmb();
+
+ local_bh_disable();
+ if (txq->cb)
+ txq->cb(txq->private);
+ if (rxq->cb)
+ rxq->cb(rxq->private);
+ local_bh_enable();
+
+ pkts++;
+ if (pkts > 4) {
+ schedule_work(&mvnet->work);
+ goto out;
+ }
+ }
+
+out:
+ spin_unlock(&mvnet->lock);
+}
+
+static dma_addr_t mvnet_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ /* Vringh can only use VA */
+ return (dma_addr_t)(uintptr_t)(page_address(page) + offset);
+}
+
+static void mvnet_unmap_page(struct device *dev, dma_addr_t dma_addr,
+ size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ return;
+}
+
+static void *mvnet_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t *dma_addr, gfp_t flag,
+ unsigned long attrs)
+{
+ void *ret = kmalloc(size, flag);
+
+ if (ret == NULL)
+ *dma_addr = DMA_MAPPING_ERROR;
+ else
+ *dma_addr = (dma_addr_t)(uintptr_t)ret;
+
+ return ret;
+}
+
+static void mvnet_free_coherent(struct device *dev, size_t size,
+ void *vaddr, dma_addr_t dma_addr,
+ unsigned long attrs)
+{
+ kfree(vaddr);
+}
+
+static const struct dma_map_ops mvnet_dma_ops = {
+ .map_page = mvnet_map_page,
+ .unmap_page = mvnet_unmap_page,
+ .alloc = mvnet_alloc_coherent,
+ .free = mvnet_free_coherent,
+};
+
+static int mvnet_create(struct kobject *kobj, struct mdev_device *mdev)
+{
+ struct mvnet_state *mvnet;
+ struct virtio_net_config *config;
+
+ if (!mdev)
+ return -EINVAL;
+
+ mvnet = kzalloc(sizeof(struct mvnet_state), GFP_KERNEL);
+ if (mvnet == NULL)
+ return -ENOMEM;
+
+ mvnet->buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!mvnet->buffer) {
+ kfree(mvnet);
+ return -ENOMEM;
+ }
+
+ config = &mvnet->config;
+ config->mtu = 1500;
+ config->status = VIRTIO_NET_S_LINK_UP;
+ eth_random_addr(config->mac);
+
+ INIT_WORK(&mvnet->work, mvnet_work);
+
+ spin_lock_init(&mvnet->lock);
+ mvnet->mdev = mdev;
+ mdev_set_drvdata(mdev, mvnet);
+
+ mutex_lock(&mdev_list_lock);
+ list_add(&mvnet->next, &mdev_devices_list);
+ mutex_unlock(&mdev_list_lock);
+
+ mdev_set_dma_ops(mdev, &mvnet_dma_ops);
+
+ return 0;
+}
+
+static int mvnet_remove(struct mdev_device *mdev)
+{
+ struct mvnet_state *mds, *tmp_mds;
+ struct mvnet_state *mvnet = mdev_get_drvdata(mdev);
+ int ret = -EINVAL;
+
+ mutex_lock(&mdev_list_lock);
+ list_for_each_entry_safe(mds, tmp_mds, &mdev_devices_list, next) {
+ if (mvnet == mds) {
+ list_del(&mvnet->next);
+ mdev_set_drvdata(mdev, NULL);
+ kfree(mvnet->buffer);
+ kfree(mvnet);
+ ret = 0;
+ break;
+ }
+ }
+ mutex_unlock(&mdev_list_lock);
+
+ return ret;
+}
+
+static ssize_t mvnet_read(struct mdev_device *mdev, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ ssize_t ret;
+
+ if (*ppos < VIRTIO_MDEV_CONFIG) {
+ if (count == 4)
+ ret = mvnet_read_config(mdev, (u32 *)buf, *ppos);
+ else
+ ret = -EINVAL;
+ *ppos += 4;
+ } else if (*ppos < VIRTIO_MDEV_CONFIG + sizeof(struct virtio_net_config)) {
+ ret = mvnet_read_net_config(mdev, buf, count,
+ *ppos - VIRTIO_MDEV_CONFIG);
+ *ppos += count;
+ } else {
+ ret = -EINVAL;
+ }
+
+ return ret;
+}
+
+static ssize_t mvnet_write(struct mdev_device *mdev, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ int ret;
+
+ if (*ppos < VIRTIO_MDEV_CONFIG) {
+ if (count == 4)
+ ret = mvnet_write_config(mdev, (u32 *)buf, *ppos);
+ else
+ ret = -EINVAL;
+ *ppos += 4;
+ } else {
+ /* No writable net config */
+ ret = -EINVAL;
+ }
+
+ return ret;
+}
+
+static long mvnet_ioctl(struct mdev_device *mdev, unsigned int cmd,
+ unsigned long arg)
+{
+ int ret = 0;
+ struct mvnet_state *mvnet;
+ struct virtio_mdev_callback *cb;
+
+ if (!mdev)
+ return -EINVAL;
+
+ mvnet = mdev_get_drvdata(mdev);
+ if (!mvnet)
+ return -ENODEV;
+
+ spin_lock(&mvnet->lock);
+
+ switch (cmd) {
+ case VIRTIO_MDEV_SET_VQ_CALLBACK:
+ cb = (struct virtio_mdev_callback *)arg;
+ mvnet->vqs[mvnet->queue_sel].cb = cb->callback;
+ mvnet->vqs[mvnet->queue_sel].private = cb->private;
+ break;
+ case VIRTIO_MDEV_SET_CONFIG_CALLBACK:
+ break;
+ default:
+ pr_err("Unsupported ioctl cmd 0x%x\n", cmd);
+ ret = -ENOTTY;
+ break;
+ }
+
+ spin_unlock(&mvnet->lock);
+
+ return ret;
+}
+
+static int mvnet_open(struct mdev_device *mdev)
+{
+ pr_info("%s\n", __func__);
+ return 0;
+}
+
+static void mvnet_close(struct mdev_device *mdev)
+{
+ pr_info("%s\n", __func__);
+}
+
+static ssize_t
+sample_mvnet_dev_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, "This is phy device\n");
+}
+
+static DEVICE_ATTR_RO(sample_mvnet_dev);
+
+static struct attribute *mvnet_dev_attrs[] = {
+ &dev_attr_sample_mvnet_dev.attr,
+ NULL,
+};
+
+static const struct attribute_group mvnet_dev_group = {
+ .name = "mvnet_dev",
+ .attrs = mvnet_dev_attrs,
+};
+
+static const struct attribute_group *mvnet_dev_groups[] = {
+ &mvnet_dev_group,
+ NULL,
+};
+
+static ssize_t
+sample_mdev_dev_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ if (mdev_from_dev(dev))
+ return sprintf(buf, "This is MDEV %s\n", dev_name(dev));
+
+ return sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR_RO(sample_mdev_dev);
+
+static struct attribute *mdev_dev_attrs[] = {
+ &dev_attr_sample_mdev_dev.attr,
+ NULL,
+};
+
+static const struct attribute_group mdev_dev_group = {
+ .name = "vendor",
+ .attrs = mdev_dev_attrs,
+};
+
+static const struct attribute_group *mdev_dev_groups[] = {
+ &mdev_dev_group,
+ NULL,
+};
+
+#define MVNET_STRING_LEN 16
+
+static ssize_t
+name_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+ char name[MVNET_STRING_LEN];
+ const char *name_str = "virtio-net";
+
+ snprintf(name, MVNET_STRING_LEN, "%s", dev_driver_string(dev));
+ if (!strcmp(kobj->name, name))
+ return sprintf(buf, "%s\n", name_str);
+
+ return -EINVAL;
+}
+
+static MDEV_TYPE_ATTR_RO(name);
+
+static ssize_t
+available_instances_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+ return sprintf(buf, "%d\n", INT_MAX);
+}
+
+static MDEV_TYPE_ATTR_RO(available_instances);
+
+static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
+ char *buf)
+{
+ return sprintf(buf, "%s\n", VIRTIO_MDEV_DEVICE_API_STRING);
+}
+
+static MDEV_TYPE_ATTR_RO(device_api);
+
+static struct attribute *mdev_types_attrs[] = {
+ &mdev_type_attr_name.attr,
+ &mdev_type_attr_device_api.attr,
+ &mdev_type_attr_available_instances.attr,
+ NULL,
+};
+
+static struct attribute_group mdev_type_group = {
+ .name = "",
+ .attrs = mdev_types_attrs,
+};
+
+static struct attribute_group *mdev_type_groups[] = {
+ &mdev_type_group,
+ NULL,
+};
+
+static const struct mdev_parent_ops mdev_fops = {
+ .owner = THIS_MODULE,
+ .dev_attr_groups = mvnet_dev_groups,
+ .mdev_attr_groups = mdev_dev_groups,
+ .supported_type_groups = mdev_type_groups,
+ .create = mvnet_create,
+ .remove = mvnet_remove,
+ .open = mvnet_open,
+ .release = mvnet_close,
+ .read = mvnet_read,
+ .write = mvnet_write,
+ .ioctl = mvnet_ioctl,
+};
+
+static void mvnet_device_release(struct device *dev)
+{
+ dev_dbg(dev, "mvnet: released\n");
+}
+
+static int __init mvnet_dev_init(void)
+{
+ int ret = 0;
+
+ pr_info("mvnet_dev: %s\n", __func__);
+
+ memset(&mvnet_dev, 0, sizeof(mvnet_dev));
+
+ idr_init(&mvnet_dev.vd_idr);
+
+ mvnet_dev.vd_class = class_create(THIS_MODULE, MVNET_CLASS_NAME);
+
+ if (IS_ERR(mvnet_dev.vd_class)) {
+ pr_err("Error: failed to register mvnet_dev class\n");
+ ret = PTR_ERR(mvnet_dev.vd_class);
+ goto failed1;
+ }
+
+ mvnet_dev.dev.class = mvnet_dev.vd_class;
+ mvnet_dev.dev.release = mvnet_device_release;
+ dev_set_name(&mvnet_dev.dev, "%s", MVNET_NAME);
+
+ ret = device_register(&mvnet_dev.dev);
+ if (ret)
+ goto failed2;
+
+ ret = mdev_register_device(&mvnet_dev.dev, &mdev_fops);
+ if (ret)
+ goto failed3;
+
+ mutex_init(&mdev_list_lock);
+ INIT_LIST_HEAD(&mdev_devices_list);
+
+ goto all_done;
+
+failed3:
+
+ device_unregister(&mvnet_dev.dev);
+failed2:
+ class_destroy(mvnet_dev.vd_class);
+
+failed1:
+all_done:
+ return ret;
+}
+
+static void __exit mvnet_dev_exit(void)
+{
+ mvnet_dev.dev.bus = NULL;
+ mdev_unregister_device(&mvnet_dev.dev);
+
+ device_unregister(&mvnet_dev.dev);
+ idr_destroy(&mvnet_dev.vd_idr);
+ class_destroy(mvnet_dev.vd_class);
+ mvnet_dev.vd_class = NULL;
+ pr_info("mvnet_dev: Unloaded!\n");
+}
+
+module_init(mvnet_dev_init)
+module_exit(mvnet_dev_exit)
+
+MODULE_LICENSE("GPL v2");
+MODULE_INFO(supported, "Test driver that simulate serial port over PCI");
+MODULE_VERSION(VERSION_STRING);
+MODULE_AUTHOR(DRIVER_AUTHOR);
--
2.19.1
^ permalink raw reply related
* [RFC PATCH 3/4] virtio: introduce a mdev based transport
From: Jason Wang @ 2019-09-10 8:19 UTC (permalink / raw)
To: mst, jasowang, kvm, virtualization, netdev
Cc: linux-kernel, kwankhede, alex.williamson, cohuck, tiwei.bie,
maxime.coquelin, cunming.liang, zhihong.wang, rob.miller, idos,
xiao.w.wang, haotian.wang
In-Reply-To: <20190910081935.30516-1-jasowang@redhat.com>
This patch introduces a new mdev transport for virtio. It lets the
kernel virtio drivers drive a mediated device that is capable of
populating the virtqueue directly.
A new virtio-mdev driver is registered on the mdev bus. When a new
virtio-mdev device is probed, the driver registers it as a virtio
device with mdev based config ops. This means that, unlike the
existing hardware transports, this is a software transport between
the mdev driver and the mdev device. The transport is implemented as
follows:
- configuration access goes through parent_ops->read()/write()
- vq/config callbacks are set up through parent_ops->ioctl()
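To make the register-style configuration access concrete, here is a
minimal userspace sketch (an illustration, not code from this patch) of
how a 64-bit ring address can be pushed through two 4-byte register
writes and reassembled on the device side. The two offsets are the
VIRTIO_MDEV_QUEUE_DESC_LOW/HIGH values from the new header; struct
fake_dev, fake_writel(), fake_set_desc_addr() and fake_desc_addr() are
hypothetical names that only stand in for the parent's state and for
parent_ops->write().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets copied from include/uapi/linux/virtio_mdev.h in this patch. */
#define VIRTIO_MDEV_QUEUE_DESC_LOW  0x080
#define VIRTIO_MDEV_QUEUE_DESC_HIGH 0x084

/* Hypothetical device-side state; stands in for the parent's vq state. */
struct fake_dev {
	uint32_t desc_lo;
	uint32_t desc_hi;
};

/*
 * Hypothetical stand-in for parent_ops->write() with a 4-byte access.
 * Returns the number of bytes consumed, or -1 for an unknown offset.
 */
static int fake_writel(struct fake_dev *d, uint32_t off, uint32_t val)
{
	switch (off) {
	case VIRTIO_MDEV_QUEUE_DESC_LOW:
		d->desc_lo = val;
		return 4;
	case VIRTIO_MDEV_QUEUE_DESC_HIGH:
		d->desc_hi = val;
		return 4;
	default:
		return -1;
	}
}

/* Driver side: split a 64-bit address into two register writes. */
static int fake_set_desc_addr(struct fake_dev *d, uint64_t addr)
{
	assert(d != NULL);

	if (fake_writel(d, VIRTIO_MDEV_QUEUE_DESC_LOW, (uint32_t)addr) != 4)
		return -1;
	if (fake_writel(d, VIRTIO_MDEV_QUEUE_DESC_HIGH,
			(uint32_t)(addr >> 32)) != 4)
		return -1;
	return 0;
}

/* Device side: reassemble the address from the two halves. */
static uint64_t fake_desc_addr(const struct fake_dev *d)
{
	return ((uint64_t)d->desc_hi << 32) | d->desc_lo;
}
```

The same low/high pattern would apply to the avail and used ring
addresses; only the register offsets differ.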
This transport is derived from the virtio MMIO protocol and was written
for the kernel driver. The design goal of the transport itself, however,
is to be generic enough to support userspace drivers (this part will be
added in the future).
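One piece inherited from the MMIO-style layout is that 64-bit feature
masks are exposed through a 32-bit window plus a selector register. The
userspace sketch below models that select-then-read sequence under
simplifying assumptions; struct fake_features, fake_write_sel(),
fake_read_features() and fake_get_features() are hypothetical names, and
the "selectors above 1 read as zero" behaviour is an assumption of this
sketch, not something the patch specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model: a 64-bit feature mask behind a 32-bit window. */
struct fake_features {
	uint64_t device_features;
	uint32_t sel; /* which 32-bit half the window currently exposes */
};

/* Models a write to VIRTIO_MDEV_DEVICE_FEATURES_SEL. */
static void fake_write_sel(struct fake_features *f, uint32_t sel)
{
	assert(f != NULL);
	f->sel = sel;
}

/* Models a read of VIRTIO_MDEV_DEVICE_FEATURES (assumed: sel > 1 reads 0). */
static uint32_t fake_read_features(const struct fake_features *f)
{
	if (f->sel > 1)
		return 0;
	return (uint32_t)(f->device_features >> (32 * f->sel));
}

/* Driver side: upper half first, then lower half, as in get_features(). */
static uint64_t fake_get_features(struct fake_features *f)
{
	uint64_t features;

	fake_write_sel(f, 1);
	features = fake_read_features(f);
	features <<= 32;

	fake_write_sel(f, 0);
	features |= fake_read_features(f);

	return features;
}
```

Because the selector is device state, two drivers interleaving these
accesses would need external serialization; the sketch ignores that.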
Note:
- mdev currently assumes that all the parameters of parent_ops come
from userspace. This prevents us from implementing a kernel mdev
driver. For a quick POC, this patch just abuses those parameters and
assumes the mdev device implementation will treat them as kernel
pointers. This should be addressed in the formal series by extending
mdev_parent_ops.
- for a quick POC, I just derived the transport from MMIO; I'm pretty
sure there is a lot of room for optimization.
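The pointer-passing shortcut described in the first note can be sketched
in plain userspace C (an illustration only, not kernel code). The caller
packs a callback and its private pointer into a struct and hands the
struct's address through an integer argument; the callee casts it back
and trusts it. struct cb_desc, struct fake_vq, fake_ioctl(), fake_kick()
and FAKE_SET_VQ_CALLBACK are hypothetical stand-ins for
struct virtio_mdev_callback, the parent's per-queue state,
parent_ops->ioctl() and VIRTIO_MDEV_SET_VQ_CALLBACK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct virtio_mdev_callback. */
struct cb_desc {
	int (*callback)(void *private);
	void *private;
};

/* Hypothetical stand-in for the parent's per-queue state. */
struct fake_vq {
	int (*cb)(void *private);
	void *private;
};

#define FAKE_SET_VQ_CALLBACK 0x00 /* hypothetical command number */

/*
 * Hypothetical stand-in for parent_ops->ioctl(). The argument arrives as
 * an integer and is cast back to a pointer with no validation, which is
 * only safe because caller and callee share one address space. As in the
 * kernel, unsigned long is assumed to be pointer-sized.
 */
static long fake_ioctl(struct fake_vq *vq, unsigned int cmd, unsigned long arg)
{
	const struct cb_desc *cb;

	switch (cmd) {
	case FAKE_SET_VQ_CALLBACK:
		cb = (const struct cb_desc *)(uintptr_t)arg;
		if (!cb)
			return -1;
		/* Copy the fields: the caller's struct may live on its stack. */
		vq->cb = cb->callback;
		vq->private = cb->private;
		return 0;
	default:
		return -1;
	}
}

/* Example callback: counts how many times it was invoked. */
static int count_cb(void *private)
{
	int *hits = private;

	(*hits)++;
	return 1;
}

/* Device side: raise the "interrupt" if a callback was registered. */
static int fake_kick(struct fake_vq *vq)
{
	assert(vq != NULL);
	return vq->cb ? vq->cb(vq->private) : 0;
}
```

A userspace caller could not use this path as-is, since the callee would
have to copy the struct in and could not call a userspace function
pointer; that is the gap the note says a formal series should close.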
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vfio/mdev/Kconfig | 7 +
drivers/vfio/mdev/Makefile | 1 +
drivers/vfio/mdev/virtio_mdev.c | 500 +++++++++++++++++++++++++++++++
include/uapi/linux/virtio_mdev.h | 131 ++++++++
4 files changed, 639 insertions(+)
create mode 100644 drivers/vfio/mdev/virtio_mdev.c
create mode 100644 include/uapi/linux/virtio_mdev.h
diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
index 5da27f2100f9..c488c31fc137 100644
--- a/drivers/vfio/mdev/Kconfig
+++ b/drivers/vfio/mdev/Kconfig
@@ -16,3 +16,10 @@ config VFIO_MDEV_DEVICE
default n
help
VFIO based driver for Mediated devices.
+
+config VIRTIO_MDEV_DEVICE
+ tristate "VIRTIO driver for Mediated devices"
+ depends on VFIO_MDEV && VIRTIO
+ default n
+ help
+ VIRTIO based driver for Mediated devices.
diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
index 101516fdf375..99d31e29c23e 100644
--- a/drivers/vfio/mdev/Makefile
+++ b/drivers/vfio/mdev/Makefile
@@ -4,3 +4,4 @@ mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
obj-$(CONFIG_VFIO_MDEV) += mdev.o
obj-$(CONFIG_VFIO_MDEV_DEVICE) += vfio_mdev.o
+obj-$(CONFIG_VIRTIO_MDEV_DEVICE) += virtio_mdev.o
diff --git a/drivers/vfio/mdev/virtio_mdev.c b/drivers/vfio/mdev/virtio_mdev.c
new file mode 100644
index 000000000000..5ff09089297e
--- /dev/null
+++ b/drivers/vfio/mdev/virtio_mdev.c
@@ -0,0 +1,500 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VIRTIO based driver for Mediated device
+ *
+ * Copyright (c) 2019, Red Hat. All rights reserved.
+ * Author: Jason Wang <jasowang@redhat.com>
+ *
+ * Based on Virtio MMIO driver.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/uuid.h>
+#include <linux/mdev.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_ring.h>
+#include <uapi/linux/virtio_mdev.h>
+#include "mdev_private.h"
+
+#define DRIVER_VERSION "0.1"
+#define DRIVER_AUTHOR "Red Hat Corporation"
+#define DRIVER_DESC "VIRTIO based driver for Mediated device"
+
+#define to_virtio_mdev_device(dev) \
+ container_of(dev, struct virtio_mdev_device, vdev)
+
+struct virtio_mdev_device {
+ struct virtio_device vdev;
+ struct mdev_device *mdev;
+ unsigned long version;
+
+ struct virtqueue **vqs;
+ spinlock_t lock;
+};
+
+struct virtio_mdev_vq_info {
+ /* the actual virtqueue */
+ struct virtqueue *vq;
+
+ /* the list node for the virtqueues list */
+ struct list_head node;
+};
+
+static u32 virtio_mdev_readl(struct mdev_device *mdev,
+ loff_t off)
+{
+ struct mdev_parent *parent = mdev->parent;
+ ssize_t len;
+ u32 val;
+
+ if (unlikely(!parent->ops->read))
+ return 0xFFFFFFFF;
+
+ len = parent->ops->read(mdev, (char *)&val, 4, &off);
+ if (len != 4)
+ return 0xFFFFFFFF;
+
+ return val;
+}
+
+static void virtio_mdev_writel(struct mdev_device *mdev,
+ loff_t off, u32 val)
+{
+ struct mdev_parent *parent = mdev->parent;
+
+ if (unlikely(!parent->ops->write))
+ return;
+
+ parent->ops->write(mdev, (char *)&val, 4, &off);
+
+ return;
+}
+
+static void virtio_mdev_get(struct virtio_device *vdev, unsigned offset,
+ void *buf, unsigned len)
+{
+ struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+ struct mdev_device *mdev = vm_dev->mdev;
+ struct mdev_parent *parent = mdev->parent;
+
+ loff_t off = offset + VIRTIO_MDEV_CONFIG;
+
+ switch (len) {
+ case 1:
+ parent->ops->read(mdev, buf, 1, &off);
+ break;
+ case 2:
+ parent->ops->read(mdev, buf, 2, &off);
+ break;
+ case 4:
+ parent->ops->read(mdev, buf, 4, &off);
+ break;
+ case 8:
+ /* read() advances off, so the second call fetches the upper half */
+ parent->ops->read(mdev, buf, 4, &off);
+ parent->ops->read(mdev, (char *)buf + 4, 4, &off);
+ break;
+ default:
+ BUG();
+ }
+
+ return;
+}
+
+static void virtio_mdev_set(struct virtio_device *vdev, unsigned offset,
+ const void *buf, unsigned len)
+{
+ struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+ struct mdev_device *mdev = vm_dev->mdev;
+ struct mdev_parent *parent = mdev->parent;
+ loff_t off = offset + VIRTIO_MDEV_CONFIG;
+
+ switch (len) {
+ case 1:
+ case 2:
+ case 4:
+ case 8:
+ break;
+ default:
+ BUG();
+ }
+
+ parent->ops->write(mdev, buf, len, &off);
+
+ return;
+}
+
+static u32 virtio_mdev_generation(struct virtio_device *vdev)
+{
+ struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+
+ if (vm_dev->version == 1)
+ return 0;
+ else
+ return virtio_mdev_readl(vm_dev->mdev,
+ VIRTIO_MDEV_CONFIG_GENERATION);
+}
+
+static u8 virtio_mdev_get_status(struct virtio_device *vdev)
+{
+ struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+
+ return virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_STATUS) & 0xff;
+}
+
+static void virtio_mdev_set_status(struct virtio_device *vdev, u8 status)
+{
+ struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_STATUS, status);
+}
+
+static void virtio_mdev_reset(struct virtio_device *vdev)
+{
+ struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_STATUS, 0);
+}
+
+static bool virtio_mdev_notify(struct virtqueue *vq)
+{
+ struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vq->vdev);
+
+ /* We write the queue's selector into the notification register to
+ * signal the other end */
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_NOTIFY,
+ vq->index);
+ return true;
+}
+
+static irqreturn_t virtio_mdev_config_cb(void *private)
+{
+ struct virtio_mdev_device *vm_dev = private;
+
+ virtio_config_changed(&vm_dev->vdev);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t virtio_mdev_virtqueue_cb(void *private)
+{
+ struct virtio_mdev_vq_info *info = private;
+
+ return vring_interrupt(0, info->vq);
+}
+
+static struct virtqueue *
+virtio_mdev_setup_vq(struct virtio_device *vdev, unsigned index,
+ void (*callback)(struct virtqueue *vq),
+ const char *name, bool ctx)
+{
+ struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+ struct mdev_device *mdev = vm_dev->mdev;
+ struct mdev_parent *parent = mdev->parent;
+ struct virtio_mdev_vq_info *info;
+ struct virtio_mdev_callback cb;
+ struct virtqueue *vq;
+ unsigned long flags;
+ u32 align, num;
+ u64 addr;
+ int err;
+
+ if (!name)
+ return NULL;
+
+ /* Select the queue we're interested in */
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_SEL, index);
+
+ /* Queue shouldn't already be set up. */
+ if (virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_QUEUE_READY)) {
+ err = -ENOENT;
+ goto error_available;
+ }
+
+ /* Allocate and fill out our active queue description */
+ info = kmalloc(sizeof(*info), GFP_KERNEL);
+ if (!info) {
+ err = -ENOMEM;
+ goto error_kmalloc;
+ }
+
+ num = virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_QUEUE_NUM_MAX);
+ if (num == 0) {
+ err = -ENOENT;
+ goto error_new_virtqueue;
+ }
+
+ /* Create the vring */
+ align = virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_QUEUE_ALIGN);
+ vq = vring_create_virtqueue(index, num, align, vdev,
+ true, true, ctx,
+ virtio_mdev_notify, callback, name);
+ if (!vq) {
+ err = -ENOMEM;
+ goto error_new_virtqueue;
+ }
+
+ /* Setup virtqueue callback */
+ cb.callback = virtio_mdev_virtqueue_cb;
+ cb.private = info;
+ err = parent->ops->ioctl(mdev, VIRTIO_MDEV_SET_VQ_CALLBACK,
+ (unsigned long)&cb);
+ if (err) {
+ err = -EINVAL;
+ goto error_callback;
+ }
+
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_NUM,
+ virtqueue_get_vring_size(vq));
+ addr = virtqueue_get_desc_addr(vq);
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_DESC_LOW, (u32)addr);
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_DESC_HIGH,
+ (u32)(addr >> 32));
+
+ addr = virtqueue_get_avail_addr(vq);
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_AVAIL_LOW, (u32)addr);
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_AVAIL_HIGH,
+ (u32)(addr >> 32));
+
+ addr = virtqueue_get_used_addr(vq);
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_USED_LOW, (u32)addr);
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_USED_HIGH, (u32)(addr >> 32));
+
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_READY, 1);
+
+ vq->priv = info;
+ info->vq = vq;
+
+ return vq;
+
+error_callback:
+ vring_del_virtqueue(vq);
+error_new_virtqueue:
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_READY, 0);
+ WARN_ON(virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_QUEUE_READY));
+ kfree(info);
+error_kmalloc:
+error_available:
+ return ERR_PTR(err);
+
+}
+
+static void virtio_mdev_del_vq(struct virtqueue *vq)
+{
+ struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vq->vdev);
+ struct virtio_mdev_vq_info *info = vq->priv;
+ unsigned long flags;
+ unsigned int index = vq->index;
+
+ /* Select and deactivate the queue */
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_SEL, index);
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_QUEUE_READY, 0);
+ WARN_ON(virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_QUEUE_READY));
+
+ vring_del_virtqueue(vq);
+
+ kfree(info);
+}
+
+static void virtio_mdev_del_vqs(struct virtio_device *vdev)
+{
+ struct virtqueue *vq, *n;
+
+ list_for_each_entry_safe(vq, n, &vdev->vqs, list)
+ virtio_mdev_del_vq(vq);
+
+ return;
+}
+
+static int virtio_mdev_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+ struct virtqueue *vqs[],
+ vq_callback_t *callbacks[],
+ const char * const names[],
+ const bool *ctx,
+ struct irq_affinity *desc)
+{
+ struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+ struct mdev_device *mdev = vm_dev->mdev;
+ struct mdev_parent *parent = mdev->parent;
+ struct virtio_mdev_callback cb;
+ int i, err, queue_idx = 0;
+
+ vm_dev->vqs = kmalloc_array(nvqs, sizeof(*vm_dev->vqs),
+ GFP_KERNEL);
+ if (!vm_dev->vqs)
+ return -ENOMEM;
+
+ for (i = 0; i < nvqs; ++i) {
+ if (!names[i]) {
+ vqs[i] = NULL;
+ continue;
+ }
+
+ vqs[i] = virtio_mdev_setup_vq(vdev, queue_idx++,
+ callbacks[i], names[i], ctx ?
+ ctx[i] : false);
+ if (IS_ERR(vqs[i])) {
+ err = PTR_ERR(vqs[i]);
+ goto err_setup_vq;
+ }
+ }
+
+ cb.callback = virtio_mdev_config_cb;
+ cb.private = vm_dev;
+ err = parent->ops->ioctl(mdev, VIRTIO_MDEV_SET_CONFIG_CALLBACK,
+ (unsigned long)&cb);
+ if (err)
+ goto err_setup_vq;
+
+ return 0;
+
+err_setup_vq:
+ kfree(vm_dev->vqs);
+ virtio_mdev_del_vqs(vdev);
+ return err;
+}
+
+static u64 virtio_mdev_get_features(struct virtio_device *vdev)
+{
+ struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+ u64 features;
+
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_DEVICE_FEATURES_SEL, 1);
+ features = virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_DEVICE_FEATURES);
+ features <<= 32;
+
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_DEVICE_FEATURES_SEL, 0);
+ features |= virtio_mdev_readl(vm_dev->mdev, VIRTIO_MDEV_DEVICE_FEATURES);
+
+ return features;
+}
+
+static int virtio_mdev_finalize_features(struct virtio_device *vdev)
+{
+ struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+
+ /* Give virtio_ring a chance to accept features. */
+ vring_transport_features(vdev);
+
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_DRIVER_FEATURES_SEL, 1);
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_DRIVER_FEATURES,
+ (u32)(vdev->features >> 32));
+
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_DRIVER_FEATURES_SEL, 0);
+ virtio_mdev_writel(vm_dev->mdev, VIRTIO_MDEV_DRIVER_FEATURES,
+ (u32)vdev->features);
+
+ return 0;
+}
+
+static const char *virtio_mdev_bus_name(struct virtio_device *vdev)
+{
+ struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
+ struct mdev_device *mdev = vm_dev->mdev;
+
+ return dev_name(&mdev->dev);
+}
+
+static const struct virtio_config_ops virtio_mdev_config_ops = {
+ .get = virtio_mdev_get,
+ .set = virtio_mdev_set,
+ .generation = virtio_mdev_generation,
+ .get_status = virtio_mdev_get_status,
+ .set_status = virtio_mdev_set_status,
+ .reset = virtio_mdev_reset,
+ .find_vqs = virtio_mdev_find_vqs,
+ .del_vqs = virtio_mdev_del_vqs,
+ .get_features = virtio_mdev_get_features,
+ .finalize_features = virtio_mdev_finalize_features,
+ .bus_name = virtio_mdev_bus_name,
+};
+
+static void virtio_mdev_release_dev(struct device *_d)
+{
+ struct virtio_device *vdev =
+ container_of(_d, struct virtio_device, dev);
+ struct virtio_mdev_device *vm_dev =
+ container_of(vdev, struct virtio_mdev_device, vdev);
+
+ devm_kfree(_d, vm_dev);
+}
+
+static int virtio_mdev_probe(struct device *dev)
+{
+ struct mdev_device *mdev = to_mdev_device(dev);
+ struct virtio_mdev_device *vm_dev;
+ unsigned long magic;
+ int rc;
+
+ magic = virtio_mdev_readl(mdev, VIRTIO_MDEV_MAGIC_VALUE);
+ if (magic != ('v' | 'i' << 8 | 'r' << 16 | 't' << 24)) {
+ dev_warn(dev, "Wrong magic value 0x%08lx!\n", magic);
+ return -ENODEV;
+ }
+
+ vm_dev = devm_kzalloc(dev, sizeof(*vm_dev), GFP_KERNEL);
+ if (!vm_dev)
+ return -ENOMEM;
+
+ vm_dev->vdev.dev.parent = dev;
+ vm_dev->vdev.dev.release = virtio_mdev_release_dev;
+ vm_dev->vdev.config = &virtio_mdev_config_ops;
+ vm_dev->mdev = mdev;
+ vm_dev->vqs = NULL;
+ spin_lock_init(&vm_dev->lock);
+
+ vm_dev->version = virtio_mdev_readl(mdev, VIRTIO_MDEV_VERSION);
+ if (vm_dev->version != 1) {
+ dev_err(dev, "Version %lu not supported!\n",
+ vm_dev->version);
+ return -ENXIO;
+ }
+
+ vm_dev->vdev.id.device = virtio_mdev_readl(mdev, VIRTIO_MDEV_DEVICE_ID);
+ if (vm_dev->vdev.id.device == 0)
+ return -ENODEV;
+
+ vm_dev->vdev.id.vendor = virtio_mdev_readl(mdev, VIRTIO_MDEV_VENDOR_ID);
+ rc = register_virtio_device(&vm_dev->vdev);
+ if (rc)
+ put_device(dev);
+
+ dev_set_drvdata(dev, vm_dev);
+
+ return rc;
+
+}
+
+static void virtio_mdev_remove(struct device *dev)
+{
+ struct virtio_mdev_device *vm_dev = dev_get_drvdata(dev);
+
+ unregister_virtio_device(&vm_dev->vdev);
+}
+
+static struct mdev_driver virtio_mdev_driver = {
+ .name = "virtio_mdev",
+ .probe = virtio_mdev_probe,
+ .remove = virtio_mdev_remove,
+};
+
+static int __init virtio_mdev_init(void)
+{
+ return mdev_register_driver(&virtio_mdev_driver, THIS_MODULE);
+}
+
+static void __exit virtio_mdev_exit(void)
+{
+ mdev_unregister_driver(&virtio_mdev_driver);
+}
+
+module_init(virtio_mdev_init)
+module_exit(virtio_mdev_exit)
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/include/uapi/linux/virtio_mdev.h b/include/uapi/linux/virtio_mdev.h
new file mode 100644
index 000000000000..8040de6b960a
--- /dev/null
+++ b/include/uapi/linux/virtio_mdev.h
@@ -0,0 +1,131 @@
+/*
+ * Virtio mediated device driver
+ *
+ * Copyright 2019, Red Hat Corp.
+ *
+ * Based on Virtio MMIO driver by ARM Ltd, copyright ARM Ltd. 2011
+ *
+ * This header is BSD licensed so anyone can use the definitions to implement
+ * compatible drivers/servers.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ * may be used to endorse or promote products derived from this software
+ * without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#ifndef _LINUX_VIRTIO_MDEV_H
+#define _LINUX_VIRTIO_MDEV_H
+
+#include <linux/interrupt.h>
+#include <linux/vringh.h>
+#include <uapi/linux/virtio_net.h>
+
+/*
+ * Ioctls
+ */
+
+struct virtio_mdev_callback {
+ irqreturn_t (*callback)(void *);
+ void *private;
+};
+
+#define VIRTIO_MDEV 0xAF
+#define VIRTIO_MDEV_SET_VQ_CALLBACK _IOW(VIRTIO_MDEV, 0x00, \
+ struct virtio_mdev_callback)
+#define VIRTIO_MDEV_SET_CONFIG_CALLBACK _IOW(VIRTIO_MDEV, 0x01, \
+ struct virtio_mdev_callback)
+
+#define VIRTIO_MDEV_DEVICE_API_STRING "virtio-mdev"
+
+/*
+ * Control registers
+ */
+
+/* Magic value ("virt" string) - Read Only */
+#define VIRTIO_MDEV_MAGIC_VALUE 0x000
+
+/* Virtio device version - Read Only */
+#define VIRTIO_MDEV_VERSION 0x004
+
+/* Virtio device ID - Read Only */
+#define VIRTIO_MDEV_DEVICE_ID 0x008
+
+/* Virtio vendor ID - Read Only */
+#define VIRTIO_MDEV_VENDOR_ID 0x00c
+
+/* Bitmask of the features supported by the device (host)
+ * (32 bits per set) - Read Only */
+#define VIRTIO_MDEV_DEVICE_FEATURES 0x010
+
+/* Device (host) features set selector - Write Only */
+#define VIRTIO_MDEV_DEVICE_FEATURES_SEL 0x014
+
+/* Bitmask of features activated by the driver (guest)
+ * (32 bits per set) - Write Only */
+#define VIRTIO_MDEV_DRIVER_FEATURES 0x020
+
+/* Activated features set selector - Write Only */
+#define VIRTIO_MDEV_DRIVER_FEATURES_SEL 0x024
+
+/* Queue selector - Write Only */
+#define VIRTIO_MDEV_QUEUE_SEL 0x030
+
+/* Maximum size of the currently selected queue - Read Only */
+#define VIRTIO_MDEV_QUEUE_NUM_MAX 0x034
+
+/* Queue size for the currently selected queue - Write Only */
+#define VIRTIO_MDEV_QUEUE_NUM 0x038
+
+/* Ready bit for the currently selected queue - Read Write */
+#define VIRTIO_MDEV_QUEUE_READY 0x044
+
+/* Alignment of virtqueue - Read Only */
+#define VIRTIO_MDEV_QUEUE_ALIGN 0x048
+
+/* Queue notifier - Write Only */
+#define VIRTIO_MDEV_QUEUE_NOTIFY 0x050
+
+/* Device status register - Read Write */
+#define VIRTIO_MDEV_STATUS 0x060
+
+/* Selected queue's Descriptor Table address, 64 bits in two halves */
+#define VIRTIO_MDEV_QUEUE_DESC_LOW 0x080
+#define VIRTIO_MDEV_QUEUE_DESC_HIGH 0x084
+
+/* Selected queue's Available Ring address, 64 bits in two halves */
+#define VIRTIO_MDEV_QUEUE_AVAIL_LOW 0x090
+#define VIRTIO_MDEV_QUEUE_AVAIL_HIGH 0x094
+
+/* Selected queue's Used Ring address, 64 bits in two halves */
+#define VIRTIO_MDEV_QUEUE_USED_LOW 0x0a0
+#define VIRTIO_MDEV_QUEUE_USED_HIGH 0x0a4
+
+/* Configuration atomicity value */
+#define VIRTIO_MDEV_CONFIG_GENERATION 0x0fc
+
+/* The config space is defined by each driver as
+ * the per-driver configuration space - Read Write */
+#define VIRTIO_MDEV_CONFIG 0x100
+
+#endif
--
2.19.1
^ permalink raw reply related
* [RFC PATCH 2/4] mdev: introduce helper to set per device dma ops
From: Jason Wang @ 2019-09-10 8:19 UTC (permalink / raw)
To: mst, jasowang, kvm, virtualization, netdev
Cc: linux-kernel, kwankhede, alex.williamson, cohuck, tiwei.bie,
maxime.coquelin, cunming.liang, zhihong.wang, rob.miller, idos,
xiao.w.wang, haotian.wang
In-Reply-To: <20190910081935.30516-1-jasowang@redhat.com>
This patch introduces mdev_set_dma_ops(), which allows the parent to set
per-device DMA ops. This helps the kernel driver set up correct DMA
mappings.
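As a rough illustration of why a per-device ops table is useful here,
the userspace sketch below models a software device whose "DMA address"
is simply the buffer's virtual address, while other devices keep a
default translation. Everything in it is a hypothetical simplification:
fake_dma_addr_t, struct fake_dma_ops, struct fake_device,
fake_set_dma_ops() and fake_map() are not the kernel's dma_addr_t,
struct dma_map_ops or set_dma_ops(), and FAKE_BUS_OFFSET is an invented
constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t fake_dma_addr_t; /* hypothetical stand-in for dma_addr_t */

/* Hypothetical, heavily reduced version of a DMA ops table. */
struct fake_dma_ops {
	fake_dma_addr_t (*map)(void *vaddr, size_t offset);
};

struct fake_device {
	const struct fake_dma_ops *dma_ops; /* NULL means "use the default" */
};

/* Default mapping: pretend the bus sees memory at a fixed offset. */
#define FAKE_BUS_OFFSET 0x1000u

static fake_dma_addr_t default_map(void *vaddr, size_t offset)
{
	return (fake_dma_addr_t)(uintptr_t)vaddr + offset + FAKE_BUS_OFFSET;
}

/* Software device: the "DMA address" is just the virtual address. */
static fake_dma_addr_t va_map(void *vaddr, size_t offset)
{
	return (fake_dma_addr_t)(uintptr_t)vaddr + offset;
}

static const struct fake_dma_ops va_dma_ops = { .map = va_map };

/* Plays the role of mdev_set_dma_ops(): install per-device ops. */
static void fake_set_dma_ops(struct fake_device *dev,
			     const struct fake_dma_ops *ops)
{
	assert(dev != NULL);
	dev->dma_ops = ops;
}

/* Dispatch through the per-device table, falling back to the default. */
static fake_dma_addr_t fake_map(struct fake_device *dev, void *vaddr,
				size_t offset)
{
	if (dev->dma_ops && dev->dma_ops->map)
		return dev->dma_ops->map(vaddr, offset);
	return default_map(vaddr, offset);
}
```

With the override installed, addresses handed to the device can be
dereferenced directly by software, which is what a vringh-based parent
would need.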
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vfio/mdev/mdev_core.c | 7 +++++++
include/linux/mdev.h | 2 ++
2 files changed, 9 insertions(+)
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b558d4cfd082..eb28552082d7 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -13,6 +13,7 @@
#include <linux/uuid.h>
#include <linux/sysfs.h>
#include <linux/mdev.h>
+#include <linux/dma-mapping.h>
#include "mdev_private.h"
@@ -27,6 +28,12 @@ static struct class_compat *mdev_bus_compat_class;
static LIST_HEAD(mdev_list);
static DEFINE_MUTEX(mdev_list_lock);
+void mdev_set_dma_ops(struct mdev_device *mdev, const struct dma_map_ops *ops)
+{
+ set_dma_ops(&mdev->dev, ops);
+}
+EXPORT_SYMBOL(mdev_set_dma_ops);
+
struct device *mdev_parent_dev(struct mdev_device *mdev)
{
return mdev->parent->dev;
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 0ce30ca78db0..7195f40bf8bf 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -145,4 +145,6 @@ struct device *mdev_parent_dev(struct mdev_device *mdev);
struct device *mdev_dev(struct mdev_device *mdev);
struct mdev_device *mdev_from_dev(struct device *dev);
+void mdev_set_dma_ops(struct mdev_device *mdev, const struct dma_map_ops *ops);
+
#endif /* MDEV_H */
--
2.19.1
^ permalink raw reply related
* Re: ❌ FAIL: Stable queue: queue-5.2
From: Hangbin Liu @ 2019-09-10 8:19 UTC (permalink / raw)
To: CKI Project
Cc: Linux Stable maillist, netdev, Jan Stancek, Xiumei Mu,
David Howells, linux-afs
In-Reply-To: <cki.77A5953448.UY7ROQ6BKT@redhat.com>
On Wed, Aug 28, 2019 at 08:36:14AM -0400, CKI Project wrote:
>
> Hello,
>
> We ran automated tests on a patchset that was proposed for merging into this
> kernel tree. The patches were applied to:
>
> Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
> Commit: f7d5b3dc4792 - Linux 5.2.10
>
> The results of these automated tests are provided below.
>
> Overall result: FAILED (see details below)
> Merge: OK
> Compile: OK
> Tests: FAILED
>
> All kernel binaries, config files, and logs are available for download here:
>
> https://artifacts.cki-project.org/pipelines/128519
>
>
>
> One or more kernel tests failed:
>
> x86_64:
> ❌ Networking socket: fuzz
Sorry if this info is a little late; I just found the call trace for this
failure.
[ 9492.446228] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 9492.447493] #PF: supervisor write access in kernel mode
[ 9492.448489] #PF: error_code(0x0002) - not-present page
[ 9492.449410] PGD 800000010902c067 P4D 800000010902c067 PUD 104202067 PMD 0
[ 9492.450663] Oops: 0002 [#1] SMP PTI
[ 9492.451348] CPU: 0 PID: 19353 Comm: socket Tainted: G W 5.2.10-f7d5b3d.cki #1
[ 9492.453040] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 9492.454153] RIP: 0010:rxrpc_unuse_local+0xa/0x20 [rxrpc]
[ 9492.455110] Code: ce e9 c4 fe ff ff 0f 0b e9 34 dd 00 00 e9 95 dd 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 b8 ff ff ff ff <3e> 0f c1 47 10 83 f8 01 74 05 e9 a7 f5 ff ff e9 e2 f7 ff ff 66 90
[ 9492.458362] RSP: 0018:ffffa756008bbeb0 EFLAGS: 00010246
[ 9492.459329] RAX: 00000000ffffffff RBX: ffff95fed42c0000 RCX: ffffc755ffc63b37
[ 9492.460690] RDX: 0000000000000001 RSI: 0000000000000046 RDI: 0000000000000000
[ 9492.461940] RBP: ffff95ff04fed000 R08: 0000000000000001 R09: ffffc755ffc63b60
[ 9492.463220] R10: 0000000000000060 R11: 0000000000000000 R12: ffff95ff04fed0e4
[ 9492.464508] R13: ffff95feaa84c780 R14: 0000000000000000 R15: 0000000000000000
[ 9492.465781] FS: 00007f86bd101740(0000) GS:ffff95ffbba00000(0000) knlGS:0000000000000000
[ 9492.467156] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9492.468185] CR2: 0000000000000010 CR3: 000000002e34a004 CR4: 00000000007606f0
[ 9492.469435] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9492.470754] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9492.472050] PKRU: 55555554
[ 9492.472562] Call Trace:
[ 9492.473025] rxrpc_release+0x138/0x1e0 [rxrpc]
[ 9492.473885] __sock_release+0x89/0xa0
[ 9492.474564] __sys_socket+0xd4/0xf0
[ 9492.475200] __x64_sys_socket+0x16/0x20
[ 9492.475903] do_syscall_64+0x5f/0x1a0
[ 9492.476551] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 9492.477446] RIP: 0033:0x7f86bd20069b
[ 9492.478094] Code: 73 01 c3 48 8b 0d ed 37 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 29 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd 37 0c 00 f7 d8 64 89 01 48
[ 9492.481381] RSP: 002b:00007ffcbb797dc8 EFLAGS: 00000217 ORIG_RAX: 0000000000000029
[ 9492.482744] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f86bd20069b
[ 9492.483945] RDX: 000000000000000a RSI: 0000000000000002 RDI: 0000000000000021
[ 9492.485220] RBP: 00007ffcbb797e10 R08: 00007f86bd2c41f4 R09: 00007f86bd2c4260
[ 9492.486505] R10: 00000000ffffffff R11: 0000000000000217 R12: 00000000004012b0
[ 9492.487769] R13: 00007ffcbb797ef0 R14: 0000000000000000 R15: 0000000000000000
[ 9492.489048] Modules linked in: nfnetlink cmtp kernelcapi l2tp_ip6 l2tp_ip rfcomm pptp gre l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel bnep can_bcm hidp can_raw kcm pppoe pppox ppp_generic slhc vmw_vsock_vmci_transport vsock vmw_vmci psnap ieee802154_socket ieee802154 rose bluetooth ecdh_generic ecc mpls_router ip_tunnel netrom ax25 smc ib_core af_key fcrypt pcbc rxrpc nfc rfkill atm can mlx4_en mlx4_core nls_utf8 isofs dummy minix binfmt_misc nfsv3 nfs_acl nfs lockd grace fscache sctp rds brd vfat fat btrfs xor zstd_compress raid6_pq zstd_decompress loop tun ip6table_nat ip6_tables xt_conntrack iptable_filter xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth bridge stp llc overlay fuse nfit libnvdimm sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_net pcspkr net_failover joydev failover virtio_balloon i2c_piix4 ip_tables xfs libcrc32c qxl drm_kms_helper ttm drm crc32c_intel virtio_blk serio_raw ata_generic pata_acpi
[ 9492.489083] floppy qemu_fw_cfg [last unloaded: can]
[ 9492.505349] CR2: 0000000000000010
[ 9492.505948] ---[ end trace afa9902ac3c49830 ]---
Thanks
Hangbin
>
> We hope that these logs can help you find the problem quickly. For the full
> detail on our testing procedures, please scroll to the bottom of this message.
>
> Please reply to this email if you have any questions about the tests that we
> ran or if you have any suggestions on how to make future tests more effective.
>
> ,-. ,-.
> ( C ) ( K ) Continuous
> `-',-.`-' Kernel
> ( I ) Integration
> `-'
> ______________________________________________________________________________
>
> Merge testing
> -------------
>
> We cloned this repository and checked out the following commit:
>
> Repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
> Commit: f7d5b3dc4792 - Linux 5.2.10
>
>
> We grabbed the 54831dad38d2 commit of the stable queue repository.
>
> We then merged the patchset with `git am`:
>
> asoc-simple_card_utils.h-care-null-dai-at-asoc_simpl.patch
> asoc-simple-card-fix-an-use-after-free-in-simple_dai.patch
> asoc-simple-card-fix-an-use-after-free-in-simple_for.patch
> asoc-audio-graph-card-fix-use-after-free-in-graph_da.patch
> asoc-audio-graph-card-fix-an-use-after-free-in-graph.patch
> asoc-audio-graph-card-add-missing-const-at-graph_get.patch
> regulator-axp20x-fix-dcdca-and-dcdcd-for-axp806.patch
> regulator-axp20x-fix-dcdc5-and-dcdc6-for-axp803.patch
> asoc-samsung-odroid-fix-an-use-after-free-issue-for-.patch
> asoc-samsung-odroid-fix-a-double-free-issue-for-cpu_.patch
> asoc-intel-bytcht_es8316-add-quirk-for-irbis-nb41-ne.patch
> hid-logitech-hidpp-add-usb-pid-for-a-few-more-suppor.patch
> hid-add-044f-b320-thrustmaster-inc.-2-in-1-dt.patch
> mips-kernel-only-use-i8253-clocksource-with-periodic.patch
> mips-fix-cacheinfo.patch
> libbpf-sanitize-var-to-conservative-1-byte-int.patch
> netfilter-ebtables-fix-a-memory-leak-bug-in-compat.patch
> asoc-dapm-fix-handling-of-custom_stop_condition-on-d.patch
> asoc-sof-use-__u32-instead-of-uint32_t-in-uapi-heade.patch
> spi-pxa2xx-balance-runtime-pm-enable-disable-on-erro.patch
> bpf-sockmap-sock_map_delete-needs-to-use-xchg.patch
> bpf-sockmap-synchronize_rcu-before-free-ing-map.patch
> bpf-sockmap-only-create-entry-if-ulp-is-not-already-.patch
> selftests-bpf-fix-sendmsg6_prog-on-s390.patch
> asoc-dapm-fix-a-memory-leak-bug.patch
> bonding-force-slave-speed-check-after-link-state-rec.patch
> net-mvpp2-don-t-check-for-3-consecutive-idle-frames-.patch
> selftests-forwarding-gre_multipath-enable-ipv4-forwa.patch
> selftests-forwarding-gre_multipath-fix-flower-filter.patch
> selftests-bpf-add-another-gso_segs-access.patch
> libbpf-fix-using-uninitialized-ioctl-results.patch
> can-dev-call-netif_carrier_off-in-register_candev.patch
> can-mcp251x-add-error-check-when-wq-alloc-failed.patch
> can-gw-fix-error-path-of-cgw_module_init.patch
> asoc-fail-card-instantiation-if-dai-format-setup-fai.patch
> staging-fbtft-fix-gpio-handling.patch
> libbpf-silence-gcc8-warning-about-string-truncation.patch
> st21nfca_connectivity_event_received-null-check-the-.patch
> st_nci_hci_connectivity_event_received-null-check-th.patch
> nl-mac-80211-fix-interface-combinations-on-crypto-co.patch
> asoc-ti-davinci-mcasp-fix-clk-pdir-handling-for-i2s-.patch
> asoc-rockchip-fix-mono-capture.patch
> asoc-ti-davinci-mcasp-correct-slot_width-posed-const.patch
> net-usb-qmi_wwan-add-the-broadmobi-bm818-card.patch
> qed-rdma-fix-the-hw_ver-returned-in-device-attribute.patch
> isdn-misdn-hfcsusb-fix-possible-null-pointer-derefer.patch
> habanalabs-fix-f-w-download-in-be-architecture.patch
> mac80211_hwsim-fix-possible-null-pointer-dereference.patch
> net-stmmac-manage-errors-returned-by-of_get_mac_addr.patch
> netfilter-ipset-actually-allow-destination-mac-addre.patch
> netfilter-ipset-copy-the-right-mac-address-in-bitmap.patch
> netfilter-ipset-fix-rename-concurrency-with-listing.patch
> rxrpc-fix-potential-deadlock.patch
> rxrpc-fix-the-lack-of-notification-when-sendmsg-fail.patch
> nvmem-use-the-same-permissions-for-eeprom-as-for-nvm.patch
> iwlwifi-mvm-avoid-races-in-rate-init-and-rate-perfor.patch
> iwlwifi-dbg_ini-move-iwl_dbg_tlv_load_bin-out-of-deb.patch
> iwlwifi-dbg_ini-move-iwl_dbg_tlv_free-outside-of-deb.patch
> iwlwifi-fix-locking-in-delayed-gtk-setting.patch
> iwlwifi-mvm-send-lq-command-always-async.patch
> enetc-fix-build-error-without-phylib.patch
> isdn-hfcsusb-fix-misdn-driver-crash-caused-by-transf.patch
> net-phy-phy_led_triggers-fix-a-possible-null-pointer.patch
> perf-bench-numa-fix-cpu0-binding.patch
> spi-pxa2xx-add-support-for-intel-tiger-lake.patch
> can-sja1000-force-the-string-buffer-null-terminated.patch
> can-peak_usb-force-the-string-buffer-null-terminated.patch
> asoc-amd-acp3x-use-dma_ops-of-parent-device-for-acp3.patch
> net-ethernet-qlogic-qed-force-the-string-buffer-null.patch
> enetc-select-phylib-while-config_fsl_enetc_vf-is-set.patch
> nfsv4-fix-a-credential-refcount-leak-in-nfs41_check_.patch
> nfsv4-when-recovering-state-fails-with-eagain-retry-.patch
> nfsv4.1-fix-open-stateid-recovery.patch
> nfsv4.1-only-reap-expired-delegations.patch
> nfsv4-fix-a-potential-sleep-while-atomic-in-nfs4_do_.patch
> nfs-fix-regression-whereby-fscache-errors-are-appear.patch
> hid-quirks-set-the-increment_usage_on_duplicate-quir.patch
> hid-input-fix-a4tech-horizontal-wheel-custom-usage.patch
> drm-rockchip-suspend-dp-late.patch
> smb3-fix-potential-memory-leak-when-processing-compo.patch
> smb3-kernel-oops-mounting-a-encryptdata-share-with-c.patch
> sched-deadline-fix-double-accounting-of-rq-running-b.patch
> sched-psi-reduce-psimon-fifo-priority.patch
> sched-psi-do-not-require-setsched-permission-from-th.patch
> s390-protvirt-avoid-memory-sharing-for-diag-308-set-.patch
> s390-mm-fix-dump_pagetables-top-level-page-table-wal.patch
> s390-put-_stext-and-_etext-into-.text-section.patch
> ata-rb532_cf-fix-unused-variable-warning-in-rb532_pa.patch
> net-cxgb3_main-fix-a-resource-leak-in-a-error-path-i.patch
> net-stmmac-fix-issues-when-number-of-queues-4.patch
> net-stmmac-tc-do-not-return-a-fragment-entry.patch
> drm-amdgpu-pin-the-csb-buffer-on-hw-init-for-gfx-v8.patch
> net-hisilicon-make-hip04_tx_reclaim-non-reentrant.patch
> net-hisilicon-fix-hip04-xmit-never-return-tx_busy.patch
> net-hisilicon-fix-dma_map_single-failed-on-arm64.patch
> nfsv4-ensure-state-recovery-handles-etimedout-correc.patch
> libata-have-ata_scsi_rw_xlat-fail-invalid-passthroug.patch
> libata-add-sg-safety-checks-in-sff-pio-transfers.patch
> x86-lib-cpu-address-missing-prototypes-warning.patch
> drm-vmwgfx-fix-memory-leak-when-too-many-retries-hav.patch
> block-aoe-fix-kernel-crash-due-to-atomic-sleep-when-.patch
> block-bfq-handle-null-return-value-by-bfq_init_rq.patch
> perf-ftrace-fix-failure-to-set-cpumask-when-only-one.patch
> perf-cpumap-fix-writing-to-illegal-memory-in-handlin.patch
> perf-pmu-events-fix-missing-cpu_clk_unhalted.core-ev.patch
> dt-bindings-riscv-fix-the-schema-compatible-string-f.patch
> kvm-arm64-don-t-write-junk-to-sysregs-on-reset.patch
> kvm-arm-don-t-write-junk-to-cp15-registers-on-reset.patch
> selftests-kvm-adding-config-fragments.patch
> iwlwifi-mvm-disable-tx-amsdu-on-older-nics.patch
> hid-wacom-correct-misreported-ekr-ring-values.patch
> hid-wacom-correct-distance-scale-for-2nd-gen-intuos-devices.patch
> revert-kvm-x86-mmu-zap-only-the-relevant-pages-when-removing-a-memslot.patch
> revert-dm-bufio-fix-deadlock-with-loop-device.patch
> clk-socfpga-stratix10-fix-rate-caclulationg-for-cnt_clks.patch
> ceph-clear-page-dirty-before-invalidate-page.patch
> ceph-don-t-try-fill-file_lock-on-unsuccessful-getfilelock-reply.patch
> libceph-fix-pg-split-vs-osd-re-connect-race.patch
> drm-amdgpu-gfx9-update-pg_flags-after-determining-if-gfx-off-is-possible.patch
> drm-nouveau-don-t-retry-infinitely-when-receiving-no-data-on-i2c-over-aux.patch
> scsi-ufs-fix-null-pointer-dereference-in-ufshcd_config_vreg_hpm.patch
> gpiolib-never-report-open-drain-source-lines-as-input-to-user-space.patch
> drivers-hv-vmbus-fix-virt_to_hvpfn-for-x86_pae.patch
> userfaultfd_release-always-remove-uffd-flags-and-clear-vm_userfaultfd_ctx.patch
> x86-retpoline-don-t-clobber-rflags-during-call_nospec-on-i386.patch
> x86-apic-handle-missing-global-clockevent-gracefully.patch
> x86-cpu-amd-clear-rdrand-cpuid-bit-on-amd-family-15h-16h.patch
> x86-boot-save-fields-explicitly-zero-out-everything-else.patch
> x86-boot-fix-boot-regression-caused-by-bootparam-sanitizing.patch
> ib-hfi1-unsafe-psn-checking-for-tid-rdma-read-resp-packet.patch
> ib-hfi1-add-additional-checks-when-handling-tid-rdma-read-resp-packet.patch
> ib-hfi1-add-additional-checks-when-handling-tid-rdma-write-data-packet.patch
> ib-hfi1-drop-stale-tid-rdma-packets-that-cause-tiderr.patch
> psi-get-poll_work-to-run-when-calling-poll-syscall-next-time.patch
> dm-kcopyd-always-complete-failed-jobs.patch
> dm-dust-use-dust-block-size-for-badblocklist-index.patch
> dm-btree-fix-order-of-block-initialization-in-btree_split_beneath.patch
> dm-integrity-fix-a-crash-due-to-bug_on-in-__journal_read_write.patch
> dm-raid-add-missing-cleanup-in-raid_ctr.patch
> dm-space-map-metadata-fix-missing-store-of-apply_bops-return-value.patch
> dm-table-fix-invalid-memory-accesses-with-too-high-sector-number.patch
> dm-zoned-improve-error-handling-in-reclaim.patch
> dm-zoned-improve-error-handling-in-i-o-map-code.patch
> dm-zoned-properly-handle-backing-device-failure.patch
> genirq-properly-pair-kobject_del-with-kobject_add.patch
> mm-z3fold.c-fix-race-between-migration-and-destruction.patch
> mm-page_alloc-move_freepages-should-not-examine-struct-page-of-reserved-memory.patch
> mm-memcontrol-flush-percpu-vmstats-before-releasing-memcg.patch
> mm-memcontrol-flush-percpu-vmevents-before-releasing-memcg.patch
> mm-page_owner-handle-thp-splits-correctly.patch
> mm-zsmalloc.c-migration-can-leave-pages-in-zs_empty-indefinitely.patch
> mm-zsmalloc.c-fix-race-condition-in-zs_destroy_pool.patch
> mm-kasan-fix-false-positive-invalid-free-reports-with-config_kasan_sw_tags-y.patch
> xfs-fix-missing-ilock-unlock-when-xfs_setattr_nonsize-fails-due-to-edquot.patch
> ib-hfi1-drop-stale-tid-rdma-packets.patch
> dm-zoned-fix-potential-null-dereference-in-dmz_do_re.patch
> io_uring-fix-potential-hang-with-polled-io.patch
> io_uring-don-t-enter-poll-loop-if-we-have-cqes-pendi.patch
> io_uring-add-need_resched-check-in-inner-poll-loop.patch
> powerpc-allow-flush_-inval_-dcache_range-to-work-across-ranges-4gb.patch
> rxrpc-fix-local-endpoint-refcounting.patch
> rxrpc-fix-read-after-free-in-rxrpc_queue_local.patch
> rxrpc-fix-local-endpoint-replacement.patch
>
> Compile testing
> ---------------
>
> We compiled the kernel for 3 architectures:
>
> aarch64:
> make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
>
> ppc64le:
> make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
>
> x86_64:
> make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
>
>
> Hardware testing
> ----------------
> We booted each kernel and ran the following tests:
>
> aarch64:
> Host 1:
> ✅ Boot test [0]
> ✅ xfstests: xfs [1]
> ✅ selinux-policy: serge-testsuite [2]
> ✅ lvm thinp sanity [3]
> ✅ storage: software RAID testing [4]
> 🚧 ✅ Storage blktests [5]
>
> Host 2:
>
> ⚡ Internal infrastructure issues prevented one or more tests (marked
> with ⚡⚡⚡) from running on this architecture.
> This is not the fault of the kernel that was tested.
>
> ⚡⚡⚡ Boot test [0]
> ⚡⚡⚡ Podman system integration test (as root) [6]
> ⚡⚡⚡ Podman system integration test (as user) [6]
> ⚡⚡⚡ Loopdev Sanity [7]
> ⚡⚡⚡ jvm test suite [8]
> ⚡⚡⚡ AMTU (Abstract Machine Test Utility) [9]
> ⚡⚡⚡ LTP: openposix test suite [10]
> ⚡⚡⚡ Ethernet drivers sanity [11]
> ⚡⚡⚡ Networking socket: fuzz [12]
> ⚡⚡⚡ audit: audit testsuite test [13]
> ⚡⚡⚡ httpd: mod_ssl smoke sanity [14]
> ⚡⚡⚡ iotop: sanity [15]
> ⚡⚡⚡ tuned: tune-processes-through-perf [16]
> ⚡⚡⚡ Usex - version 1.9-29 [17]
> ⚡⚡⚡ storage: SCSI VPD [18]
> ⚡⚡⚡ stress: stress-ng [19]
> 🚧 ⚡⚡⚡ LTP lite [20]
>
>
> ppc64le:
> Host 1:
> ✅ Boot test [0]
> ✅ xfstests: xfs [1]
> ✅ selinux-policy: serge-testsuite [2]
> ✅ lvm thinp sanity [3]
> ✅ storage: software RAID testing [4]
> 🚧 ✅ Storage blktests [5]
>
> Host 2:
> ✅ Boot test [0]
> ✅ Podman system integration test (as root) [6]
> ✅ Podman system integration test (as user) [6]
> ✅ Loopdev Sanity [7]
> ✅ jvm test suite [8]
> ✅ AMTU (Abstract Machine Test Utility) [9]
> ✅ LTP: openposix test suite [10]
> ✅ Ethernet drivers sanity [11]
> ✅ Networking socket: fuzz [12]
> ✅ audit: audit testsuite test [13]
> ✅ httpd: mod_ssl smoke sanity [14]
> ✅ iotop: sanity [15]
> ✅ tuned: tune-processes-through-perf [16]
> ✅ Usex - version 1.9-29 [17]
> 🚧 ✅ LTP lite [20]
>
>
> x86_64:
> Host 1:
> ✅ Boot test [0]
> ✅ Podman system integration test (as root) [6]
> ✅ Podman system integration test (as user) [6]
> ✅ Loopdev Sanity [7]
> ✅ jvm test suite [8]
> ✅ AMTU (Abstract Machine Test Utility) [9]
> ✅ LTP: openposix test suite [10]
> ✅ Ethernet drivers sanity [11]
> ❌ Networking socket: fuzz [12]
> ⚡⚡⚡ audit: audit testsuite test [13]
> ⚡⚡⚡ httpd: mod_ssl smoke sanity [14]
> ⚡⚡⚡ iotop: sanity [15]
> ⚡⚡⚡ tuned: tune-processes-through-perf [16]
> ⚡⚡⚡ pciutils: sanity smoke test [21]
> ⚡⚡⚡ Usex - version 1.9-29 [17]
> ⚡⚡⚡ storage: SCSI VPD [18]
> ⚡⚡⚡ stress: stress-ng [19]
> 🚧 ❌ LTP lite [20]
>
> Host 2:
> ✅ Boot test [0]
> ✅ xfstests: xfs [1]
> ✅ selinux-policy: serge-testsuite [2]
> ✅ lvm thinp sanity [3]
> ✅ storage: software RAID testing [4]
> 🚧 ✅ Storage blktests [5]
>
>
> Test source:
> 💚 Pull requests are welcome for new tests or improvements to existing tests!
> [0]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution/kpkginstall
> [1]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/filesystems/xfs/xfstests
> [2]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/packages/selinux-policy/serge-testsuite
> [3]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/lvm/thinp/sanity
> [4]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/swraid/trim
> [5]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/blk
> [6]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/container/podman
> [7]: https://github.com/CKI-project/tests-beaker/archive/master.zip#filesystems/loopdev/sanity
> [8]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/jvm
> [9]: https://github.com/CKI-project/tests-beaker/archive/master.zip#misc/amtu
> [10]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution/ltp/openposix_testsuite
> [11]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/networking/driver/sanity
> [12]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/networking/socket/fuzz
> [13]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/audit/audit-testsuite
> [14]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/httpd/mod_ssl-smoke
> [15]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/iotop/sanity
> [16]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/tuned/tune-processes-through-perf
> [17]: https://github.com/CKI-project/tests-beaker/archive/master.zip#standards/usex/1.9-29
> [18]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/scsi/vpd
> [19]: https://github.com/CKI-project/tests-beaker/archive/master.zip#stress/stress-ng
> [20]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution/ltp-upstream/lite
> [21]: https://github.com/CKI-project/tests-beaker/archive/master.zip#pciutils/sanity-smoke
>
> Waived tests
> ------------
> If the test run included waived tests, they are marked with 🚧. Such tests are
> executed but their results are not taken into account. Tests are waived when
> their results are not reliable enough, e.g. when they're just introduced or are
> being fixed.
^ permalink raw reply
* [RFC PATCH 1/4] vringh: fix copy direction of vringh_iov_push_kern()
From: Jason Wang @ 2019-09-10 8:19 UTC (permalink / raw)
To: mst, jasowang, kvm, virtualization, netdev
Cc: linux-kernel, kwankhede, alex.williamson, cohuck, tiwei.bie,
maxime.coquelin, cunming.liang, zhihong.wang, rob.miller, idos,
xiao.w.wang, haotian.wang
In-Reply-To: <20190910081935.30516-1-jasowang@redhat.com>
vringh_iov_push_kern() has to copy from the caller's buffer into the
iov, but xfer_kern() copies from the iov into the buffer, so the
direction was wrong.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
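For reviewers who want to see the direction problem in isolation, here
is a minimal userspace sketch. It assumes the transfer callback is
invoked with the iov-side address first and the caller's buffer second.
iov_xfer() and push_first_byte() are hypothetical stand-ins for
illustration only and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed callback shape: iov-side address first, caller's buffer second. */
typedef int (*xfer_fn)(void *iov_addr, void *ptr, size_t len);

/* Pull direction: iov -> caller's buffer (same body as xfer_kern()). */
static int xfer_kern(void *src, void *dst, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

/* Push direction: caller's buffer -> iov (same body as kern_xfer()). */
static int kern_xfer(void *dst, void *src, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

/* Hypothetical single-segment stand-in for vringh_iov_xfer(). */
static int iov_xfer(void *iov_addr, void *ptr, size_t len, xfer_fn xfer)
{
    assert(iov_addr && ptr);
    return xfer(iov_addr, ptr, len);
}

/* Push "ab" with the given callback; return the first byte left in the iov. */
static char push_first_byte(xfer_fn xfer)
{
    char iov_slot[2] = { 'x', 'x' };
    char payload[2] = { 'a', 'b' };

    iov_xfer(iov_slot, payload, sizeof(payload), xfer);
    return iov_slot[0];
}
```

With kern_xfer the payload lands in the iov slot; with xfer_kern the
slot keeps its old contents and the payload gets overwritten instead.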
drivers/vhost/vringh.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 08ad0d1f0476..a0a2d74967ef 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -852,6 +852,12 @@ static inline int xfer_kern(void *src, void *dst, size_t len)
return 0;
}
+static inline int kern_xfer(void *dst, void *src, size_t len)
+{
+ memcpy(dst, src, len);
+ return 0;
+}
+
/**
* vringh_init_kern - initialize a vringh for a kernelspace vring.
* @vrh: the vringh to initialize.
@@ -958,7 +964,7 @@ EXPORT_SYMBOL(vringh_iov_pull_kern);
ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
const void *src, size_t len)
{
- return vringh_iov_xfer(wiov, (void *)src, len, xfer_kern);
+ return vringh_iov_xfer(wiov, (void *)src, len, kern_xfer);
}
EXPORT_SYMBOL(vringh_iov_push_kern);
--
2.19.1
^ permalink raw reply related
* [RFC PATCH 0/4] mdev based hardware virtio offloading support
From: Jason Wang @ 2019-09-10 8:19 UTC (permalink / raw)
To: mst, jasowang, kvm, virtualization, netdev
Cc: linux-kernel, kwankhede, alex.williamson, cohuck, tiwei.bie,
maxime.coquelin, cunming.liang, zhihong.wang, rob.miller, idos,
xiao.w.wang, haotian.wang
Hi all:
Some hardware can offload the virtio datapath while having its own
control path. This series tries to implement an mdev based unified API
so that the kernel virtio driver can drive those devices. This is done
by introducing a new mdev transport for virtio (virtio_mdev) and
registering it as a new kind of mdev driver. It then provides a unified
way for the kernel virtio driver to talk to the mdev device
implementation.
Though the series only contains kernel driver support, the goal is to
make the transport generic enough to support userspace drivers. This
means vhost-mdev[1] could be built on top as well by reusing the
transport.
A sample driver is also implemented, which simulates a virtio-net
loopback ethernet device on top of vringh + workqueue. It could be
used as a reference implementation for a real hardware driver.
Notes:
- Some of the key transport commands for vhost-mdev (userspace driver)
are not introduced. These include:
1) set/get virtqueue state (idx etc.); this could be done simply by
introducing a new transport command
2) dirty page tracking; this could be done simply by introducing a new
transport command
3) set/get device internal state; this requires more thought. Of
course we can introduce device specific transport commands, but it
would be better to have a unified API
- The current mdev_parent_ops assumes all pointers are userspace pointers,
which blocks the kernel driver. This series just abuses them as kernel
pointers; this could be addressed by inventing new parent_ops.
- For a quick POC, the mdev transport was just derived from virtio-MMIO.
I'm pretty sure there is a lot of room for optimization; please share
your thoughts.
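To give a feel for the virtio-MMIO style register layout, here is a
rough userspace sketch of programming a 64-bit used ring address as two
32-bit halves. The two offsets are the ones from the virtio_mdev.h hunk
in this series; struct fake_regs and the fake_*/set_/get_ helpers are
hypothetical and only model a register file in memory, they are not the
transport's real accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets as defined in virtio_mdev.h in this series. */
#define VIRTIO_MDEV_QUEUE_USED_LOW  0x0a0
#define VIRTIO_MDEV_QUEUE_USED_HIGH 0x0a4

/* Hypothetical in-memory register file standing in for the device. */
struct fake_regs {
    uint32_t word[0x100 / 4];
};

static void fake_writel(struct fake_regs *r, uint32_t off, uint32_t val)
{
    assert(off % 4 == 0 && off < sizeof(r->word));
    r->word[off / 4] = val;
}

static uint32_t fake_readl(const struct fake_regs *r, uint32_t off)
{
    assert(off % 4 == 0 && off < sizeof(r->word));
    return r->word[off / 4];
}

/* Write the low half, then the high half of the 64-bit address. */
static void set_used_addr(struct fake_regs *r, uint64_t addr)
{
    fake_writel(r, VIRTIO_MDEV_QUEUE_USED_LOW, (uint32_t)addr);
    fake_writel(r, VIRTIO_MDEV_QUEUE_USED_HIGH, (uint32_t)(addr >> 32));
}

/* Reassemble the 64-bit address from its two halves. */
static uint64_t get_used_addr(const struct fake_regs *r)
{
    return ((uint64_t)fake_readl(r, VIRTIO_MDEV_QUEUE_USED_HIGH) << 32) |
           fake_readl(r, VIRTIO_MDEV_QUEUE_USED_LOW);
}
```

Two register accesses per address is one of the places where a
transport that is not bound to MMIO could probably be simplified.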
Please review.
[1] https://lkml.org/lkml/2019/8/28/35
Jason Wang (4):
vringh: fix copy direction of vringh_iov_push_kern()
mdev: introduce helper to set per device dma ops
virtio: introudce a mdev based transport
docs: Sample driver to demonstrate how to implement virtio-mdev
framework
drivers/vfio/mdev/Kconfig | 7 +
drivers/vfio/mdev/Makefile | 1 +
drivers/vfio/mdev/mdev_core.c | 7 +
drivers/vfio/mdev/virtio_mdev.c | 500 ++++++++++++++++++++
drivers/vhost/vringh.c | 8 +-
include/linux/mdev.h | 2 +
include/uapi/linux/virtio_mdev.h | 131 ++++++
samples/Kconfig | 7 +
samples/vfio-mdev/Makefile | 1 +
samples/vfio-mdev/mvnet.c | 766 +++++++++++++++++++++++++++++++
10 files changed, 1429 insertions(+), 1 deletion(-)
create mode 100644 drivers/vfio/mdev/virtio_mdev.c
create mode 100644 include/uapi/linux/virtio_mdev.h
create mode 100644 samples/vfio-mdev/mvnet.c
--
2.19.1
^ permalink raw reply
* Re: [PATCH] net/mlx5: reduce stack usage in FW tracer
From: Arnd Bergmann @ 2019-09-10 8:14 UTC (permalink / raw)
To: Saeed Mahameed
Cc: cai@lca.pw, linux-rdma@vger.kernel.org, davem@davemloft.net,
Moshe Shemesh, Feras Daoud, linux-kernel@vger.kernel.org,
Eran Ben Elisha, netdev@vger.kernel.org, leon@kernel.org,
Erez Shitrit
In-Reply-To: <5abccf6452a9d4efa2a1593c0af6d41703d4f16f.camel@mellanox.com>
On Mon, Sep 9, 2019 at 11:53 PM Saeed Mahameed <saeedm@mellanox.com> wrote:
> On Mon, 2019-09-09 at 22:18 +0200, Arnd Bergmann wrote:
> > To do this right, a better approach may be to just rely on ftrace,
> > storing
> > the (pointer to the) format string and the arguments in the buffer
> > without
> > creating a string. Would that be an option here?
>
> I am not sure how this would work, since the format parameters can
> change depending on the FW string and the specific traces.
Ah, so the format string comes from the firmware? I didn't look
at the code in enough detail to understand why it's done like this,
only enough to notice that it's rather unusual.
Possibly trace_mlx5_fw might still get away with copying the format
string and the arguments, leaving the snprintf() to the time we read
the buffer, but I don't know enough about ftrace to be sure that
would actually work, and you'd need to duplicate it in
mlx5_devlink_fmsg_fill_trace().
> > A more minimal approach might be to move what is now the on-stack
> > buffer into the mlx5_fw_tracer function. I see that you already store
> > a copy of the string in there from mlx5_fw_tracer_save_trace(),
> > which conveniently also holds a mutex already that protects
> > it from concurrent access.
> >
>
> This sounds plausible.
>
> So for now let's do this or the noinline approach. Please let me know
> which one you prefer; if it is the mutex protected buffer, I can do
> it myself.
>
> I will open an internal task and discussion, then address your valuable
> points in a future submission. Since we are already in rc8, I don't want
> to take the risk now.
Yes, that sounds like a good plan. If you can't avoid the snprintf
entirely, then the mutex protected buffer should be helpful, and
also avoid a strncpy() along with the stack buffer.
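Roughly what I have in mind, as a userspace sketch only: struct
fw_tracer, tracer_init() and tracer_save() are made-up names, the buffer
size is arbitrary, and a pthread mutex stands in for the kernel mutex.
The point is that the string is formatted straight into the buffer that
lives in the tracer object, so neither the stack copy nor the extra
string copy is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define TRACE_STR_LEN 512   /* illustrative size, not taken from the driver */

/* Hypothetical tracer object; the real struct mlx5_fw_tracer differs. */
struct fw_tracer {
    pthread_mutex_t lock;       /* protects str */
    char str[TRACE_STR_LEN];    /* replaces the on-stack buffer */
};

static void tracer_init(struct fw_tracer *t)
{
    pthread_mutex_init(&t->lock, NULL);
    t->str[0] = '\0';
}

/* Format directly into the shared buffer while holding the lock. */
static int tracer_save(struct fw_tracer *t, const char *fmt, int a, int b)
{
    int n;

    assert(t && fmt);
    pthread_mutex_lock(&t->lock);
    n = snprintf(t->str, sizeof(t->str), fmt, a, b);
    pthread_mutex_unlock(&t->lock);
    return n;
}
```

Readers of the buffer would have to take the same lock, of course.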
Arnd
^ permalink raw reply
* Re: [PATCH net v2] bridge/mdb: remove wrong use of NLM_F_MULTI
From: David Miller @ 2019-09-10 8:13 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: roopa, netdev, bridge, nikolay
In-Reply-To: <20190906094703.21300-1-nicolas.dichtel@6wind.com>
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Fri, 6 Sep 2019 11:47:02 +0200
> NLM_F_MULTI must be used only when a NLMSG_DONE message is sent at the end.
> In fact, NLMSG_DONE is sent only at the end of a dump.
>
> Libraries like libnl will wait forever for NLMSG_DONE.
>
> Fixes: 949f1e39a617 ("bridge: mdb: notify on router port add and del")
> CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Applied and queued up for -stable.
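For reference, a simplified model of why a reader can hang on such a
notification. This is a sketch under the assumption that the reader
treats NLM_F_MULTI as "more parts follow until NLMSG_DONE"; it is not
libnl's actual code, and reply_complete() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/netlink.h>

/*
 * Return true when the reader may stop after this batch of headers,
 * false when it has to keep waiting for more data.
 */
static bool reply_complete(const struct nlmsghdr *msgs, size_t n)
{
    bool multipart = false;

    assert(msgs || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (msgs[i].nlmsg_type == NLMSG_DONE)
            return true;        /* explicit end of a multipart reply */
        if (msgs[i].nlmsg_flags & NLM_F_MULTI)
            multipart = true;   /* reader now expects NLMSG_DONE */
    }
    return !multipart;
}
```

A lone notification carrying NLM_F_MULTI never completes in this model,
which matches the "wait forever" behaviour described in the commit
message.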
^ permalink raw reply
* Re: [PATCH] tcp: fix tcp_disconnect() not clear tp->fastopen_rsk sometimes
From: David Miller @ 2019-09-10 8:09 UTC (permalink / raw)
To: chunguo.feng
Cc: edumazet, kuznet, yoshfuji, ast, daniel, netdev, kafai,
songliubraving, yhs, linux-kernel, bpf
In-Reply-To: <20190906093429.930-1-chunguo.feng@amlogic.com>
From: chunguo feng <chunguo.feng@amlogic.com>
Date: Fri, 6 Sep 2019 17:34:29 +0800
> From: fengchunguo <chunguo.feng@amlogic.com>
>
> This patch fixes a case where fastopen_rsk is not always cleared, which
> then triggers the BUG_ON below:
> tcp_v4_destroy_sock
> ->BUG_ON(tp->fastopen_rsk);
>
> When playing back some videos from the network, tcp_disconnect() is used continually.
...
> Signed-off-by: fengchunguo <chunguo.feng@amlogic.com>
This still needs review.
^ permalink raw reply
* Re: [PATCH v3 1/2] ethtool: implement Energy Detect Powerdown support via phy-tunable
From: Michal Kubecek @ 2019-09-10 8:00 UTC (permalink / raw)
To: netdev
Cc: Alexandru Ardelean, devicetree, linux-kernel, davem, robh+dt,
mark.rutland, f.fainelli, hkallweit1, andrew
In-Reply-To: <20190909131251.3634-2-alexandru.ardelean@analog.com>
On Mon, Sep 09, 2019 at 04:12:50PM +0300, Alexandru Ardelean wrote:
> The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like
> this feature is common across other PHYs (like EEE), and defining
> `ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long.
>
> The way EDPD works, is that the RX block is put to a lower power mode,
> except for link-pulse detection circuits. The TX block is also put to low
> power mode, but the PHY wakes-up periodically to send link pulses, to avoid
> lock-ups in case the other side is also in EDPD mode.
>
> Currently, there are 2 PHY drivers that look like they could use this new
> PHY tunable feature: the `adin` && `micrel` PHYs.
>
> The ADIN's datasheet mentions that TX pulses are at intervals of 1 second
> default each, and they can be disabled. For the Micrel KSZ9031 PHY, the
> datasheet does not mention whether they can be disabled, but mentions that
> they can be modified.
>
> The way this change is structured, is similar to the PHY tunable downshift
> control:
> * a `ETHTOOL_PHY_EDPD_DFLT_TX_INTERVAL` value is exposed to cover a default
> TX interval; some PHYs could specify a certain value that makes sense
> * `ETHTOOL_PHY_EDPD_NO_TX` would disable TX when EDPD is enabled
> * `ETHTOOL_PHY_EDPD_DISABLE` will disable EDPD
>
> This should allow PHYs to:
> * enable EDPD and not enable TX pulses (interval would be 0)
> * enable EDPD and configure TX pulse interval; note that TX interval units
> would be PHY specific; we could consider `seconds` as units, but it could
> happen that some PHYs would prefer milliseconds as a unit;
> a maximum of 65533 units should be sufficient
Sorry for missing the discussion on previous version but I don't really
like the idea of leaving the choice of units to PHY. Both for manual
setting and system configuration, it would be IMHO much more convenient
to have the interpretation universal for all NICs.
Seconds as units seem too coarse and maximum of ~18 hours way too big.
Milliseconds would be more practical from granularity point of view,
would maximum of ~65 seconds be sufficient?
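The arithmetic behind those two figures, as a small sketch. The 65533
limit is the one quoted in the patch description; the macro and
edpd_max_seconds() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

#define EDPD_MAX_INTERVAL_UNITS 65533u  /* maximum quoted in the patch description */

/* Longest interval in whole seconds if one unit is unit_ms milliseconds. */
static uint32_t edpd_max_seconds(uint32_t unit_ms)
{
    assert(unit_ms > 0);
    return (uint32_t)(((uint64_t)EDPD_MAX_INTERVAL_UNITS * unit_ms) / 1000u);
}
```

edpd_max_seconds(1000) is 65533 seconds, i.e. a bit over 18 hours, while
edpd_max_seconds(1) is 65 seconds.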
Michal Kubecek
> * disable EDPD
>
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
^ permalink raw reply
* Re: [PATCH REPOST 1/2] can: flexcan: fix deadlock when using self wakeup
From: Sean Nyekjaer @ 2019-09-10 7:52 UTC (permalink / raw)
To: Joakim Zhang, mkl@pengutronix.de, linux-can@vger.kernel.org
Cc: wg@grandegger.com, netdev@vger.kernel.org, dl-linux-imx,
Martin Hundebøll
In-Reply-To: <DB7PR04MB461868320DA0B25CC8255213E6BB0@DB7PR04MB4618.eurprd04.prod.outlook.com>
On 05/09/2019 09.10, Joakim Zhang wrote:
> Hi Sean,
>
> Could you update to the latest flexcan driver using linux-can-next/flexcan and then merge the two patches below from linux-can/testing?
> d0b53616716e (HEAD -> testing, origin/testing) can: flexcan: add LPSR mode support for i.MX7D
> 803eb6bad65b can: flexcan: fix deadlock when using self wakeup
>
> Best Regards,
> Joakim Zhang
Hi
I reverted 2 commits on the NAND driver and got the testing kernel to work.
I can confirm the issue is resolved with this patch :-)
/Sean
^ permalink raw reply
* Re: [PATCH] net/ibmvnic: Fix missing { in __ibmvnic_reset
From: David Miller @ 2019-09-10 7:45 UTC (permalink / raw)
To: msuchanek
Cc: netdev, julietk, benh, paulus, mpe, tlfalcon, jallen,
linuxppc-dev, linux-kernel
In-Reply-To: <20190909204451.7929-1-msuchanek@suse.de>
From: Michal Suchanek <msuchanek@suse.de>
Date: Mon, 9 Sep 2019 22:44:51 +0200
> Commit 1c2977c09499 ("net/ibmvnic: free reset work of removed device from queue")
> adds a } without corresponding { causing build break.
>
> Fixes: 1c2977c09499 ("net/ibmvnic: free reset work of removed device from queue")
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Applied.
^ permalink raw reply
* Re: [net-next v2 00/15][pull request] Intel Wired LAN Driver Updates 2019-09-09
From: David Miller @ 2019-09-10 7:45 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann
In-Reply-To: <20190909224802.29595-1-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 9 Sep 2019 15:47:47 -0700
> This series contains a variety of cold and hot savoury changes to Intel
> drivers. Some of the fixes could be considered for stable even though
> the author did not request it.
...
Pulled.
^ permalink raw reply
* Re: [RFC bpf-next 2/7] bpf: extend bpf_pcap support to tracing programs
From: Yonghong Song @ 2019-09-10 7:43 UTC (permalink / raw)
To: Alan Maguire
Cc: ast@kernel.org, daniel@iogearbox.net, Martin Lau, Song Liu,
davem@davemloft.net, jakub.kicinski@netronome.com,
hawk@kernel.org, john.fastabend@gmail.com, rostedt@goodmis.org,
mingo@redhat.com, quentin.monnet@netronome.com, Andrey Ignatov,
joe@wand.net.nz, acme@redhat.com, jolsa@kernel.org,
alexey.budankov@linux.intel.com, gregkh@linuxfoundation.org,
namhyung@kernel.org, sdf@google.com, f.fainelli@gmail.com,
shuah@kernel.org, peter@lekensteyn.nl, ivan@cloudflare.com,
Andrii Nakryiko, bhole_prashant_q7@lab.ntt.co.jp,
david.calavera@gmail.com, danieltimlee@gmail.com,
Takshak Chahande, netdev@vger.kernel.org, bpf@vger.kernel.org,
linux-kselftest@vger.kernel.org, toke@redhat.com,
jbenc@redhat.com
In-Reply-To: <alpine.LRH.2.20.1909092236490.10757@dhcp-10-175-172-139.vpn.oracle.com>
On 9/9/19 11:25 PM, Alan Maguire wrote:
> On Sun, 8 Sep 2019, Yonghong Song wrote:
>
>> For net side bpf_perf_event_output, we have
>> static unsigned long bpf_skb_copy(void *dst_buff, const void *skb,
>> unsigned long off, unsigned long len)
>> {
>> void *ptr = skb_header_pointer(skb, off, len, dst_buff);
>>
>> if (unlikely(!ptr))
>> return len;
>> if (ptr != dst_buff)
>> memcpy(dst_buff, ptr, len);
>>
>> return 0;
>> }
>>
>> BPF_CALL_5(bpf_skb_event_output, struct sk_buff *, skb, struct bpf_map
>> *, map,
>> u64, flags, void *, meta, u64, meta_size)
>> {
>> u64 skb_size = (flags & BPF_F_CTXLEN_MASK) >> 32;
>>
>> if (unlikely(flags & ~(BPF_F_CTXLEN_MASK | BPF_F_INDEX_MASK)))
>> return -EINVAL;
>> if (unlikely(skb_size > skb->len))
>> return -EFAULT;
>>
>> return bpf_event_output(map, flags, meta, meta_size, skb, skb_size,
>> bpf_skb_copy);
>> }
>>
>> It does not really consider outputting all the frags.
>> I understand that to get truly all packet data, frags should be
>> considered, but it seems we did not do it before? I am wondering
>> whether we need to do it here.
>
> Thanks for the feedback! In experimenting with packet capture,
> my original hope was to keep things simple and avoid fragment parsing
> if possible. However if scatter-gather is enabled for the networking
> device, or indeed if it's running in a VM it turns out a lot of the
> interesting packet data ends up in the fragments on transmit (ssh
> headers, http headers etc). So I think it would be worth considering
> adding support for fragment traversal. It's not needed as much
> in the skb program case - we can always pullup the skb - but in
> the tracing situation we probably wouldn't want to do something
> that invasive in tracing context.
Agree that in tracing context, we should avoid push/pull skb. It is
indeed invasive.
>
> Fragment traversal might be worth breaking out as a separate patchset,
> perhaps triggered by a specific flag to bpf_skb_event_output?
This can be done for bpf_skb_event_output as the context is a sk_buff.
And you can just follow the frags to copy the whole thing without
bpf_probe_read().
>
> Feedback from folks at Linux Plumbers (I hope I'm summarizing correctly)
> seemed to agree with what you mentioned WRT the first patch in this
> series. The gist was we probably don't want to force the metadata to be a
> specific packet capture type; we'd rather use the existing perf event
> mechanisms and if we are indeed doing packet capture, simply specify that
> data in the program as metadata.
Agree, you can have whatever metadata you have for bpf_perf_event_output.
>
> I'd be happy with that approach myself if I could capture skb
> fragments in tracing programs - being able to do that would give
> equivalent functionality to what I proposed but without having a packet
> capture-specific helper.
That won't work for tracing programs. Filling the tracing version of
packet copying with bpf_probe_read() calls is not nice either.
We may still need a different helper for tracing programs.
I think we need something like below:
- vmlinux BTF at /sys/kernel/btf/kernel is loaded into the kernel.
(/sys/kernel/btf/kernel is the source of truth)
- For a tracing bpf program, if that function eventually
calls the copy helper
bpf_skb_event_output(..., skb, ...)
the verifier needs to verify skb is indeed a valid skb
by tracing back to one of the parameters.
With this approach, you can reuse some of the functions from
the tracing side to deal with frag copying, and no bpf_probe_read()
is needed.
Here, I use skb as an example; maybe it can be extended
to other data structures as well if needed.
>>
>> If we indeed do not need to handle frags here, I think maybe
>> bpf_probe_read() in existing bpf kprobe function should be
>> enough, we do not need this helper?
>>
>
> Certainly for many use cases, that will get you most of what you need -
> particularly if you're just looking at L2 to L4 data. For full packet
> capture however I think we may need to think about fragment traversal.
>
>>> +
>>> +/* Derive protocol for some of the easier cases. For tracing, a probe point
>>> + * may be dealing with packets in various states. Common cases are IP
>>> + * packets prior to adding MAC header (_PCAP_TYPE_IP) and a full packet
>>> + * (_PCAP_TYPE_ETH). For other cases the caller must specify the
>>> + * protocol they expect. Other heuristics for packet identification
>>> + * should be added here as needed, since determining the packet type
>>> + * ensures we do not capture packets that fail to match the desired
>>> + * pcap type in BPF_F_PCAP_STRICT_TYPE mode.
>>> + */
>>> +static inline int bpf_skb_protocol_get(struct sk_buff *skb)
>>> +{
>>> + switch (htons(skb->protocol)) {
>>> + case ETH_P_IP:
>>> + case ETH_P_IPV6:
>>> + if (skb_network_header(skb) == skb->data)
>>> + return BPF_PCAP_TYPE_IP;
>>> + else
>>> + return BPF_PCAP_TYPE_ETH;
>>> + default:
>>> + return BPF_PCAP_TYPE_UNSET;
>>> + }
>>> +}
>>> +
>>> +BPF_CALL_5(bpf_trace_pcap, void *, data, u32, size, struct bpf_map *, map,
>>> + int, protocol_wanted, u64, flags)
>>
>> Up to now, for helpers, the verifier has a way to verify that it is used
>> properly with regard to the context. For example, for the xdp version of
>> perf_event_output, the helper prototype,
>> BPF_CALL_5(bpf_xdp_event_output, struct xdp_buff *, xdp, struct
>> bpf_map *, map,
>> u64, flags, void *, meta, u64, meta_size)
>> the verifier is able to guarantee that the first parameter
>> has correct type xdp_buff, not something from type cast.
>> .arg1_type = ARG_PTR_TO_CTX,
>>
>> This helper, in the below we have
>> .arg1_type = ARG_ANYTHING,
>>
>> So it is not really enforced. Bringing in BTF can help, but type
>> name matching is typically bad.
>>
>>
> One thing we were discussing - and I think this is similar to what
> you're suggesting - is to investigate if there might be a way to
> leverage BTF to provide additional guarantees that the tracing
> data we are handling is indeed an skb. Specifically if we
> trace a kprobe function argument or a tracepoint function, and
> if we had that guarantee, we could perhaps invoke the skb-style
> perf event output function (trace both the skb data and the metadata).
> The challenge would be how to do that type-based matching; we'd
> need the function argument information from BTF _and_ need to
> somehow associate it at probe attach time.
>
> Thanks again for looking at the code!
>
> Alan
>
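The fragment traversal discussed in this message can be sketched in plain userspace C. This is only a model under stated assumptions: struct model_frag and struct model_pkt are hypothetical stand-ins for the fragment array and the linear area of a struct sk_buff, and model_pkt_copy() is not a kernel function. Unlike the quoted bpf_skb_copy(), which returns len on failure, this sketch returns the number of bytes it could not copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one paged fragment. */
struct model_frag {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical stand-in for a packet: linear area plus fragments. */
struct model_pkt {
	const unsigned char *head;	/* linear area */
	size_t head_len;
	const struct model_frag *frags;	/* paged fragments */
	size_t nr_frags;
};

/*
 * Copy len bytes starting at packet offset off into dst, walking the
 * linear area first and then each fragment in order. Returns 0 when
 * everything was copied, otherwise the number of bytes still missing.
 */
size_t model_pkt_copy(void *dst, const struct model_pkt *pkt,
		      size_t off, size_t len)
{
	unsigned char *out = dst;
	const unsigned char *seg;
	size_t seg_len, i = 0;

	assert(dst != NULL && pkt != NULL);
	seg = pkt->head;
	seg_len = pkt->head_len;

	for (;;) {
		if (off < seg_len) {
			/* Requested range starts inside this segment. */
			size_t n = seg_len - off;

			if (n > len)
				n = len;
			memcpy(out, seg + off, n);
			out += n;
			len -= n;
			off = 0;
		} else {
			/* Skip this segment entirely. */
			off -= seg_len;
		}
		if (len == 0 || i == pkt->nr_frags)
			break;
		seg = pkt->frags[i].data;
		seg_len = pkt->frags[i].len;
		i++;
	}
	return len;
}
```

A copy that only looks at the linear area stops at head_len; the loop above keeps going through the fragments, which is where the message says much of the interesting transmit data ends up.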
^ permalink raw reply
* Re: [PATCH net-next 1/5] enetc: Fix if_mode extraction
From: Andrew Lunn @ 2019-09-10 7:44 UTC (permalink / raw)
To: Claudiu Manoil
Cc: David S . Miller, Alexandru Marginean, netdev@vger.kernel.org
In-Reply-To: <VI1PR04MB48803DB044AB6CF66CACB89E96B70@VI1PR04MB4880.eurprd04.prod.outlook.com>
On Mon, Sep 09, 2019 at 04:24:01PM +0000, Claudiu Manoil wrote:
> >-----Original Message-----
> >From: Andrew Lunn <andrew@lunn.ch>
> >Sent: Friday, September 6, 2019 10:58 PM
> >To: Claudiu Manoil <claudiu.manoil@nxp.com>
> >Cc: David S . Miller <davem@davemloft.net>; Alexandru Marginean
> ><alexandru.marginean@nxp.com>; netdev@vger.kernel.org
> >Subject: Re: [PATCH net-next 1/5] enetc: Fix if_mode extraction
> >
> >On Fri, Sep 06, 2019 at 05:15:40PM +0300, Claudiu Manoil wrote:
> >> Fix handling of error return code. Before this fix,
> >> the error code was handled as unsigned type.
> >> Also, on this path, if if_mode is not found then just handle
> >> it as a fixed link (i.e. mac2mac connection).
> >>
> >> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
> >> ---
> >> drivers/net/ethernet/freescale/enetc/enetc_pf.c | 17 ++++++-----------
> >> 1 file changed, 6 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> >b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> >> index 7d6513ff8507..3a556646a2fb 100644
> >> --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> >> +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> >> @@ -751,6 +751,7 @@ static int enetc_of_get_phy(struct enetc_ndev_priv
> >*priv)
> >> struct enetc_pf *pf = enetc_si_priv(priv->si);
> >> struct device_node *np = priv->dev->of_node;
> >> struct device_node *mdio_np;
> >> + int phy_mode;
> >> int err;
> >>
> >> if (!np) {
> >> @@ -784,17 +785,11 @@ static int enetc_of_get_phy(struct enetc_ndev_priv
> >*priv)
> >> }
> >> }
> >>
> >> - priv->if_mode = of_get_phy_mode(np);
> >> - if (priv->if_mode < 0) {
> >> - dev_err(priv->dev, "missing phy type\n");
> >> - of_node_put(priv->phy_node);
> >> - if (of_phy_is_fixed_link(np))
> >> - of_phy_deregister_fixed_link(np);
> >> - else
> >> - enetc_mdio_remove(pf);
> >> -
> >> - return -EINVAL;
> >> - }
> >
> >Hi Claudiu
> >
> >It is not clear to me why it is no longer necessary to deregister the
> >fixed link, or remove the mdio bus?
> >
> >> + phy_mode = of_get_phy_mode(np);
> >> + if (phy_mode < 0)
> >> + priv->if_mode = PHY_INTERFACE_MODE_NA; /* fixed link */
> >> + else
> >> + priv->if_mode = phy_mode;
> >
>
> Hi Andrew,
>
> The MAC2MAC connections are defined as fixed-link too, but without
> phy-mode/phy-connection-type properties. We don't want to de-register
> these links. Initial code was bogus in this regard.
Hi Claudiu
This is what is not clear in the change log: that this code is removed
because it is wrong. Could you please expand the explanation to make
this clearer?
> Current proposal is:
> ethernet@0,2 { /* SoC internal, connected to switch port 4 */
> compatible = "fsl,enetc";
> reg = <0x000200 0 0 0 0>;
> fixed-link {
> speed = <1000>;
> full-duplex;
> };
> };
> switch@0,5 {
> compatible = "mscc,felix-switch";
> [...]
> ports {
> #address-cells = <1>;
> #size-cells = <0>;
>
> /* external ports */
> [...]
> /* internal SoC ports */
> port@4 { /* connected to ENETC port2 */
> reg = <4>;
> fixed-link {
> speed = <1000>;
> full-duplex;
> };
> };
So this connection between the SoC and the switch does not use tags?
Can it use tags? Does the hardware allow you to have two CPU ports,
and load balance over them?
This second half is just standard DSA. This looks good.
Andrew
> port@5 { /* CPU port, connected to ENETC port3 */
> reg = <5>;
> ethernet = <&enetc_port3>;
> fixed-link {
> speed = <1000>;
> full-duplex;
> };
> };
> };
> };
> enetc_port3: ethernet@0,6 { /* SoC internal connected to switch port 5 */
> compatible = "fsl,enetc";
> reg = <0x000600 0 0 0 0>;
> fixed-link {
> speed = <1000>;
> full-duplex;
> };
> };
> };
>
> Thanks.
>
> Claudiu
^ permalink raw reply
* Re: [RFC PATCH untested] vhost: block speculation of translated descriptors
From: Jason Wang @ 2019-09-10 7:28 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, virtualization, netdev
In-Reply-To: <20190910024814-mutt-send-email-mst@kernel.org>
On 2019/9/10 下午2:48, Michael S. Tsirkin wrote:
> On Tue, Sep 10, 2019 at 09:52:10AM +0800, Jason Wang wrote:
>> On 2019/9/9 下午10:45, Michael S. Tsirkin wrote:
>>> On Mon, Sep 09, 2019 at 03:19:55PM +0800, Jason Wang wrote:
>>>> On 2019/9/8 下午7:05, Michael S. Tsirkin wrote:
>>>>> iovec addresses coming from vhost are assumed to be
>>>>> pre-validated, but in fact can be speculated to a value
>>>>> out of range.
>>>>>
>>>>> Userspace addresses are later validated with array_index_nospec so we can
>>>>> be sure kernel info does not leak through these addresses, but vhost
>>>>> must also not leak userspace info outside the allowed memory table to
>>>>> guests.
>>>>>
>>>>> Following the defence in depth principle, make sure
>>>>> the address is not validated out of node range.
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>> ---
>>>>> drivers/vhost/vhost.c | 4 +++-
>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>> index 5dc174ac8cac..0ee375fb7145 100644
>>>>> --- a/drivers/vhost/vhost.c
>>>>> +++ b/drivers/vhost/vhost.c
>>>>> @@ -2072,7 +2072,9 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
>>>>> size = node->size - addr + node->start;
>>>>> _iov->iov_len = min((u64)len - s, size);
>>>>> _iov->iov_base = (void __user *)(unsigned long)
>>>>> - (node->userspace_addr + addr - node->start);
>>>>> + (node->userspace_addr +
>>>>> + array_index_nospec(addr - node->start,
>>>>> + node->size));
>>>>> s += size;
>>>>> addr += size;
>>>>> ++ret;
>>>> I've tried this on Kaby Lake, with SMAP off and metadata acceleration
>>>> off, using testpmd (virtio-user) + vhost_net. I don't see an obvious
>>>> performance difference in TX PPS.
>>>>
>>>> Thanks
>>> Should I push this to Linus right now then? It's a security thing so
>>> maybe we better do it ASAP ... what's your opinion?
>>
>> Yes, you can.
>>
>> Acked-by: Jason Wang <jasowang@redhat.com>
>
> And should I include
>
> Tested-by: Jason Wang <jasowang@redhat.com>
>
> ?
Yes.
Thanks
>
>>
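For readers unfamiliar with array_index_nospec(), the idea is to clamp an index with a mask computed by arithmetic rather than by a branch, so that a mispredicted bounds check cannot yield an out-of-range value. The following is a userspace sketch of that masking technique, not the kernel macro itself: the model_* names are hypothetical, it assumes index and size both fit in a signed long, and it relies on right shift of a negative long being arithmetic (true for GCC and Clang, implementation-defined in ISO C). The in-kernel versions take further measures, such as hiding the index from the optimizer, that are omitted here.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * All ones when index < size, zero otherwise, computed without a
 * conditional branch. If index >= size, either index or
 * (size - 1 - index) has its top bit set, so the complement has the
 * top bit clear and the shift produces zero.
 */
unsigned long model_index_mask_nospec(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (MODEL_BITS_PER_LONG - 1));
}

/* Clamp index to [0, size): in range it is unchanged, out of range it is 0. */
unsigned long model_index_nospec(unsigned long index, unsigned long size)
{
	assert(size != 0 && size <= (unsigned long)LONG_MAX);
	return index & model_index_mask_nospec(index, size);
}
```

In the patch, the offset addr - node->start plays the role of index and node->size the role of size, so an offset that is out of range ends up pointing at the start of the node rather than outside it.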
^ permalink raw reply