From: Andrzej Jakowski
To: kbusch@kernel.org, kwolf@redhat.com, mreitz@redhat.com
Cc: Andrzej Jakowski, qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: [PATCH v6 2/2] nvme: allow cmb and pmr to be enabled on same device
Date: Wed, 29 Jul 2020 15:01:07 -0700
Message-Id: <20200729220107.37758-3-andrzej.jakowski@linux.intel.com>
In-Reply-To: <20200729220107.37758-1-andrzej.jakowski@linux.intel.com>
References: <20200729220107.37758-1-andrzej.jakowski@linux.intel.com>
X-Mailer: git-send-email 2.25.4

So far it was not possible to have CMB and PMR emulated on the same device,
because BAR2 was used exclusively for either PMR or CMB.
This patch places the CMB at an offset in BAR4 so that it does not conflict
with the MSI-X vectors.

Signed-off-by: Andrzej Jakowski
---
 hw/block/nvme.c      | 124 +++++++++++++++++++++++++++++--------------
 hw/block/nvme.h      |   1 +
 include/block/nvme.h |   4 +-
 3 files changed, 89 insertions(+), 40 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 43866b744f..292bca445f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -22,12 +22,13 @@
  *              [pmrdev=,] \
  *              max_ioqpairs=
  *
- * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
- * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
+ * Note cmb_size_mb denotes size of CMB in MB. CMB when configured is assumed
+ * to be resident in BAR4 at offset that is 2MiB aligned. When CMB is emulated
+ * on Linux guest it is recommended to make cmb_size_mb multiple of 2. Both
+ * size and alignment restrictions are imposed by Linux guest.
  *
- * cmb_size_mb= and pmrdev= options are mutually exclusive due to limitation
- * in available BAR's. cmb_size_mb= will take precedence over pmrdev= when
- * both provided.
+ * pmrdev is assumed to be resident in BAR2/BAR3. When configured it consumes
+ * whole BAR2/BAR3 exclusively.
  * Enabling pmr emulation can be achieved by pointing to memory-backend-file.
  * For example:
  *      -object memory-backend-file,id=,share=on,mem-path=, \
@@ -57,7 +58,6 @@
 #define NVME_MAX_IOQPAIRS 0xffff
 #define NVME_DB_SIZE  4
 #define NVME_SPEC_VER 0x00010300
-#define NVME_CMB_BIR 2
 #define NVME_PMR_BIR 2
 #define NVME_TEMPERATURE 0x143
 #define NVME_TEMPERATURE_WARNING 0x157
@@ -109,18 +109,25 @@ static uint16_t nvme_sqid(NvmeRequest *req)
     return le16_to_cpu(req->sq->sqid);
 }
 
+static inline hwaddr nvme_cmb_to_absolute_addr(NvmeCtrl *n)
+{
+    return n->bar4.addr + n->ctrl_mem.addr;
+}
+
 static bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
 {
-    hwaddr low = n->ctrl_mem.addr;
-    hwaddr hi  = n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size);
+    hwaddr low = nvme_cmb_to_absolute_addr(n);
+    hwaddr hi  = low + int128_get64(n->ctrl_mem.size);
 
     return addr >= low && addr < hi;
 }
 
 static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 {
+    hwaddr cmb_addr = nvme_cmb_to_absolute_addr(n);
+
     if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
-        memcpy(buf, (void *)&n->cmbuf[addr - n->ctrl_mem.addr], size);
+        memcpy(buf, (void *)&n->cmbuf[addr - cmb_addr], size);
         return;
     }
 
@@ -207,17 +214,18 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
                              uint64_t prp2, uint32_t len, NvmeCtrl *n)
 {
     hwaddr trans_len = n->page_size - (prp1 % n->page_size);
+    hwaddr cmb_addr = nvme_cmb_to_absolute_addr(n);
     trans_len = MIN(len, trans_len);
     int num_prps = (len >> n->page_bits) + 1;
 
     if (unlikely(!prp1)) {
         trace_pci_nvme_err_invalid_prp();
         return NVME_INVALID_FIELD | NVME_DNR;
-    } else if (n->bar.cmbsz && prp1 >= n->ctrl_mem.addr &&
-               prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
+    } else if (n->bar.cmbsz && prp1 >= cmb_addr &&
+               prp1 < cmb_addr + int128_get64(n->ctrl_mem.size)) {
         qsg->nsg = 0;
         qemu_iovec_init(iov, num_prps);
-        qemu_iovec_add(iov, (void *)&n->cmbuf[prp1 - n->ctrl_mem.addr], trans_len);
+        qemu_iovec_add(iov, (void *)&n->cmbuf[prp1 - cmb_addr], trans_len);
     } else {
         pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
         qemu_sglist_add(qsg, prp1, trans_len);
@@ -262,7 +270,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
                 if (qsg->nsg){
                     qemu_sglist_add(qsg, prp_ent, trans_len);
                 } else {
-                    qemu_iovec_add(iov, (void *)&n->cmbuf[prp_ent - n->ctrl_mem.addr], trans_len);
+                    qemu_iovec_add(iov, (void *)&n->cmbuf[prp_ent - cmb_addr], trans_len);
                 }
                 len -= trans_len;
                 i++;
@@ -275,7 +283,7 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
             if (qsg->nsg) {
                 qemu_sglist_add(qsg, prp2, len);
             } else {
-                qemu_iovec_add(iov, (void *)&n->cmbuf[prp2 - n->ctrl_mem.addr], trans_len);
+                qemu_iovec_add(iov, (void *)&n->cmbuf[prp2 - cmb_addr], trans_len);
             }
         }
     }
@@ -1980,7 +1988,7 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp)
         return;
     }
 
-    if (!n->params.cmb_size_mb && n->pmrdev) {
+    if (n->pmrdev) {
         if (host_memory_backend_is_mapped(n->pmrdev)) {
             char *path = object_get_canonical_path_component(OBJECT(n->pmrdev));
             error_setg(errp, "can't use already busy memdev: %s", path);
@@ -2042,33 +2050,73 @@ static void nvme_init_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
     id_ns->nuse = id_ns->ncap;
 }
 
-static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
+static void nvme_init_bar4(PCIDevice *pci_dev, Error **errp)
 {
-    NVME_CMBLOC_SET_BIR(n->bar.cmbloc, NVME_CMB_BIR);
-    NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
+    NvmeCtrl *n = NVME(pci_dev);
+    int status;
+    uint64_t bar_size, cmb_offset = 0;
+    uint32_t msix_vectors;
+    uint32_t nvme_pba_offset;
+    uint32_t cmb_size_units;
+
+    msix_vectors = n->params.msix_qsize;
+    nvme_pba_offset = PCI_MSIX_ENTRY_SIZE * msix_vectors;
+    bar_size = nvme_pba_offset + QEMU_ALIGN_UP(msix_vectors, 64) / 8;
+
+    if (n->params.cmb_size_mb) {
+        NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
+        NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
+        NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
+        NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
+        NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
+        NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2); /* MBs */
+        NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->params.cmb_size_mb);
+
+        cmb_size_units = NVME_CMBSZ_GETSIZEUNITS(n->bar.cmbsz);
+        /* Linux guest requires it to be 2MiB aligned */
+        cmb_offset = QEMU_ALIGN_UP(bar_size, 2 * MiB);
+
+        NVME_CMBLOC_SET_BIR(n->bar.cmbloc, 4);
+        NVME_CMBLOC_SET_OFST(n->bar.cmbloc, cmb_offset / cmb_size_units);
+
+        n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
 
-    NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
-    NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
-    NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
-    NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
-    NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
-    NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2); /* MBs */
-    NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->params.cmb_size_mb);
+        bar_size += cmb_offset;
+        bar_size += NVME_CMBSZ_GETSIZE(n->bar.cmbsz);
+    }
+
+    bar_size = pow2ceil(bar_size);
 
-    n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
-    memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
-                          "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
-    pci_register_bar(pci_dev, NVME_CMBLOC_BIR(n->bar.cmbloc),
+    /*
+     * Create memory region for BAR4, then overlap cmb, msix and pba
+     * tables on top of it.
+     */
+    memory_region_init(&n->bar4, OBJECT(n), "nvme-bar4", bar_size);
+
+    if (n->params.cmb_size_mb) {
+        memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
+                              "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
+
+        memory_region_add_subregion(&n->bar4, cmb_offset, &n->ctrl_mem);
+    }
+
+    status = msix_init(pci_dev, n->params.msix_qsize,
+                       &n->bar4, 4, 0,
+                       &n->bar4, 4, nvme_pba_offset,
+                       0, errp);
+
+    if (status) {
+        return;
+    }
+
+    pci_register_bar(pci_dev, 4,
                      PCI_BASE_ADDRESS_SPACE_MEMORY |
                      PCI_BASE_ADDRESS_MEM_TYPE_64 |
-                     PCI_BASE_ADDRESS_MEM_PREFETCH, &n->ctrl_mem);
+                     PCI_BASE_ADDRESS_MEM_PREFETCH, &n->bar4);
 }
 
 static void nvme_init_pmr(NvmeCtrl *n, PCIDevice *pci_dev)
 {
-    /* Controller Capabilities register */
-    NVME_CAP_SET_PMRS(n->bar.cap, 1);
-
     /* PMR Capabities register */
     n->bar.pmrcap = 0;
     NVME_PMRCAP_SET_RDS(n->bar.pmrcap, 0);
@@ -2126,13 +2174,10 @@ static void nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
                           n->reg_size);
     pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
                      PCI_BASE_ADDRESS_MEM_TYPE_64, &n->iomem);
-    if (msix_init_exclusive_bar(pci_dev, n->params.msix_qsize, 4, errp)) {
-        return;
-    }
 
-    if (n->params.cmb_size_mb) {
-        nvme_init_cmb(n, pci_dev);
-    } else if (n->pmrdev) {
+    nvme_init_bar4(pci_dev, errp);
+
+    if (n->pmrdev) {
         nvme_init_pmr(n, pci_dev);
     }
 }
@@ -2199,6 +2244,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
     NVME_CAP_SET_CSS(n->bar.cap, 1);
     NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
     NVME_CAP_SET_CMBS(n->bar.cap, n->params.cmb_size_mb ? 1 : 0);
+    NVME_CAP_SET_PMRS(n->bar.cap, n->pmrdev ? 1 : 0);
 
     n->bar.vs = NVME_SPEC_VER;
     n->bar.intmc = n->bar.intms = 0;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 0b6a8ae665..f291395cd0 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -91,6 +91,7 @@ typedef struct NvmeCtrl {
     PCIDevice    parent_obj;
     MemoryRegion iomem;
     MemoryRegion ctrl_mem;
+    MemoryRegion bar4;
     NvmeBar      bar;
     BlockConf    conf;
     NvmeParams   params;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index d641ca6649..77b59d18dd 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -217,9 +217,11 @@ enum NvmeCmbszMask {
     (cmbsz |= (uint64_t)(val & CMBSZ_SZU_MASK) << CMBSZ_SZU_SHIFT)
 #define NVME_CMBSZ_SET_SZ(cmbsz, val)    \
     (cmbsz |= (uint64_t)(val & CMBSZ_SZ_MASK) << CMBSZ_SZ_SHIFT)
+#define NVME_CMBSZ_GETSIZEUNITS(cmbsz) \
+    (1 << (12 + 4 * NVME_CMBSZ_SZU(cmbsz)))
 
 #define NVME_CMBSZ_GETSIZE(cmbsz) \
-    (NVME_CMBSZ_SZ(cmbsz) * (1 << (12 + 4 * NVME_CMBSZ_SZU(cmbsz))))
+    (NVME_CMBSZ_SZ(cmbsz) * NVME_CMBSZ_GETSIZEUNITS(cmbsz))
 
 enum NvmePmrcapShift {
     PMRCAP_RDS_SHIFT      = 3,
-- 
2.25.4
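
For readers following the BAR4 layout arithmetic in nvme_init_bar4(), here is a
small standalone sketch of the same computation outside QEMU. It is only an
illustration under stated assumptions: the names struct bar4_layout,
bar4_layout(), align_up() and pow2_ceil() are hypothetical stand-ins for the
QEMU helpers used in the patch, the MSI-X entry size is taken as 16 bytes, and
SZU is fixed at 2 (1 MiB units) as in the patch. It mirrors the patch's
arithmetic as posted, including adding cmb_offset on top of the MSI-X area that
bar_size already counts.

```c
#include <assert.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16u          /* bytes per MSI-X table entry */
#define MIB (1024u * 1024u)

/* Hypothetical result type, not part of the patch. */
struct bar4_layout {
    uint64_t pba_offset;   /* start of the MSI-X pending bit array */
    uint64_t cmb_offset;   /* start of the CMB inside BAR4, 0 without CMB */
    uint64_t cmbloc_ofst;  /* CMBLOC.OFST value, in CMB size units */
    uint64_t bar_size;     /* final BAR4 size, a power of two */
};

/* Round v up to the next multiple of a (a > 0). */
static uint64_t align_up(uint64_t v, uint64_t a)
{
    return (v + a - 1) / a * a;
}

/* Smallest power of two that is >= v. */
static uint64_t pow2_ceil(uint64_t v)
{
    uint64_t p = 1;

    assert(v > 0 && v <= (UINT64_C(1) << 63));
    while (p < v) {
        p <<= 1;
    }
    return p;
}

/* Assumes msix_vectors >= 1; cmb_size_mb == 0 means "no CMB". */
static struct bar4_layout bar4_layout(uint32_t msix_vectors,
                                      uint32_t cmb_size_mb)
{
    struct bar4_layout l = { 0, 0, 0, 0 };
    /* SZU = 2 gives 1 << (12 + 4 * 2) = 1 MiB per size unit. */
    uint64_t size_units = UINT64_C(1) << (12 + 4 * 2);

    /* MSI-X table sits at offset 0, the PBA directly after it. */
    l.pba_offset = (uint64_t)MSIX_ENTRY_SIZE * msix_vectors;
    /* One pending bit per vector, rounded up to whole 64-bit words. */
    l.bar_size = l.pba_offset + align_up(msix_vectors, 64) / 8;

    if (cmb_size_mb) {
        /* CMB starts at the next 2 MiB boundary after the MSI-X area. */
        l.cmb_offset = align_up(l.bar_size, 2 * MIB);
        l.cmbloc_ofst = l.cmb_offset / size_units;
        /* Same additions as the patch: offset, then the CMB size. */
        l.bar_size += l.cmb_offset;
        l.bar_size += (uint64_t)cmb_size_mb * size_units;
    }

    l.bar_size = pow2_ceil(l.bar_size);
    return l;
}
```

With 65 vectors and cmb_size_mb=2 (example values only), the PBA lands at
offset 1040, the CMB at 2 MiB (CMBLOC.OFST = 2), and BAR4 rounds up to 8 MiB.
Without a CMB the same vector count gives a 2 KiB BAR4.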