From: Gavin Shan <gwshan@linux.vnet.ibm.com>
To: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org, benh@kernel.crashing.org,
	mpe@ellerman.id.au, aik@ozlabs.ru, alistair@popple.id.au
Subject: Re: [PATCH v9 16/26] powerpc/powernv/ioda1: Improve DMA32 segment track
Date: Wed, 4 May 2016 23:20:01 +1000
Message-ID: <20160504132001.GA29136@gwshan>
In-Reply-To: <1462254105-24128-17-git-send-email-gwshan@linux.vnet.ibm.com>

On Tue, May 03, 2016 at 03:41:35PM +1000, Gavin Shan wrote:
>In the current implementation, the DMA32 segments required by a specific
>PE aren't calculated independently from the information held in the PE.
>That conflicts with the PCI hotplug design, which is PE-centralized: a
>PE's DMA32 segments should be calculated from the information held in
>the PE independently.
>
>This introduces an array (@dma32_segmap) for every PHB to track
>DMA32 segment usage. It also moves the logic calculating a PE's
>consumed DMA32 segments to pnv_pci_ioda1_setup_dma_pe() so that the PE's
>DMA32 segments are calculated/allocated from the information held in
>the PE (DMA32 weight). The logic is also improved: we try to allocate
>as many DMA32 segments as we can. It's acceptable for fewer DMA32
>segments than the expected number to be allocated.
>
>Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
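
For readers following the arithmetic, the expected segment count described
above can be sketched in isolation. This is only an illustration of the
"weight * dma32_count / total_weight, but at least one" rule from the patch;
the helper name expected_dma32_segs() is hypothetical, and the guard for a
zero total weight is an assumption that the patch itself does not contain.

```c
/*
 * Hypothetical helper (not part of the patch): expected number of DMA32
 * segments for one PE, proportional to its share of the total DMA weight.
 */
static unsigned int expected_dma32_segs(unsigned int weight,
					unsigned int total_weight,
					unsigned int dma32_count)
{
	unsigned int segs;

	/* Assumption: with no DMA weight on the PHB there is nothing to share. */
	if (!total_weight)
		return 0;

	/* Integer division rounds down, so a light PE may compute to zero. */
	segs = (weight * dma32_count) / total_weight;

	/* Every PE with a non-zero weight is expected to get at least one. */
	return segs ? segs : 1;
}
```

With 8 segments and a total weight of 40, a PE of weight 10 would expect 2
segments, while a PE of weight 1 out of 100 would round down to 0 and be
bumped to 1.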

This can cause overlapping DMA32 segments to be assigned to different PEs
in some cases. I already have the fix, but I'm holding off on posting it
until after discussing with Michael how to handle the series tomorrow.
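
To make the overlap concern concrete: in the quoted search loop, the jump to
"found" is taken when a segment is *not* IODA_INVALID_PE, that is, when it is
already owned. A search that avoids overlap needs to accept a window only when
every segment in it is free. The following is a minimal standalone sketch of
such a first-fit search, not necessarily the fix referred to above; the names
find_free_run(), claim_run() and INVALID_PE are hypothetical, and the value
used for INVALID_PE is an assumption.

```c
#define INVALID_PE (~0u)	/* stand-in for IODA_INVALID_PE; value assumed */

/*
 * Look for *segs contiguous free entries in segmap[0..count). After each
 * failed pass the request shrinks by one. Returns the base index and
 * stores the granted length in *segs, or returns -1 if nothing is free.
 */
static int find_free_run(const unsigned int *segmap, unsigned int count,
			 unsigned int *segs)
{
	unsigned int want, base, i;

	/* Never ask for more segments than the map holds. */
	want = *segs < count ? *segs : count;

	for (; want > 0; want--) {
		for (base = 0; base + want <= count; base++) {
			/* Stop at the first segment that is already owned. */
			for (i = base; i < base + want; i++)
				if (segmap[i] != INVALID_PE)
					break;

			/* The whole window was free: accept it. */
			if (i == base + want) {
				*segs = want;
				return (int)base;
			}
		}
	}

	return -1;
}

/* Record ownership so that later searches skip these segments. */
static void claim_run(unsigned int *segmap, unsigned int base,
		      unsigned int segs, unsigned int pe_number)
{
	unsigned int i;

	for (i = base; i < base + segs; i++)
		segmap[i] = pe_number;
}
```

Counting the request down with a "want > 0" condition also avoids leaving the
loop with an unsigned counter that has wrapped below zero.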

Thanks,
Gavin

>---
> arch/powerpc/platforms/powernv/pci-ioda.c | 110 ++++++++++++++++--------------
> arch/powerpc/platforms/powernv/pci.h      |   7 +-
> 2 files changed, 61 insertions(+), 56 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index f70a4e0..cfd2906 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -2011,27 +2011,57 @@ static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe)
> }
>
> static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>-				       struct pnv_ioda_pe *pe,
>-				       unsigned int base,
>-				       unsigned int segs)
>+				       struct pnv_ioda_pe *pe)
> {
>
> 	struct page *tce_mem = NULL;
> 	struct iommu_table *tbl;
>-	unsigned int tce32_segsz, i;
>+	unsigned int weight, total_weight = 0;
>+	unsigned int tce32_segsz, base, segs, i;
> 	int64_t rc;
> 	void *addr;
>
> 	/* XXX FIXME: Handle 64-bit only DMA devices */
> 	/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
> 	/* XXX FIXME: Allocate multi-level tables on PHB3 */
>+	pci_walk_bus(phb->hose->bus, pnv_pci_ioda_dev_dma_weight,
>+		     &total_weight);
>+	weight = pnv_pci_ioda_pe_dma_weight(pe);
>+
>+	segs = (weight * phb->ioda.dma32_count) / total_weight;
>+	if (!segs)
>+		segs = 1;
>+
>+	/*
>+	 * Allocate contiguous DMA32 segments. We begin with the expected
>+	 * number of segments. With one more attempt, the number of DMA32
>+	 * segments to be allocated is decreased by one until one segment
>+	 * is allocated successfully.
>+	 */
>+	do {
>+		for (base = 0; base <= phb->ioda.dma32_count - segs; base++) {
>+			for (i = base; i < base + segs; i++) {
>+				if (phb->ioda.dma32_segmap[i] !=
>+				    IODA_INVALID_PE)
>+					goto found;
>+			}
>+		}
>+	} while (segs--);
>+
>+	if (!segs) {
>+		pe_warn(pe, "No available DMA32 segments\n");
>+		return;
>+	}
>
>+found:
> 	tbl = pnv_pci_table_alloc(phb->hose->node);
> 	iommu_register_group(&pe->table_group, phb->hose->global_number,
> 			pe->pe_number);
> 	pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
>
> 	/* Grab a 32-bit TCE table */
>+	pe_info(pe, "DMA weight %d (%d), assigned (%d) %d DMA32 segments\n",
>+		weight, total_weight, base, segs);
> 	pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
> 		base * PNV_IODA1_DMA32_SEGSIZE,
> 		(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
>@@ -2068,6 +2098,10 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
> 		}
> 	}
>
>+	/* Setup DMA32 segment mapping */
>+	for (i = base; i < base + segs; i++)
>+		phb->ioda.dma32_segmap[i] = pe->pe_number;
>+
> 	/* Setup linux iommu table */
> 	pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
> 				  base * PNV_IODA1_DMA32_SEGSIZE,
>@@ -2542,73 +2576,34 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
> static void pnv_ioda_setup_dma(struct pnv_phb *phb)
> {
> 	struct pci_controller *hose = phb->hose;
>-	unsigned int weight, total_weight, dma_pe_count;
>-	unsigned int residual, remaining, segs, base;
> 	struct pnv_ioda_pe *pe;
>-
>-	total_weight = 0;
>-	pci_walk_bus(phb->hose->bus, pnv_pci_ioda_dev_dma_weight,
>-		     &total_weight);
>-
>-	dma_pe_count = 0;
>-	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>-		weight = pnv_pci_ioda_pe_dma_weight(pe);
>-		if (weight > 0)
>-			dma_pe_count++;
>-	}
>+	unsigned int weight;
>
> 	/* If we have more PE# than segments available, hand out one
> 	 * per PE until we run out and let the rest fail. If not,
> 	 * then we assign at least one segment per PE, plus more based
> 	 * on the amount of devices under that PE
> 	 */
>-	if (dma_pe_count > phb->ioda.tce32_count)
>-		residual = 0;
>-	else
>-		residual = phb->ioda.tce32_count - dma_pe_count;
>-
>-	pr_info("PCI: Domain %04x has %ld available 32-bit DMA segments\n",
>-		hose->global_number, phb->ioda.tce32_count);
>-	pr_info("PCI: %d PE# for a total weight of %d\n",
>-		dma_pe_count, total_weight);
>+	pr_info("PCI: Domain %04x has %d available 32-bit DMA segments\n",
>+		hose->global_number, phb->ioda.dma32_count);
>
> 	pnv_pci_ioda_setup_opal_tce_kill(phb);
>
>-	/* Walk our PE list and configure their DMA segments, hand them
>-	 * out one base segment plus any residual segments based on
>-	 * weight
>-	 */
>-	remaining = phb->ioda.tce32_count;
>-	base = 0;
>+	/* Walk our PE list and configure their DMA segments */
> 	list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> 		weight = pnv_pci_ioda_pe_dma_weight(pe);
> 		if (!weight)
> 			continue;
>
>-		if (!remaining) {
>-			pe_warn(pe, "No DMA32 resources available\n");
>-			continue;
>-		}
>-		segs = 1;
>-		if (residual) {
>-			segs += ((weight * residual) + (total_weight / 2)) /
>-				total_weight;
>-			if (segs > remaining)
>-				segs = remaining;
>-		}
>-
> 		/*
> 		 * For IODA2 compliant PHB3, we needn't care about the weight.
> 		 * The all available 32-bits DMA space will be assigned to
> 		 * the specific PE.
> 		 */
> 		if (phb->type == PNV_PHB_IODA1) {
>-			pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
>-				weight, segs);
>-			pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
>+			pnv_pci_ioda1_setup_dma_pe(phb, pe);
> 		} else if (phb->type == PNV_PHB_IODA2) {
> 			pe_info(pe, "Assign DMA32 space\n");
>-			segs = 0;
> 			pnv_pci_ioda2_setup_dma_pe(phb, pe);
> 		} else if (phb->type == PNV_PHB_NPU) {
> 			/*
>@@ -2618,9 +2613,6 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
> 			 * as the PHB3 TVT.
> 			 */
> 		}
>-
>-		remaining -= segs;
>-		base += segs;
> 	}
> }
>
>@@ -3327,7 +3319,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
> {
> 	struct pci_controller *hose;
> 	struct pnv_phb *phb;
>-	unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
>+	unsigned long size, m64map_off, m32map_off, pemap_off;
>+	unsigned long iomap_off = 0, dma32map_off = 0;
> 	const __be64 *prop64;
> 	const __be32 *prop32;
> 	int len;
>@@ -3413,6 +3406,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
> 	phb->ioda.io_segsize = phb->ioda.io_size / phb->ioda.total_pe_num;
> 	phb->ioda.io_pci_base = 0; /* XXX calculate this ? */
>
>+	/* Calculate how many 32-bit TCE segments we have */
>+	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
>+				PNV_IODA1_DMA32_SEGSIZE;
>+
> 	/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
> 	size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
> 	m64map_off = size;
>@@ -3422,6 +3419,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
> 	if (phb->type == PNV_PHB_IODA1) {
> 		iomap_off = size;
> 		size += phb->ioda.total_pe_num * sizeof(phb->ioda.io_segmap[0]);
>+		dma32map_off = size;
>+		size += phb->ioda.dma32_count *
>+			sizeof(phb->ioda.dma32_segmap[0]);
> 	}
> 	pemap_off = size;
> 	size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
>@@ -3437,6 +3437,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
> 		phb->ioda.io_segmap = aux + iomap_off;
> 		for (segno = 0; segno < phb->ioda.total_pe_num; segno++)
> 			phb->ioda.io_segmap[segno] = IODA_INVALID_PE;
>+
>+		phb->ioda.dma32_segmap = aux + dma32map_off;
>+		for (segno = 0; segno < phb->ioda.dma32_count; segno++)
>+			phb->ioda.dma32_segmap[segno] = IODA_INVALID_PE;
> 	}
> 	phb->ioda.pe_array = aux + pemap_off;
> 	set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
>@@ -3445,7 +3449,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
> 	mutex_init(&phb->ioda.pe_list_mutex);
>
> 	/* Calculate how many 32-bit TCE segments we have */
>-	phb->ioda.tce32_count = phb->ioda.m32_pci_base /
>+	phb->ioda.dma32_count = phb->ioda.m32_pci_base /
> 				PNV_IODA1_DMA32_SEGSIZE;
>
> #if 0 /* We should really do that ... */
>diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>index 117cfcd..14d9391 100644
>--- a/arch/powerpc/platforms/powernv/pci.h
>+++ b/arch/powerpc/platforms/powernv/pci.h
>@@ -142,6 +142,10 @@ struct pnv_phb {
> 		unsigned int		*m32_segmap;
> 		unsigned int		*io_segmap;
>
>+		/* DMA32 segment maps - IODA1 only */
>+		unsigned int		dma32_count;
>+		unsigned int		*dma32_segmap;
>+
> 		/* IRQ chip */
> 		int			irq_chip_init;
> 		struct irq_chip		irq_chip;
>@@ -158,9 +162,6 @@ struct pnv_phb {
> 		 */
> 		unsigned char		pe_rmap[0x10000];
>
>-		/* 32-bit TCE tables allocation */
>-		unsigned long		tce32_count;
>-
> 		/* TCE cache invalidate registers (physical and
> 		 * remapped)
> 		 */
>-- 
>2.1.0
>
