* [PATCH 01/51] DMA-API: provide a helper to set both DMA and coherent DMA masks
From: Russell King @ 2013-09-19 21:25 UTC (permalink / raw)
To: alsa-devel, b43-dev, devel, devicetree, dri-devel, e1000-devel,
linux-arm-kernel, linux-crypto, linux-doc, linux-fbdev, linux-ide,
linux-media, linux-mmc, linux-nvme, linux-omap, linuxppc-dev,
linux-samsung-soc, linux-scsi, linux-tegra, linux-usb,
linux-wireless, netdev, Solarflare linux maintainers,
uclinux-dist-devel
Cc: Vinod Koul, Dan Williams, Rob Landley
In-Reply-To: <20130919212235.GD12758@n2100.arm.linux.org.uk>
Provide a helper to set both the DMA and coherent DMA masks to the
same value - this avoids duplicated code in a number of drivers,
sometimes with buggy error handling, and also allows us to identify
which drivers do things differently.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
Documentation/DMA-API-HOWTO.txt | 37 ++++++++++++++++++++++---------------
Documentation/DMA-API.txt | 8 ++++++++
include/linux/dma-mapping.h | 14 ++++++++++++++
3 files changed, 44 insertions(+), 15 deletions(-)
diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index 14129f1..5e98303 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -101,14 +101,23 @@ style to do this even if your device holds the default setting,
because this shows that you did think about these issues wrt. your
device.
-The query is performed via a call to dma_set_mask():
+The query is performed via a call to dma_set_mask_and_coherent():
- int dma_set_mask(struct device *dev, u64 mask);
+ int dma_set_mask_and_coherent(struct device *dev, u64 mask);
-The query for consistent allocations is performed via a call to
-dma_set_coherent_mask():
+which will query the mask for both streaming and coherent APIs together.
+If you have some special requirements, then the following two separate
+queries can be used instead:
- int dma_set_coherent_mask(struct device *dev, u64 mask);
+ The query for streaming mappings is performed via a call to
+ dma_set_mask():
+
+ int dma_set_mask(struct device *dev, u64 mask);
+
+ The query for consistent allocations is performed via a call
+ to dma_set_coherent_mask():
+
+ int dma_set_coherent_mask(struct device *dev, u64 mask);
Here, dev is a pointer to the device struct of your device, and mask
is a bit mask describing which bits of an address your device
@@ -137,7 +146,7 @@ exactly why.
The standard 32-bit addressing device would do something like this:
- if (dma_set_mask(dev, DMA_BIT_MASK(32))) {
+ if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
printk(KERN_WARNING
"mydev: No suitable DMA available.\n");
goto ignore_this_device;
@@ -171,22 +180,20 @@ If a card is capable of using 64-bit consistent allocations as well,
int using_dac, consistent_using_dac;
- if (!dma_set_mask(dev, DMA_BIT_MASK(64))) {
+ if (!dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64))) {
using_dac = 1;
consistent_using_dac = 1;
- dma_set_coherent_mask(dev, DMA_BIT_MASK(64));
- } else if (!dma_set_mask(dev, DMA_BIT_MASK(32))) {
+ } else if (!dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
using_dac = 0;
consistent_using_dac = 0;
- dma_set_coherent_mask(dev, DMA_BIT_MASK(32));
} else {
printk(KERN_WARNING
"mydev: No suitable DMA available.\n");
goto ignore_this_device;
}
-dma_set_coherent_mask() will always be able to set the same or a
-smaller mask as dma_set_mask(). However for the rare case that a
+The coherent mask can always be set to the same or a
+smaller mask than the streaming mask. However for the rare case that a
device driver only uses consistent allocations, one would have to
check the return value from dma_set_coherent_mask().
@@ -199,9 +206,9 @@ Finally, if your device can only drive the low 24-bits of
goto ignore_this_device;
}
-When dma_set_mask() is successful, and returns zero, the kernel saves
-away this mask you have provided. The kernel will use this
-information later when you make DMA mappings.
+When dma_set_mask() or dma_set_mask_and_coherent() is successful, and
+returns zero, the kernel saves away this mask you have provided. The
+kernel will use this information later when you make DMA mappings.
There is a case which we are aware of at this time, which is worth
mentioning in this documentation. If your device supports multiple
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 78a6c56..e865279 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -142,6 +142,14 @@ internal API for use by the platform than an external API for use by
driver writers.
int
+dma_set_mask_and_coherent(struct device *dev, u64 mask)
+
+Checks to see if the mask is possible and updates the device
+streaming and coherent DMA mask parameters if it is.
+
+Returns: 0 if successful and a negative error if not.
+
+int
dma_set_mask(struct device *dev, u64 mask)
Checks to see if the mask is possible and updates the device
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 3a8d0a2..ec951f9 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -97,6 +97,20 @@ static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
}
#endif
+/*
+ * Set both the DMA mask and the coherent DMA mask to the same thing.
+ * Note that we don't check the return value from dma_set_coherent_mask()
+ * as the DMA API guarantees that the coherent DMA mask can be set to
+ * the same or smaller than the streaming DMA mask.
+ */
+static inline int dma_set_mask_and_coherent(struct device *dev, u64 mask)
+{
+ int rc = dma_set_mask(dev, mask);
+ if (rc == 0)
+ dma_set_coherent_mask(dev, mask);
+ return rc;
+}
+
extern u64 dma_get_required_mask(struct device *dev);
static inline unsigned int dma_get_max_seg_size(struct device *dev)
--
1.7.4.4
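As a rough illustration of the helper added by this patch, here is a minimal userspace sketch of its control flow, together with the 64-bit-then-32-bit fallback from the HOWTO hunk. Everything prefixed toy_/TOY_ is a made-up stand-in, not kernel code, and the acceptance rule (reject any mask wider than widest_ok) is an assumption chosen only so that the fallback path can be exercised.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for illustration only; not the kernel definitions. */
#define TOY_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

struct toy_device {
	uint64_t widest_ok;		/* toy rule: wider masks are rejected */
	uint64_t dma_mask;		/* streaming DMA mask */
	uint64_t coherent_dma_mask;	/* coherent DMA mask */
};

static int toy_dma_set_mask(struct toy_device *dev, uint64_t mask)
{
	if (mask & ~dev->widest_ok)
		return -1;		/* stands in for a negative errno */
	dev->dma_mask = mask;
	return 0;
}

static int toy_dma_set_coherent_mask(struct toy_device *dev, uint64_t mask)
{
	if (mask & ~dev->widest_ok)
		return -1;
	dev->coherent_dma_mask = mask;
	return 0;
}

/* Same shape as the helper in the patch: only the streaming result is returned. */
static int toy_dma_set_mask_and_coherent(struct toy_device *dev, uint64_t mask)
{
	int rc = toy_dma_set_mask(dev, mask);

	if (rc == 0)
		toy_dma_set_coherent_mask(dev, mask);
	return rc;
}

/* Try 64-bit first, then fall back to 32-bit: 1 = DAC, 0 = 32-bit, -1 = no DMA. */
static int toy_probe_masks(struct toy_device *dev)
{
	if (!toy_dma_set_mask_and_coherent(dev, TOY_DMA_BIT_MASK(64)))
		return 1;
	if (!toy_dma_set_mask_and_coherent(dev, TOY_DMA_BIT_MASK(32)))
		return 0;
	return -1;
}
```

With this model, a device limited to 32 bits ends up with both masks set to the 32-bit value after the fallback, and a device that accepts neither width is left with its masks untouched.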
* [PATCH 05/51] DMA-API: net: intel/igbvf: fix 32-bit DMA mask handling
From: Russell King @ 2013-09-19 21:29 UTC (permalink / raw)
To: alsa-devel, b43-dev, devel, devicetree, dri-devel, e1000-devel,
linux-arm-kernel, linux-crypto, linux-doc, linux-fbdev, linux-ide,
linux-media, linux-mmc, linux-nvme, linux-omap, linuxppc-dev,
linux-samsung-soc, linux-scsi, linux-tegra, linux-usb,
linux-wireless, netdev, Solarflare linux maintainers,
uclinux-dist-devel
Cc: Alex Duyck, Don Skidmore, Peter P Waskiewicz Jr, Bruce Allan,
Jesse Brandeburg, Greg Rose, John Ronciak, Jeff Kirsher,
Carolyn Wyborny, Tushar Dave
In-Reply-To: <20130919212235.GD12758@n2100.arm.linux.org.uk>
The fallback to 32-bit DMA mask is rather odd:
	err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
	if (!err) {
		err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
		if (!err)
			pci_using_dac = 1;
	} else {
		err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
		if (err) {
			err = dma_set_coherent_mask(&pdev->dev,
						    DMA_BIT_MASK(32));
			if (err) {
				dev_err(&pdev->dev, "No usable DMA "
					"configuration, aborting\n");
				goto err_dma;
			}
		}
	}
This means we only set the coherent DMA mask in the fallback path if
the DMA mask set failed, which is silly. This fixes it to set the
coherent DMA mask only if dma_set_mask() succeeded, and to error out
if either fails.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
drivers/net/ethernet/intel/igbvf/netdev.c | 18 ++++++------------
1 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index 93eb7ee..4e6b02f 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -2638,21 +2638,15 @@ static int igbvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
return err;
pci_using_dac = 0;
- err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
+ err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
if (!err) {
- err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
- if (!err)
- pci_using_dac = 1;
+ pci_using_dac = 1;
} else {
- err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
+ err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
if (err) {
- err = dma_set_coherent_mask(&pdev->dev,
- DMA_BIT_MASK(32));
- if (err) {
- dev_err(&pdev->dev, "No usable DMA "
- "configuration, aborting\n");
- goto err_dma;
- }
+ dev_err(&pdev->dev, "No usable DMA "
+ "configuration, aborting\n");
+ goto err_dma;
}
}
--
1.7.4.4
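To make the problem described in this commit message concrete, here is a small userspace model of the old and new fallback paths. This is only a sketch: struct toy_device, set_mask(), set_coherent() and the "reject masks wider than widest_ok" rule are all hypothetical stand-ins, not the driver's or the kernel's code. It is meant to show that on a 32-bit-only device the old flow never touches the coherent mask.

```c
#include <assert.h>
#include <stdint.h>

#define MASK32 0xffffffffULL
#define MASK64 (~0ULL)

/* Hypothetical device model; the kernel's struct device looks nothing like this. */
struct toy_device {
	uint64_t widest_ok;		/* toy rule: wider masks are rejected */
	uint64_t dma_mask;
	uint64_t coherent_dma_mask;
};

static int set_mask(struct toy_device *d, uint64_t m)
{
	if (m & ~d->widest_ok)
		return -1;
	d->dma_mask = m;
	return 0;
}

static int set_coherent(struct toy_device *d, uint64_t m)
{
	if (m & ~d->widest_ok)
		return -1;
	d->coherent_dma_mask = m;
	return 0;
}

/* Control flow as quoted in the commit message; returns 0 if probe carries on. */
static int old_fallback(struct toy_device *d)
{
	int err = set_mask(d, MASK64);

	if (!err) {
		err = set_coherent(d, MASK64);
	} else {
		err = set_mask(d, MASK32);
		if (err) {
			/* Coherent mask is only attempted after a failure. */
			err = set_coherent(d, MASK32);
			if (err)
				return err;
		}
	}
	return 0;
}

/* Control flow after the patch, with the combined helper written out inline. */
static int new_fallback(struct toy_device *d)
{
	if (!set_mask(d, MASK64)) {
		set_coherent(d, MASK64);
		return 0;
	}
	if (set_mask(d, MASK32))
		return -1;
	set_coherent(d, MASK32);
	return 0;
}
```

In this model, old_fallback() on a 32-bit-only device succeeds with the streaming mask set and the coherent mask still at its initial value, while new_fallback() sets both.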
* [PATCH 02/51] DMA-API: net: brocade/bna/bnad.c: fix 32-bit DMA mask handling
From: Russell King @ 2013-09-19 21:26 UTC (permalink / raw)
To: alsa-devel, b43-dev, devel, devicetree, dri-devel, e1000-devel,
linux-arm-kernel, linux-crypto, linux-doc, linux-fbdev, linux-ide,
linux-media, linux-mmc, linux-nvme, linux-omap, linuxppc-dev,
linux-samsung-soc, linux-scsi, linux-tegra, linux-usb,
linux-wireless, netdev, Solarflare linux maintainers,
uclinux-dist-devel
Cc: Rasesh Mody
In-Reply-To: <20130919212235.GD12758@n2100.arm.linux.org.uk>
The fallback to 32-bit DMA mask is rather odd:
	if (!dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)) &&
	    !dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64))) {
		*using_dac = true;
	} else {
		err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
		if (err) {
			err = dma_set_coherent_mask(&pdev->dev,
						    DMA_BIT_MASK(32));
			if (err)
				goto release_regions;
		}
This means we only try and set the coherent DMA mask if we failed to
set a 32-bit DMA mask, and only if both fail do we fail the driver.
Adjust this so that if either setting fails, we fail the driver - and
thereby end up properly setting both the DMA mask and the coherent
DMA mask in the fallback case.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
drivers/net/ethernet/brocade/bna/bnad.c | 13 ++++---------
1 files changed, 4 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index b78e69e..45ce6e2 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -3300,17 +3300,12 @@ bnad_pci_init(struct bnad *bnad,
err = pci_request_regions(pdev, BNAD_NAME);
if (err)
goto disable_device;
- if (!dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)) &&
- !dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64))) {
+ if (!dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64))) {
*using_dac = true;
} else {
- err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
- if (err) {
- err = dma_set_coherent_mask(&pdev->dev,
- DMA_BIT_MASK(32));
- if (err)
- goto release_regions;
- }
+ err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+ if (err)
+ goto release_regions;
*using_dac = false;
}
pci_set_master(pdev);
--
1.7.4.4
* [PATCH 06/51] DMA-API: net: intel/ixgb: fix 32-bit DMA mask handling
From: Russell King @ 2013-09-19 21:30 UTC (permalink / raw)
To: alsa-devel, b43-dev, devel, devicetree, dri-devel, e1000-devel,
linux-arm-kernel, linux-crypto, linux-doc, linux-fbdev, linux-ide,
linux-media, linux-mmc, linux-nvme, linux-omap, linuxppc-dev,
linux-samsung-soc, linux-scsi, linux-tegra, linux-usb,
linux-wireless, netdev, Solarflare linux maintainers,
uclinux-dist-devel
Cc: Alex Duyck, Don Skidmore, Peter P Waskiewicz Jr, Bruce Allan,
Jesse Brandeburg, Greg Rose, John Ronciak, Jeff Kirsher,
Carolyn Wyborny, Tushar Dave
In-Reply-To: <20130919212235.GD12758@n2100.arm.linux.org.uk>
The fallback to 32-bit DMA mask is rather odd:
	err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
	if (!err) {
		err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
		if (!err)
			pci_using_dac = 1;
	} else {
		err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
		if (err) {
			err = dma_set_coherent_mask(&pdev->dev,
						    DMA_BIT_MASK(32));
			if (err) {
				pr_err("No usable DMA configuration, aborting\n");
				goto err_dma_mask;
			}
		}
	}
This means we only set the coherent DMA mask in the fallback path if
the DMA mask set failed, which is silly. This fixes it to set the
coherent DMA mask only if dma_set_mask() succeeded, and to error out
if either fails.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
drivers/net/ethernet/intel/ixgb/ixgb_main.c | 16 +++++-----------
1 files changed, 5 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_main.c b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
index 9f6b236..57e390c 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_main.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
@@ -408,20 +408,14 @@ ixgb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
return err;
pci_using_dac = 0;
- err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
+ err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
if (!err) {
- err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
- if (!err)
- pci_using_dac = 1;
+ pci_using_dac = 1;
} else {
- err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
+ err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
if (err) {
- err = dma_set_coherent_mask(&pdev->dev,
- DMA_BIT_MASK(32));
- if (err) {
- pr_err("No usable DMA configuration, aborting\n");
- goto err_dma_mask;
- }
+ pr_err("No usable DMA configuration, aborting\n");
+ goto err_dma_mask;
}
}
--
1.7.4.4
* [PATCH 03/51] DMA-API: net: intel/e1000e: fix 32-bit DMA mask handling
From: Russell King @ 2013-09-19 21:27 UTC (permalink / raw)
To: alsa-devel, b43-dev, devel, devicetree, dri-devel, e1000-devel,
linux-arm-kernel, linux-crypto, linux-doc, linux-fbdev, linux-ide,
linux-media, linux-mmc, linux-nvme, linux-omap, linuxppc-dev,
linux-samsung-soc, linux-scsi, linux-tegra, linux-usb,
linux-wireless, netdev, Solarflare linux maintainers,
uclinux-dist-devel
Cc: Alex Duyck, Don Skidmore, Peter P Waskiewicz Jr, Bruce Allan,
Jesse Brandeburg, Greg Rose, John Ronciak, Jeff Kirsher,
Carolyn Wyborny, Tushar Dave
In-Reply-To: <20130919212235.GD12758@n2100.arm.linux.org.uk>
The fallback to 32-bit DMA mask is rather odd:
	err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
	if (!err) {
		err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
		if (!err)
			pci_using_dac = 1;
	} else {
		err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
		if (err) {
			err = dma_set_coherent_mask(&pdev->dev,
						    DMA_BIT_MASK(32));
			if (err) {
				dev_err(&pdev->dev,
					"No usable DMA configuration, aborting\n");
				goto err_dma;
			}
		}
	}
This means we only set the coherent DMA mask in the fallback path if
the DMA mask set failed, which is silly. This fixes it to set the
coherent DMA mask only if dma_set_mask() succeeded, and to error out
if either fails.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
drivers/net/ethernet/intel/e1000e/netdev.c | 18 ++++++------------
1 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index e87e9b0..519e293 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6553,21 +6553,15 @@ static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
return err;
pci_using_dac = 0;
- err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
+ err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
if (!err) {
- err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
- if (!err)
- pci_using_dac = 1;
+ pci_using_dac = 1;
} else {
- err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
+ err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
if (err) {
- err = dma_set_coherent_mask(&pdev->dev,
- DMA_BIT_MASK(32));
- if (err) {
- dev_err(&pdev->dev,
- "No usable DMA configuration, aborting\n");
- goto err_dma;
- }
+ dev_err(&pdev->dev,
+ "No usable DMA configuration, aborting\n");
+ goto err_dma;
}
}
--
1.7.4.4
* [PATCH 04/51] DMA-API: net: intel/igb: fix 32-bit DMA mask handling
From: Russell King @ 2013-09-19 21:28 UTC (permalink / raw)
To: alsa-devel, b43-dev, devel, devicetree, dri-devel, e1000-devel,
linux-arm-kernel, linux-crypto, linux-doc, linux-fbdev, linux-ide,
linux-media, linux-mmc, linux-nvme, linux-omap, linuxppc-dev,
linux-samsung-soc, linux-scsi, linux-tegra, linux-usb,
linux-wireless, netdev, Solarflare linux maintainers,
uclinux-dist-devel
Cc: Alex Duyck, Don Skidmore, Peter P Waskiewicz Jr, Bruce Allan,
Jesse Brandeburg, Greg Rose, John Ronciak, Jeff Kirsher,
Carolyn Wyborny, Tushar Dave
In-Reply-To: <20130919212235.GD12758@n2100.arm.linux.org.uk>
The fallback to 32-bit DMA mask is rather odd:
	err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
	if (!err) {
		err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
		if (!err)
			pci_using_dac = 1;
	} else {
		err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
		if (err) {
			err = dma_set_coherent_mask(&pdev->dev,
						    DMA_BIT_MASK(32));
			if (err) {
				dev_err(&pdev->dev,
					"No usable DMA configuration, aborting\n");
				goto err_dma;
			}
		}
	}
This means we only set the coherent DMA mask in the fallback path if
the DMA mask set failed, which is silly. This fixes it to set the
coherent DMA mask only if dma_set_mask() succeeded, and to error out
if either fails.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
drivers/net/ethernet/intel/igb/igb_main.c | 18 ++++++------------
1 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 8cf44f2..7579383 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2034,21 +2034,15 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
return err;
pci_using_dac = 0;
- err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
+ err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
if (!err) {
- err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
- if (!err)
- pci_using_dac = 1;
+ pci_using_dac = 1;
} else {
- err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
+ err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
if (err) {
- err = dma_set_coherent_mask(&pdev->dev,
- DMA_BIT_MASK(32));
- if (err) {
- dev_err(&pdev->dev,
- "No usable DMA configuration, aborting\n");
- goto err_dma;
- }
+ dev_err(&pdev->dev,
+ "No usable DMA configuration, aborting\n");
+ goto err_dma;
}
}
--
1.7.4.4
* [PATCH 00/51] DMA mask changes
From: Russell King - ARM Linux @ 2013-09-19 21:22 UTC (permalink / raw)
To: alsa-devel, b43-dev, devel, devicetree, dri-devel, e1000-devel,
linux-arm-kernel, linux-crypto, linux-doc, linux-fbdev, linux-ide,
linux-media, linux-mmc, linux-nvme, linux-omap, linuxppc-dev,
linux-samsung-soc, linux-scsi, linux-tegra, linux-usb,
linux-wireless, netdev, Solarflare linux maintainers,
uclinux-dist-devel
This started out as a request to look at the DMA mask situation, and how
to solve the issues which we have on ARM - notably how the DMA mask
should be set up.
However, I started off reviewing how the dma_mask and coherent_dma_mask
were being used, and what I found was rather messy, and in some cases
rather buggy. I tried to get some of the bug fixes in before the last
merge window, but it seems that the maintainers preferred to have the
full solution rather than a simple -rc suitable bug fix.
So, this is an attempt to clean things up.
The first point here is that drivers performing DMA should be calling
dma_set_mask()/dma_set_coherent_mask() in their probe function to verify
that DMA can be performed. Lots of ARM drivers omit this step; please
refer to the DMA API documentation on this subject.
What this means is that the DMA mask provided by bus code is a default
value - nothing more. It doesn't have to accurately reflect what the
device is actually capable of. Apart from the storage for dev->dma_mask
being initialised for any device which is DMA capable, there is no other
initialisation which is strictly necessary at device creation time.
Now, these cleanups address three major areas:
1. The setting of DMA masks, particularly when both the coherent and
streaming DMA masks are set together.
2. The initialisation of DMA masks by drivers - this seems to be becoming
a popular habit, one which may not be entirely the right solution.
Rather than having this scattered throughout the tree, I've pulled
that into a central location (and called it coercing the DMA mask -
because it really is about forcing the DMA mask to be that value.)
3. Finally, addressing the long-held misbelief that DMA masks somehow
correspond with physical addresses. We already have established
long ago that dma_addr_t values returned from the DMA API are the
values which you program into the DMA controller, and so are the
bus addresses. It is _only_ sane that DMA masks are bus
related too, and not related to physical address spaces.
(3) is a very important point for LPAE systems, which may still have
less than 4GB of memory, but this memory is all located above the 4GB
physical boundary. This means with the current model, any device
using a 32-bit DMA mask fails - even though the DMA controller is
still only a 32-bit DMA controller but the 32-bit bus addresses map
to system memory. To put it another way, the bus addresses have a
4GB physical offset on them.
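A small worked example of point (3), as a sketch only: the 4GB offset, the linear translation and every toy_ name below are assumptions made up for illustration, not the ARM implementation. It shows why a 32-bit mask compared against the physical address rejects a buffer which the same mask accepts once it is compared against the bus address.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MASK32	0xffffffffULL
#define TOY_PHYS_OFFSET	0x100000000ULL	/* assumed: RAM starts at 4GB physical */

/* Hypothetical linear translation: bus address = physical address - offset. */
static uint64_t toy_phys_to_bus(uint64_t phys)
{
	return phys - TOY_PHYS_OFFSET;
}

/* A buffer is reachable if its last byte lies at or below the mask. */
static int toy_fits_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
	return addr + size - 1 <= mask;
}

/* Treats the mask as a limit on physical addresses. */
static int toy_reachable_by_phys(uint64_t phys, uint64_t size, uint64_t mask)
{
	return toy_fits_mask(phys, size, mask);
}

/* Treats the mask as a limit on bus addresses. */
static int toy_reachable_by_bus(uint64_t phys, uint64_t size, uint64_t mask)
{
	return toy_fits_mask(toy_phys_to_bus(phys), size, mask);
}
```

For a 4KB buffer at physical 0x100001000, the bus address is 0x1000: the physical-address check fails against a 32-bit mask, while the bus-address check passes.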
This email is only being sent to the mailing lists in question, not to
anyone personally. The list of individuals is far too great to do that.
I'm hoping no mailing lists reject the patches based on the number of
recipients.
Patches based on v3.12-rc1.
Documentation/DMA-API-HOWTO.txt | 37 +++++++++------
Documentation/DMA-API.txt | 8 +++
arch/arm/include/asm/dma-mapping.h | 8 +++
arch/arm/mm/dma-mapping.c | 49 ++++++++++++++++++--
arch/arm/mm/init.c | 12 +++---
arch/arm/mm/mm.h | 2 +
arch/powerpc/kernel/vio.c | 3 +-
block/blk-settings.c | 8 ++--
drivers/amba/bus.c | 6 +--
drivers/ata/pata_ixp4xx_cf.c | 5 ++-
drivers/ata/pata_octeon_cf.c | 5 +-
drivers/block/nvme-core.c | 10 ++---
drivers/crypto/ixp4xx_crypto.c | 48 ++++++++++----------
drivers/dma/amba-pl08x.c | 5 ++
drivers/dma/dw/platform.c | 8 +--
drivers/dma/edma.c | 6 +--
drivers/dma/pl330.c | 4 ++
drivers/firmware/dcdbas.c | 23 +++++-----
drivers/firmware/google/gsmi.c | 13 +++--
drivers/gpu/drm/exynos/exynos_drm_drv.c | 6 ++-
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 5 +-
drivers/media/platform/omap3isp/isp.c | 6 +-
drivers/media/platform/omap3isp/isp.h | 3 -
drivers/mmc/card/queue.c | 3 +-
drivers/mmc/host/sdhci-acpi.c | 5 +-
drivers/net/ethernet/broadcom/b44.c | 3 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 8 +---
drivers/net/ethernet/brocade/bna/bnad.c | 13 ++----
drivers/net/ethernet/emulex/benet/be_main.c | 12 +----
drivers/net/ethernet/intel/e1000/e1000_main.c | 9 +---
drivers/net/ethernet/intel/e1000e/netdev.c | 18 +++-----
drivers/net/ethernet/intel/igb/igb_main.c | 18 +++-----
drivers/net/ethernet/intel/igbvf/netdev.c | 18 +++-----
drivers/net/ethernet/intel/ixgb/ixgb_main.c | 16 ++-----
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 15 ++----
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 15 ++----
drivers/net/ethernet/nxp/lpc_eth.c | 6 ++-
drivers/net/ethernet/octeon/octeon_mgmt.c | 5 +-
drivers/net/ethernet/sfc/efx.c | 12 +-----
drivers/net/wireless/b43/dma.c | 9 +---
drivers/net/wireless/b43legacy/dma.c | 9 +---
drivers/of/platform.c | 3 -
drivers/parport/parport_pc.c | 8 +++-
drivers/scsi/scsi_lib.c | 2 +-
drivers/staging/dwc2/platform.c | 5 +-
drivers/staging/et131x/et131x.c | 17 +------
drivers/staging/imx-drm/imx-drm-core.c | 8 +++-
drivers/staging/imx-drm/ipuv3-crtc.c | 4 +-
drivers/staging/media/dt3155v4l/dt3155v4l.c | 5 +--
drivers/usb/chipidea/ci_hdrc_imx.c | 7 +--
drivers/usb/dwc3/dwc3-exynos.c | 7 +--
drivers/usb/gadget/lpc32xx_udc.c | 4 +-
drivers/usb/host/bcma-hcd.c | 3 +-
drivers/usb/host/ehci-atmel.c | 7 +--
drivers/usb/host/ehci-octeon.c | 4 +-
drivers/usb/host/ehci-omap.c | 10 ++--
drivers/usb/host/ehci-orion.c | 7 +--
drivers/usb/host/ehci-platform.c | 10 ++--
drivers/usb/host/ehci-s5p.c | 7 +--
drivers/usb/host/ehci-spear.c | 7 +--
drivers/usb/host/ehci-tegra.c | 7 +--
drivers/usb/host/ohci-at91.c | 9 ++--
drivers/usb/host/ohci-exynos.c | 7 +--
drivers/usb/host/ohci-nxp.c | 5 +-
drivers/usb/host/ohci-octeon.c | 5 +-
drivers/usb/host/ohci-omap3.c | 10 ++--
drivers/usb/host/ohci-pxa27x.c | 8 ++--
drivers/usb/host/ohci-sa1111.c | 6 +++
drivers/usb/host/ohci-spear.c | 7 +--
drivers/usb/host/ssb-hcd.c | 3 +-
drivers/usb/host/uhci-platform.c | 7 +--
drivers/usb/musb/am35x.c | 50 +++++++--------------
drivers/usb/musb/da8xx.c | 49 +++++++-------------
drivers/usb/musb/davinci.c | 48 +++++++-------------
drivers/usb/musb/tusb6010.c | 49 +++++++-------------
drivers/video/amba-clcd.c | 5 ++
include/linux/amba/bus.h | 2 -
include/linux/dma-mapping.h | 31 +++++++++++++
sound/arm/pxa2xx-pcm.c | 9 +---
sound/soc/atmel/atmel-pcm.c | 11 ++---
sound/soc/blackfin/bf5xx-ac97-pcm.c | 11 ++---
sound/soc/blackfin/bf5xx-i2s-pcm.c | 10 ++---
sound/soc/davinci/davinci-pcm.c | 9 +---
sound/soc/fsl/fsl_dma.c | 9 +---
sound/soc/fsl/mpc5200_dma.c | 10 ++---
sound/soc/jz4740/jz4740-pcm.c | 12 ++---
sound/soc/kirkwood/kirkwood-dma.c | 9 +---
sound/soc/nuc900/nuc900-pcm.c | 9 ++--
sound/soc/omap/omap-pcm.c | 11 ++---
sound/soc/pxa/pxa2xx-pcm.c | 11 ++---
sound/soc/s6000/s6000-pcm.c | 9 +---
sound/soc/samsung/dma.c | 11 ++---
sound/soc/samsung/idma.c | 11 ++---
93 files changed, 493 insertions(+), 566 deletions(-)
* Re: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest tlb invalidation
From: Scott Wood @ 2013-09-19 21:07 UTC (permalink / raw)
To: Bharat Bhushan; +Cc: kvm, agraf, kvm-ppc, Bharat Bhushan, paulus, linuxppc-dev
In-Reply-To: <1379570566-3715-6-git-send-email-Bharat.Bhushan@freescale.com>
On Thu, 2013-09-19 at 11:32 +0530, Bharat Bhushan wrote:
> On booke, "struct tlbe_ref" contains host tlb mapping information
> (pfn: for guest-pfn to pfn, flags: attribute associated with this mapping)
> for a guest tlb entry. So when a guest creates a TLB entry then
> "struct tlbe_ref" is set to point to valid "pfn" and set attributes in
> "flags" field of the above said structure. When a guest TLB entry is
> invalidated then flags field of corresponding "struct tlbe_ref" is
> updated to point that this is no more valid, also we selectively clear
> some other attribute bits, example: if E500_TLB_BITMAP was set then we clear
> E500_TLB_BITMAP, if E500_TLB_TLB0 is set then we clear this.
>
> Ideally we should clear complete "flags" as this entry is invalid and does not
> have anything to re-used. The other part of the problem is that when we use
> the same entry again then also we do not clear (started doing or-ing etc).
>
> So far it was working because the selectively clearing mentioned above
> actually clears "flags" what was set during TLB mapping. But the problem
> starts coming when we add more attributes to this then we need to selectively
> clear them and which is not needed.
>
> This patch we do both
> - Clear "flags" when invalidating;
> - Clear "flags" when reusing same entry later
>
> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> ---
> v3-> v5
> - New patch (found this issue when doing vfio-pci development)
>
> arch/powerpc/kvm/e500_mmu_host.c | 12 +++++++-----
> 1 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index 1c6a9d7..60f5a3c 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -217,7 +217,8 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 *vcpu_e500, int tlbsel,
> }
> mb();
> vcpu_e500->g2h_tlb1_map[esel] = 0;
> - ref->flags &= ~(E500_TLB_BITMAP | E500_TLB_VALID);
> + /* Clear flags as TLB is not backed by the host anymore */
> + ref->flags = 0;
> local_irq_restore(flags);
> }
This breaks when you have both E500_TLB_BITMAP and E500_TLB_TLB0 set.
Instead, just convert the final E500_TLB_VALID clearing at the end into
ref->flags = 0, and convert the early return a few lines earlier into
conditional execution of the tlbil_one().
-Scott
* Re: [PATCH 1/2][v3] powerpc/fsl-booke: Add initial T104x_QDS board support
From: Timur Tabi @ 2013-09-19 20:32 UTC (permalink / raw)
To: Prabhakar Kushwaha
Cc: Scott Wood, Priyanka Jain, linuxppc-dev@lists.ozlabs.org,
Poonam Aggrwal
In-Reply-To: <1379581205-24424-1-git-send-email-prabhakar@freescale.com>
On Thu, Sep 19, 2013 at 4:00 AM, Prabhakar Kushwaha
<prabhakar@freescale.com> wrote:
> - Video
> - DIU supports video at up to 1280x1024x32bpp
You mention DIU support, except there's no DIU enablement in the
platform file. You need the T104x equivalent of
p1022ds_set_pixel_clock() and the other functions.
* Does iommu_init_table need to use GFP_ATOMIC allocations?
From: Nishanth Aravamudan @ 2013-09-19 16:50 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Paul Mackerras, linuxppc-dev, Thadeu Lima de Souza Cascardo,
Anton Blanchard
Under heavy (DLPAR?) stress, we tripped this panic() in
arch/powerpc/kernel/iommu.c::iommu_init_table():
	page = alloc_pages_node(nid, GFP_ATOMIC, get_order(sz));
	if (!page)
		panic("iommu_init_table: Can't allocate %ld bytes\n",
		      sz);
Before the panic() we got a page allocation failure for an order-2
allocation. There appears to be memory free, but perhaps not in the
ATOMIC context. I looked through all the call-sites of
iommu_init_table() and didn't see any obvious reason to need an ATOMIC
allocation. Most call-sites in fact have an explicit GFP_KERNEL
allocation shortly before the call to iommu_init_table(), indicating we
are not in an atomic context. There is some indirection for some paths,
but I didn't see any locks indicating that GFP_KERNEL is inappropriate.
Does anyone know if/why ATOMIC allocations are necessary here?
Thanks,
Nish
* Re: Preliminary kexec support for Linux/m68k
From: Geert Uytterhoeven @ 2013-09-19 9:20 UTC (permalink / raw)
To: Anton Blanchard, Benjamin Herrenschmidt
Cc: linuxppc-dev@lists.ozlabs.org, linux-m68k, kexec,
linux-kernel@vger.kernel.org
In-Reply-To: <1379412095-7213-1-git-send-email-geert@linux-m68k.org>
On Tue, Sep 17, 2013 at 12:01 PM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> This is a preliminary set of patches to add kexec support for m68k.
> - [PATCH 1/3] m68k: Add preliminary kexec support
> - [PATCH 2/3] m68k: Add support to export bootinfo in procfs
> - [PATCH 3/3] [RFC] m68k: Add System RAM to /proc/iomem
>
> Notes:
> - The bootinfo is now saved and exported to /proc/bootinfo, so kexec-tools
> can read it and pass it (possibly after modification) to the new kernel.
> This is similar to /proc/atags on ARM.
> - I based [PATCH 3/3] on the PowerPC version, but it's no longer needed as we
> now get this information from the bootinfo.
> Does anyone think this is nice to have anyway?
It seems kexec/kdump on ppc don't use /proc/iomem anymore, and only rely on
/proc/device-tree these days?
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* [PATCH 2/2][v3] powerpc/configs: Enable T1040QDS by default in corenet
From: Prabhakar Kushwaha @ 2013-09-19 9:00 UTC (permalink / raw)
To: linuxppc-dev; +Cc: scottwood, Prabhakar Kushwaha
T1040 supports both 32-bit and 64-bit kernels,
so enable T1040QDS by default in the config files.
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
Based upon git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git
Branch next
Changes for v2: Sending as it is
Changes for v3: Sending as it is
arch/powerpc/configs/corenet32_smp_defconfig | 1 +
arch/powerpc/configs/corenet64_smp_defconfig | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/powerpc/configs/corenet32_smp_defconfig b/arch/powerpc/configs/corenet32_smp_defconfig
index 3dfab4c..19d1d31 100644
--- a/arch/powerpc/configs/corenet32_smp_defconfig
+++ b/arch/powerpc/configs/corenet32_smp_defconfig
@@ -28,6 +28,7 @@ CONFIG_P3041_DS=y
CONFIG_P4080_DS=y
CONFIG_P5020_DS=y
CONFIG_P5040_DS=y
+CONFIG_T104x_QDS=y
CONFIG_HIGHMEM=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_BINFMT_MISC=m
diff --git a/arch/powerpc/configs/corenet64_smp_defconfig b/arch/powerpc/configs/corenet64_smp_defconfig
index fa94fb3..d23ee10 100644
--- a/arch/powerpc/configs/corenet64_smp_defconfig
+++ b/arch/powerpc/configs/corenet64_smp_defconfig
@@ -24,6 +24,7 @@ CONFIG_MAC_PARTITION=y
CONFIG_B4_QDS=y
CONFIG_P5020_DS=y
CONFIG_P5040_DS=y
+CONFIG_T104x_QDS=y
CONFIG_T4240_QDS=y
# CONFIG_PPC_OF_BOOT_TRAMPOLINE is not set
CONFIG_BINFMT_MISC=m
--
1.7.9.5
^ permalink raw reply related
* [PATCH 1/2][v3] powerpc/fsl-booke: Add initial T104x_QDS board support
From: Prabhakar Kushwaha @ 2013-09-19 9:00 UTC (permalink / raw)
To: linuxppc-dev; +Cc: scottwood, Priyanka Jain, Poonam Aggrwal, Prabhakar Kushwaha
Add support for the T104x board in the board file t104x_qds.c. It is common
to both T1040 and T1042, as they share the same QDS board.
T1040QDS board Overview
-----------------------
- SERDES Connections, 8 lanes supporting:
    - PCI Express: supporting Gen 1 and Gen 2;
    - SGMII
    - QSGMII
    - SATA 2.0
    - Aurora debug with dedicated connectors (T1040 only)
- DDR Controller
    - Supports rates of up to 1600 MHz data-rate
    - Supports one DDR3LP UDIMM/RDIMMs, of single-, dual- or quad-rank types.
- IFC/Local Bus
    - NAND flash: 8-bit, async, up to 2GB.
    - NOR: 8-bit or 16-bit, non-multiplexed, up to 512MB
    - GASIC: Simple (minimal) target within Qixis FPGA
    - PromJET rapid memory download support
- Ethernet
    - Two on-board RGMII 10/100/1G ethernet ports.
    - PHY #0 remains powered up during deep-sleep (T1040 only)
- QIXIS System Logic FPGA
- Clocks
    - System and DDR clock (SYSCLK, "DDRCLK")
    - SERDES clocks
- Power Supplies
- Video
    - DIU supports video at up to 1280x1024x32bpp
- USB
    - Supports two USB 2.0 ports with integrated PHYs
        - Two type A ports with 5V@1.5A per port.
        - Second port can be converted to OTG mini-AB
- SDHC
    - SDHC port connects directly to an adapter card slot, featuring:
        - Supporting SD slots for: SD, SDHC (1x, 4x, 8x) and/or MMC
        - Supporting eMMC memory devices
- SPI
    - On-board support of 3 different devices and sizes
- Other IO
    - Two Serial ports
    - ProfiBus port
    - Four I2C ports
Add T104xQDS support in Kconfig and Makefile. Also create device tree.
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
Based upon git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git
Branch next
Changes for v2: Incorporated Scott's comments
- Created t104xqds.dtsi, both t1040qds & t1042qds include it
- Updated get_irq
Changes for v3: Sending as it is
arch/powerpc/boot/dts/t1040qds.dts | 46 ++++++++
arch/powerpc/boot/dts/t1042qds.dts | 46 ++++++++
arch/powerpc/boot/dts/t104xqds.dtsi | 192 +++++++++++++++++++++++++++++++
arch/powerpc/platforms/85xx/Kconfig | 20 ++++
arch/powerpc/platforms/85xx/Makefile | 1 +
arch/powerpc/platforms/85xx/t104x_qds.c | 118 +++++++++++++++++++
6 files changed, 423 insertions(+)
create mode 100644 arch/powerpc/boot/dts/t1040qds.dts
create mode 100644 arch/powerpc/boot/dts/t1042qds.dts
create mode 100644 arch/powerpc/boot/dts/t104xqds.dtsi
create mode 100644 arch/powerpc/platforms/85xx/t104x_qds.c
diff --git a/arch/powerpc/boot/dts/t1040qds.dts b/arch/powerpc/boot/dts/t1040qds.dts
new file mode 100644
index 0000000..973c29c
--- /dev/null
+++ b/arch/powerpc/boot/dts/t1040qds.dts
@@ -0,0 +1,46 @@
+/*
+ * T1040QDS Device Tree Source
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor "AS IS" AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/include/ "fsl/t104xsi-pre.dtsi"
+/include/ "t104xqds.dtsi"
+
+/ {
+ model = "fsl,T1040QDS";
+ compatible = "fsl,T1040QDS";
+ #address-cells = <2>;
+ #size-cells = <2>;
+ interrupt-parent = <&mpic>;
+};
+
+/include/ "fsl/t1040si-post.dtsi"
diff --git a/arch/powerpc/boot/dts/t1042qds.dts b/arch/powerpc/boot/dts/t1042qds.dts
new file mode 100644
index 0000000..45bd037
--- /dev/null
+++ b/arch/powerpc/boot/dts/t1042qds.dts
@@ -0,0 +1,46 @@
+/*
+ * T1042QDS Device Tree Source
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor "AS IS" AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/include/ "fsl/t104xsi-pre.dtsi"
+/include/ "t104xqds.dtsi"
+
+/ {
+ model = "fsl,T1042QDS";
+ compatible = "fsl,T1042QDS";
+ #address-cells = <2>;
+ #size-cells = <2>;
+ interrupt-parent = <&mpic>;
+};
+
+/include/ "fsl/t1042si-post.dtsi"
diff --git a/arch/powerpc/boot/dts/t104xqds.dtsi b/arch/powerpc/boot/dts/t104xqds.dtsi
new file mode 100644
index 0000000..5a518b3
--- /dev/null
+++ b/arch/powerpc/boot/dts/t104xqds.dtsi
@@ -0,0 +1,192 @@
+/*
+ * T104xQDS Device Tree Source
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor "AS IS" AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/ {
+ model = "fsl,T1040QDS";
+ compatible = "fsl,T1040QDS";
+ #address-cells = <2>;
+ #size-cells = <2>;
+ interrupt-parent = <&mpic>;
+
+ ifc: localbus@ffe124000 {
+ reg = <0xf 0xfe124000 0 0x2000>;
+ ranges = <0 0 0xf 0xe8000000 0x08000000
+ 2 0 0xf 0xff800000 0x00010000
+ 3 0 0xf 0xffdf0000 0x00008000>;
+
+ nor@0,0 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "cfi-flash";
+ reg = <0x0 0x0 0x8000000>;
+
+ bank-width = <2>;
+ device-width = <1>;
+ };
+
+ nand@2,0 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "fsl,ifc-nand";
+ reg = <0x2 0x0 0x10000>;
+
+ partition@0 {
+ /* This location must not be altered */
+ /* 1MB for u-boot Bootloader Image */
+ reg = <0x0 0x00100000>;
+ label = "NAND U-Boot Image";
+ read-only;
+ };
+
+ partition@100000 {
+ /* 1MB for DTB Image */
+ reg = <0x00100000 0x00100000>;
+ label = "NAND DTB Image";
+ };
+
+ partition@200000 {
+ /* 10MB for Linux Kernel Image */
+ reg = <0x00200000 0x00A00000>;
+ label = "NAND Linux Kernel Image";
+ };
+
+ partition@C00000 {
+ /* 500MB for Root file System Image */
+ reg = <0x00c00000 0x1F400000>;
+ label = "NAND RFS Image";
+ };
+ };
+
+ board-control@3,0 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "fsl,tetra-fpga", "fsl,fpga-qixis";
+ reg = <3 0 0x300>;
+ };
+ };
+
+ memory {
+ device_type = "memory";
+ };
+
+ dcsr: dcsr@f00000000 {
+ ranges = <0x00000000 0xf 0x00000000 0x01072000>;
+ };
+
+ soc: soc@ffe000000 {
+ ranges = <0x00000000 0xf 0xfe000000 0x1000000>;
+ reg = <0xf 0xfe000000 0 0x00001000>;
+ spi@110000 {
+ flash@0 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "micron,n25q512a";
+ reg = <0>;
+ spi-max-frequency = <10000000>; /* input clock */
+ };
+ };
+
+ i2c@118000 {
+ pca9547@77 {
+ compatible = "philips,pca9547";
+ reg = <0x77>;
+ };
+ rtc@68 {
+ compatible = "dallas,ds3232";
+ reg = <0x68>;
+ interrupts = <0x1 0x1 0 0>;
+ };
+ };
+ };
+
+ pci0: pcie@ffe240000 {
+ reg = <0xf 0xfe240000 0 0x10000>;
+ ranges = <0x02000000 0 0xe0000000 0xc 0x00000000 0x0 0x10000000
+ 0x01000000 0 0x00000000 0xf 0xf8000000 0x0 0x00010000>;
+ pcie@0 {
+ ranges = <0x02000000 0 0xe0000000
+ 0x02000000 0 0xe0000000
+ 0 0x10000000
+
+ 0x01000000 0 0x00000000
+ 0x01000000 0 0x00000000
+ 0 0x00010000>;
+ };
+ };
+
+ pci1: pcie@ffe250000 {
+ reg = <0xf 0xfe250000 0 0x10000>;
+ ranges = <0x02000000 0x0 0xe0000000 0xc 0x20000000 0x0 0x10000000
+ 0x01000000 0x0 0x00000000 0xf 0xf8010000 0x0 0x00010000>;
+ pcie@0 {
+ ranges = <0x02000000 0 0xe0000000
+ 0x02000000 0 0xe0000000
+ 0 0x10000000
+
+ 0x01000000 0 0x00000000
+ 0x01000000 0 0x00000000
+ 0 0x00010000>;
+ };
+ };
+
+ pci2: pcie@ffe260000 {
+ reg = <0xf 0xfe260000 0 0x1000>;
+ ranges = <0x02000000 0 0xe0000000 0xc 0x40000000 0 0x10000000
+ 0x01000000 0 0x00000000 0xf 0xf8020000 0 0x00010000>;
+ pcie@0 {
+ ranges = <0x02000000 0 0xe0000000
+ 0x02000000 0 0xe0000000
+ 0 0x10000000
+
+ 0x01000000 0 0x00000000
+ 0x01000000 0 0x00000000
+ 0 0x00010000>;
+ };
+ };
+
+ pci3: pcie@ffe270000 {
+ reg = <0xf 0xfe270000 0 0x10000>;
+ ranges = <0x02000000 0 0xe0000000 0xc 0x60000000 0 0x10000000
+ 0x01000000 0 0x00000000 0xf 0xf8030000 0 0x00010000>;
+ pcie@0 {
+ ranges = <0x02000000 0 0xe0000000
+ 0x02000000 0 0xe0000000
+ 0 0x10000000
+
+ 0x01000000 0 0x00000000
+ 0x01000000 0 0x00000000
+ 0 0x00010000>;
+ };
+ };
+};
diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index de2eb93..81d97b5 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -295,6 +295,26 @@ config P5040_DS
help
This option enables support for the P5040 DS board

+config T104x_QDS
+ bool "Freescale T104x QDS"
+ select DEFAULT_UIMAGE
+ select E500
+ select PPC_E500MC
+ select PHYS_64BIT
+ select SWIOTLB
+ select ARCH_REQUIRE_GPIOLIB
+ select GENERIC_GPIO
+ select HAS_RAPIDIO
+ select PPC_EPAPR_HV_PIC
+ select HAS_FSL_QBMAN
+ select MDIO_BUS_MUX if FSL_DPAA_ETH
+ select MDIO_BUS_MUX_MMIOREG if FSL_DPAA_ETH
+ help
+ This option enables support for the T104x QDS board.
+ The T104x application development system T104x QDS is a complete
+ debugging environment intended for engineers developing
+ applications for the T1040/T1042.
+
config PPC_QEMU_E500
bool "QEMU generic e500 platform"
select DEFAULT_UIMAGE
diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index 53c9f75..879c238 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -23,6 +23,7 @@ obj-$(CONFIG_P3041_DS) += p3041_ds.o corenet_ds.o
obj-$(CONFIG_P4080_DS) += p4080_ds.o corenet_ds.o
obj-$(CONFIG_P5020_DS) += p5020_ds.o corenet_ds.o
obj-$(CONFIG_P5040_DS) += p5040_ds.o corenet_ds.o
+obj-$(CONFIG_T104x_QDS) += t104x_qds.o corenet_ds.o
obj-$(CONFIG_T4240_QDS) += t4240_qds.o corenet_ds.o
obj-$(CONFIG_B4_QDS) += b4_qds.o corenet_ds.o
obj-$(CONFIG_STX_GP3) += stx_gp3.o
diff --git a/arch/powerpc/platforms/85xx/t104x_qds.c b/arch/powerpc/platforms/85xx/t104x_qds.c
new file mode 100644
index 0000000..547d44d
--- /dev/null
+++ b/arch/powerpc/platforms/85xx/t104x_qds.c
@@ -0,0 +1,118 @@
+/*
+ * T104x QDS Setup
+ * Should apply for QDS platform of T1040 and its personalities.
+ * viz T1040/T1042
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/kdev_t.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/phy.h>
+
+#include <asm/time.h>
+#include <asm/machdep.h>
+#include <asm/pci-bridge.h>
+#include <mm/mmu_decl.h>
+#include <asm/prom.h>
+#include <asm/udbg.h>
+#include <asm/mpic.h>
+
+#include <linux/of_platform.h>
+#include <sysdev/fsl_soc.h>
+#include <sysdev/fsl_pci.h>
+#include <asm/ehv_pic.h>
+
+#include "corenet_ds.h"
+
+/*
+ * Called very early, device-tree isn't unflattened
+ */
+static int __init t104x_qds_probe(void)
+{
+ unsigned long root = of_get_flat_dt_root();
+#ifdef CONFIG_SMP
+ extern struct smp_ops_t smp_85xx_ops;
+#endif
+
+ if (of_flat_dt_is_compatible(root, "fsl,T1040QDS") ||
+ of_flat_dt_is_compatible(root, "fsl,T1042QDS"))
+
+ return 1;
+
+ /* Check if we're running under the Freescale hypervisor */
+ if (of_flat_dt_is_compatible(root, "fsl,T1040QDS-hv") ||
+ of_flat_dt_is_compatible(root, "fsl,T1042QDS-hv")) {
+ ppc_md.init_IRQ = ehv_pic_init;
+ ppc_md.get_irq = ehv_pic_get_irq;
+ ppc_md.restart = fsl_hv_restart;
+ ppc_md.power_off = fsl_hv_halt;
+ ppc_md.halt = fsl_hv_halt;
+#ifdef CONFIG_SMP
+ /*
+ * Disable the timebase sync operations because we can't write
+ * to the timebase registers under the hypervisor.
+ */
+ smp_85xx_ops.give_timebase = NULL;
+ smp_85xx_ops.take_timebase = NULL;
+#endif
+
+ return 1;
+ }
+
+ return 0;
+}
+
+define_machine(t1042_qds) {
+ .name = "T1042 QDS",
+ .probe = t104x_qds_probe,
+ .setup_arch = corenet_ds_setup_arch,
+ .init_IRQ = corenet_ds_pic_init,
+#ifdef CONFIG_PCI
+ .pcibios_fixup_bus = fsl_pcibios_fixup_bus,
+#endif
+/* coreint doesn't play nice with lazy EE, use legacy mpic for now */
+ .get_irq = mpic_get_coreint_irq,
+ .restart = fsl_rstcr_restart,
+ .calibrate_decr = generic_calibrate_decr,
+ .progress = udbg_progress,
+#ifdef CONFIG_PPC64
+ .power_save = book3e_idle,
+#else
+ .power_save = e500_idle,
+#endif
+};
+
+define_machine(t1040_qds) {
+ .name = "T1040 QDS",
+ .probe = t104x_qds_probe,
+ .setup_arch = corenet_ds_setup_arch,
+ .init_IRQ = corenet_ds_pic_init,
+#ifdef CONFIG_PCI
+ .pcibios_fixup_bus = fsl_pcibios_fixup_bus,
+#endif
+/* coreint doesn't play nice with lazy EE, use legacy mpic for now */
+ .get_irq = mpic_get_coreint_irq,
+ .restart = fsl_rstcr_restart,
+ .calibrate_decr = generic_calibrate_decr,
+ .progress = udbg_progress,
+#ifdef CONFIG_PPC64
+ .power_save = book3e_idle,
+#else
+ .power_save = e500_idle,
+#endif
+};
+
+machine_arch_initcall(t104x_qds, corenet_ds_publish_devices);
+
+#ifdef CONFIG_SWIOTLB
+machine_arch_initcall(t104x_qds, swiotlb_setup_bus_notifier);
+#endif
--
1.7.9.5
^ permalink raw reply related
* [PATCH][v3] powerpc/mpc85xx:Add initial device tree support of T104x
From: Prabhakar Kushwaha @ 2013-09-19 8:59 UTC (permalink / raw)
To: linuxppc-dev
Cc: scottwood, Priyanka Jain, Poonam Aggrwal, Prabhakar Kushwaha,
Varun Sethi
The QorIQ T1040/T1042 processors support four integrated 64-bit e5500 PA
processor cores with a high-performance data path acceleration architecture
and the network peripheral interfaces required for networking and
telecommunications. The T1042 personality is a reduced personality of the
T1040 without the integrated 8-port Gigabit Ethernet switch.
The T1040/T1042 SoC includes the following functions and features:
- Four e5500 cores, each with a private 256 KB L2 cache
- 256 KB shared L3 CoreNet platform cache (CPC)
- Interconnect CoreNet platform
- 32-/64-bit DDR3L/DDR4 SDRAM memory controller with ECC and interleaving
  support
- Data Path Acceleration Architecture (DPAA) incorporating acceleration
  for the following functions:
    - Packet parsing, classification, and distribution
    - Queue management for scheduling, packet sequencing, and congestion
      management
    - Cryptography Acceleration (SEC 5.0)
    - RegEx Pattern Matching Acceleration (PME 2.2)
    - IEEE Std 1588 support
    - Hardware buffer management for buffer allocation and deallocation
- Ethernet interfaces
    - Integrated 8-port Gigabit Ethernet switch (T1040 only)
    - Four 1 Gbps Ethernet controllers
    - Two RGMII interfaces or one RGMII and one MII interface
- High speed peripheral interfaces
    - Four PCI Express 2.0 controllers running at up to 5 GT/s
    - Two SATA controllers supporting 1.5 and 3.0 Gb/s operation
    - Up to two QSGMII interfaces
    - Up to six SGMII interfaces supporting 1000 Mbps
    - One SGMII interface supporting up to 2500 Mbps
- Additional peripheral interfaces
    - Two USB 2.0 controllers with integrated PHY
    - SD/eSDHC/eMMC
    - eSPI controller
    - Four I2C controllers
    - Four UARTs
    - Four GPIO controllers
    - Integrated flash controller (IFC)
    - LCD/HDMI interface (DIU) with 12-bit dual data rate
    - TDM interface
- Multicore programmable interrupt controller (PIC)
- Two 8-channel DMA engines
- Single source clocking implementation
- Deep Sleep power implementation (wakeup from GPIO/Timer/Ethernet/USB)
Signed-off-by: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
Based upon git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git
Branch next
Changes for v2: Incorporated Scott's comments
- Update t1040si-post.dtsi
- update clock device tree node as per
http://patchwork.ozlabs.org/patch/274134/
- removed DMA node, It will be added later as per
http://patchwork.ozlabs.org/patch/271238/
- Updated display compatible field
Changes for v3: Incorporated Scott's comments
- Updated soc compatible field
- updated clock compatible field
arch/powerpc/boot/dts/fsl/t1040si-post.dtsi | 423 +++++++++++++++++++++++++++
arch/powerpc/boot/dts/fsl/t1042si-post.dtsi | 41 +++
arch/powerpc/boot/dts/fsl/t104xsi-pre.dtsi | 109 +++++++
3 files changed, 573 insertions(+)
create mode 100644 arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
create mode 100644 arch/powerpc/boot/dts/fsl/t1042si-post.dtsi
create mode 100644 arch/powerpc/boot/dts/fsl/t104xsi-pre.dtsi
diff --git a/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
new file mode 100644
index 0000000..b16b528
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/t1040si-post.dtsi
@@ -0,0 +1,423 @@
+/*
+ * T1040 Silicon/SoC Device Tree Source (post include)
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+&ifc {
+ #address-cells = <2>;
+ #size-cells = <1>;
+ compatible = "fsl,ifc", "simple-bus";
+ interrupts = <25 2 0 0>;
+};
+
+&pci0 {
+ compatible = "fsl,t1040-pcie", "fsl,qoriq-pcie-v2.4", "fsl,qoriq-pcie";
+ device_type = "pci";
+ #size-cells = <2>;
+ #address-cells = <3>;
+ bus-range = <0x0 0xff>;
+ interrupts = <20 2 0 0>;
+ fsl,iommu-parent = <&pamu0>;
+ pcie@0 {
+ reg = <0 0 0 0 0>;
+ #interrupt-cells = <1>;
+ #size-cells = <2>;
+ #address-cells = <3>;
+ device_type = "pci";
+ interrupts = <20 2 0 0>;
+ interrupt-map-mask = <0xf800 0 0 7>;
+ interrupt-map = <
+ /* IDSEL 0x0 */
+ 0000 0 0 1 &mpic 40 1 0 0
+ 0000 0 0 2 &mpic 1 1 0 0
+ 0000 0 0 3 &mpic 2 1 0 0
+ 0000 0 0 4 &mpic 3 1 0 0
+ >;
+ };
+};
+
+&pci1 {
+ compatible = "fsl,t1040-pcie", "fsl,qoriq-pcie-v2.4", "fsl,qoriq-pcie";
+ device_type = "pci";
+ #size-cells = <2>;
+ #address-cells = <3>;
+ bus-range = <0 0xff>;
+ interrupts = <21 2 0 0>;
+ fsl,iommu-parent = <&pamu0>;
+ pcie@0 {
+ reg = <0 0 0 0 0>;
+ #interrupt-cells = <1>;
+ #size-cells = <2>;
+ #address-cells = <3>;
+ device_type = "pci";
+ interrupts = <21 2 0 0>;
+ interrupt-map-mask = <0xf800 0 0 7>;
+ interrupt-map = <
+ /* IDSEL 0x0 */
+ 0000 0 0 1 &mpic 41 1 0 0
+ 0000 0 0 2 &mpic 5 1 0 0
+ 0000 0 0 3 &mpic 6 1 0 0
+ 0000 0 0 4 &mpic 7 1 0 0
+ >;
+ };
+};
+
+&pci2 {
+ compatible = "fsl,t1040-pcie", "fsl,qoriq-pcie-v2.4", "fsl,qoriq-pcie";
+ device_type = "pci";
+ #size-cells = <2>;
+ #address-cells = <3>;
+ bus-range = <0x0 0xff>;
+ interrupts = <22 2 0 0>;
+ fsl,iommu-parent = <&pamu0>;
+ pcie@0 {
+ reg = <0 0 0 0 0>;
+ #interrupt-cells = <1>;
+ #size-cells = <2>;
+ #address-cells = <3>;
+ device_type = "pci";
+ interrupts = <22 2 0 0>;
+ interrupt-map-mask = <0xf800 0 0 7>;
+ interrupt-map = <
+ /* IDSEL 0x0 */
+ 0000 0 0 1 &mpic 42 1 0 0
+ 0000 0 0 2 &mpic 9 1 0 0
+ 0000 0 0 3 &mpic 10 1 0 0
+ 0000 0 0 4 &mpic 11 1 0 0
+ >;
+ };
+};
+
+&pci3 {
+ compatible = "fsl,t1040-pcie", "fsl,qoriq-pcie-v2.4", "fsl,qoriq-pcie";
+ device_type = "pci";
+ #size-cells = <2>;
+ #address-cells = <3>;
+ bus-range = <0x0 0xff>;
+ interrupts = <23 2 0 0>;
+ fsl,iommu-parent = <&pamu0>;
+ pcie@0 {
+ reg = <0 0 0 0 0>;
+ #interrupt-cells = <1>;
+ #size-cells = <2>;
+ #address-cells = <3>;
+ device_type = "pci";
+ interrupts = <23 2 0 0>;
+ interrupt-map-mask = <0xf800 0 0 7>;
+ interrupt-map = <
+ /* IDSEL 0x0 */
+ 0000 0 0 1 &mpic 43 1 0 0
+ 0000 0 0 2 &mpic 0 1 0 0
+ 0000 0 0 3 &mpic 4 1 0 0
+ 0000 0 0 4 &mpic 8 1 0 0
+ >;
+ };
+};
+
+&dcsr {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "fsl,dcsr", "simple-bus";
+
+ dcsr-epu@0 {
+ compatible = "fsl,t1040-dcsr-epu", "fsl,dcsr-epu";
+ interrupts = <52 2 0 0
+ 84 2 0 0
+ 85 2 0 0>;
+ reg = <0x0 0x1000>;
+ };
+ dcsr-npc {
+ compatible = "fsl,t1040-dcsr-cnpc", "fsl,dcsr-cnpc";
+ reg = <0x1000 0x1000 0x1002000 0x10000>;
+ };
+ dcsr-nxc@2000 {
+ compatible = "fsl,dcsr-nxc";
+ reg = <0x2000 0x1000>;
+ };
+ dcsr-corenet {
+ compatible = "fsl,dcsr-corenet";
+ reg = <0x8000 0x1000 0x1A000 0x1000>;
+ };
+ dcsr-dpaa@9000 {
+ compatible = "fsl,t1040-dcsr-dpaa", "fsl,dcsr-dpaa";
+ reg = <0x9000 0x1000>;
+ };
+ dcsr-ocn@11000 {
+ compatible = "fsl,t1040-dcsr-ocn", "fsl,dcsr-ocn";
+ reg = <0x11000 0x1000>;
+ };
+ dcsr-ddr@12000 {
+ compatible = "fsl,dcsr-ddr";
+ dev-handle = <&ddr1>;
+ reg = <0x12000 0x1000>;
+ };
+ dcsr-nal@18000 {
+ compatible = "fsl,t1040-dcsr-nal", "fsl,dcsr-nal";
+ reg = <0x18000 0x1000>;
+ };
+ dcsr-rcpm@22000 {
+ compatible = "fsl,t1040-dcsr-rcpm", "fsl,dcsr-rcpm";
+ reg = <0x22000 0x1000>;
+ };
+ dcsr-snpc@30000 {
+ compatible = "fsl,t1040-dcsr-snpc", "fsl,dcsr-snpc";
+ reg = <0x30000 0x1000 0x1022000 0x10000>;
+ };
+ dcsr-snpc@31000 {
+ compatible = "fsl,t1040-dcsr-snpc", "fsl,dcsr-snpc";
+ reg = <0x31000 0x1000 0x1042000 0x10000>;
+ };
+ dcsr-cpu-sb-proxy@100000 {
+ compatible = "fsl,dcsr-e5500-sb-proxy", "fsl,dcsr-cpu-sb-proxy";
+ cpu-handle = <&cpu0>;
+ reg = <0x100000 0x1000 0x101000 0x1000>;
+ };
+ dcsr-cpu-sb-proxy@108000 {
+ compatible = "fsl,dcsr-e5500-sb-proxy", "fsl,dcsr-cpu-sb-proxy";
+ cpu-handle = <&cpu1>;
+ reg = <0x108000 0x1000 0x109000 0x1000>;
+ };
+ dcsr-cpu-sb-proxy@110000 {
+ compatible = "fsl,dcsr-e5500-sb-proxy", "fsl,dcsr-cpu-sb-proxy";
+ cpu-handle = <&cpu2>;
+ reg = <0x110000 0x1000 0x111000 0x1000>;
+ };
+ dcsr-cpu-sb-proxy@118000 {
+ compatible = "fsl,dcsr-e5500-sb-proxy", "fsl,dcsr-cpu-sb-proxy";
+ cpu-handle = <&cpu3>;
+ reg = <0x118000 0x1000 0x119000 0x1000>;
+ };
+};
+
+&soc {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ device_type = "soc";
+ compatible = "simple-bus";
+
+ soc-sram-error {
+ compatible = "fsl,soc-sram-error";
+ interrupts = <16 2 1 29>;
+ };
+
+ corenet-law@0 {
+ compatible = "fsl,corenet-law";
+ reg = <0x0 0x1000>;
+ fsl,num-laws = <16>;
+ };
+
+ ddr1: memory-controller@8000 {
+ compatible = "fsl,qoriq-memory-controller-v5.0",
+ "fsl,qoriq-memory-controller";
+ reg = <0x8000 0x1000>;
+ interrupts = <16 2 1 23>;
+ };
+
+ cpc: l3-cache-controller@10000 {
+ compatible = "fsl,t1040-l3-cache-controller", "cache";
+ reg = <0x10000 0x1000>;
+ interrupts = <16 2 1 27>;
+ };
+
+ corenet-cf@18000 {
+ compatible = "fsl,corenet2-cf";
+ reg = <0x18000 0x1000>;
+ interrupts = <16 2 1 31>;
+ fsl,ccf-num-csdids = <32>;
+ fsl,ccf-num-snoopids = <32>;
+ };
+
+ iommu@20000 {
+ compatible = "fsl,pamu-v1.0", "fsl,pamu";
+ reg = <0x20000 0x1000>;
+ ranges = <0 0x20000 0x1000>;
+ #address-cells = <1>;
+ #size-cells = <1>;
+ interrupts = <
+ 24 2 0 0
+ 16 2 1 30>;
+ pamu0: pamu@0 {
+ reg = <0 0x1000>;
+ fsl,primary-cache-geometry = <128 1>;
+ fsl,secondary-cache-geometry = <16 2>;
+ };
+ };
+
+/include/ "qoriq-mpic.dtsi"
+
+ guts: global-utilities@e0000 {
+ compatible = "fsl,t1040-device-config", "fsl,qoriq-device-config-2.0";
+ reg = <0xe0000 0xe00>;
+ fsl,has-rstcr;
+ fsl,liodn-bits = <12>;
+ };
+
+ clockgen: global-utilities@e1000 {
+ compatible = "fsl,t1040-clockgen", "fsl,qoriq-clockgen-2.0",
+ "fixed-clock";
+ reg = <0xe1000 0x1000>;
+ clock-output-names = "sysclk";
+ #clock-cells = <0>;
+
+ #address-cells = <1>;
+ #size-cells = <1>;
+ pll0: pll0@800 {
+ #clock-cells = <1>;
+ reg = <0x800 4>;
+ compatible = "fsl,qoriq-core-pll-2.0";
+ clocks = <&clockgen>;
+ clock-output-names = "pll0", "pll0-div2", "pll0-div4";
+ };
+ pll1: pll1@820 {
+ #clock-cells = <1>;
+ reg = <0x820 4>;
+ compatible = "fsl,qoriq-core-pll-2.0";
+ clocks = <&clockgen>;
+ clock-output-names = "pll1", "pll1-div2", "pll1-div4";
+ };
+ mux0: mux0@0 {
+ #clock-cells = <0>;
+ reg = <0x0 4>;
+ compatible = "fsl,core-mux-clock";
+ clocks = <&pll0 0>, <&pll0 1>, <&pll0 2>,
+ <&pll1 0>, <&pll1 1>, <&pll1 2>;
+ clock-names = "pll0_0", "pll0_1", "pll0_2",
+ "pll1_0", "pll1_1", "pll1_2";
+ clock-output-names = "cmux0";
+ };
+ mux1: mux1@20 {
+ #clock-cells = <0>;
+ reg = <0x20 4>;
+ compatible = "fsl,core-mux-clock";
+ clocks = <&pll0 0>, <&pll0 1>, <&pll0 2>,
+ <&pll1 0>, <&pll1 1>, <&pll1 2>;
+ clock-names = "pll0_0", "pll0_1", "pll0_2",
+ "pll1_0", "pll1_1", "pll1_2";
+ clock-output-names = "cmux1";
+ };
+ mux2: mux2@40 {
+ #clock-cells = <0>;
+ reg = <0x40 4>;
+ compatible = "fsl,core-mux-clock";
+ clocks = <&pll0 0>, <&pll0 1>, <&pll0 2>,
+ <&pll1 0>, <&pll1 1>, <&pll1 2>;
+ clock-names = "pll0_0", "pll0_1", "pll0_2",
+ "pll1_0", "pll1_1", "pll1_2";
+ clock-output-names = "cmux2";
+ };
+ mux3: mux3@60 {
+ #clock-cells = <0>;
+ reg = <0x60 4>;
+ compatible = "fsl,core-mux-clock";
+ clocks = <&pll0 0>, <&pll0 1>, <&pll0 2>,
+ <&pll1 0>, <&pll1 1>, <&pll1 2>;
+ clock-names = "pll0_0", "pll0_1", "pll0_2",
+ "pll1_0", "pll1_1", "pll1_2";
+ clock-output-names = "cmux3";
+ };
+ };
+
+ rcpm: global-utilities@e2000 {
+ compatible = "fsl,t1040-rcpm", "fsl,qoriq-rcpm-2.0";
+ reg = <0xe2000 0x1000>;
+ };
+
+ sfp: sfp@e8000 {
+ compatible = "fsl,t1040-sfp";
+ reg = <0xe8000 0x1000>;
+ };
+
+ serdes: serdes@ea000 {
+ compatible = "fsl,t1040-serdes";
+ reg = <0xea000 0x4000>;
+ };
+
+/include/ "qoriq-espi-0.dtsi"
+ spi@110000 {
+ fsl,espi-num-chipselects = <4>;
+ };
+
+/include/ "qoriq-esdhc-0.dtsi"
+ sdhc@114000 {
+ compatible = "fsl,t1040-esdhc", "fsl,esdhc";
+ fsl,iommu-parent = <&pamu0>;
+ fsl,liodn-reg = <&guts 0x530>; /* eSDHCLIODNR */
+ sdhci,auto-cmd12;
+ };
+/include/ "qoriq-i2c-0.dtsi"
+/include/ "qoriq-i2c-1.dtsi"
+/include/ "qoriq-duart-0.dtsi"
+/include/ "qoriq-duart-1.dtsi"
+/include/ "qoriq-gpio-0.dtsi"
+/include/ "qoriq-gpio-1.dtsi"
+/include/ "qoriq-gpio-2.dtsi"
+/include/ "qoriq-gpio-3.dtsi"
+/include/ "qoriq-usb2-mph-0.dtsi"
+ usb0: usb@210000 {
+ compatible = "fsl-usb2-mph-v2.4", "fsl-usb2-mph";
+ fsl,iommu-parent = <&pamu0>;
+ fsl,liodn-reg = <&guts 0x520>; /* USB1LIODNR */
+ phy_type = "utmi";
+ port0;
+ };
+/include/ "qoriq-usb2-dr-0.dtsi"
+ usb1: usb@211000 {
+ compatible = "fsl-usb2-dr-v2.4", "fsl-usb2-dr";
+ fsl,iommu-parent = <&pamu0>;
+ fsl,liodn-reg = <&guts 0x524>; /* USB2LIODNR */
+ dr_mode = "host";
+ phy_type = "utmi";
+ };
+
+ display@180000 {
+ compatible = "fsl,t1040-diu", "fsl,diu";
+ reg = <0x180000 1000>;
+ interrupts = <74 2 0 0>;
+ };
+
+/include/ "qoriq-sata2-0.dtsi"
+sata@220000 {
+ fsl,iommu-parent = <&pamu0>;
+ fsl,liodn-reg = <&guts 0x550>; /* SATA1LIODNR */
+};
+/include/ "qoriq-sata2-1.dtsi"
+sata@221000 {
+ fsl,iommu-parent = <&pamu0>;
+ fsl,liodn-reg = <&guts 0x554>; /* SATA2LIODNR */
+};
+/include/ "qoriq-sec5.0-0.dtsi"
+
+l2switch@800000 {
+ compatible = "fsl,t1040-l2s";
+ reg = <0x800000 0x400000>;
+};
+};
diff --git a/arch/powerpc/boot/dts/fsl/t1042si-post.dtsi b/arch/powerpc/boot/dts/fsl/t1042si-post.dtsi
new file mode 100644
index 0000000..cc8f133
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/t1042si-post.dtsi
@@ -0,0 +1,41 @@
+/*
+ * T1042 Silicon/SoC Device Tree Source (post include)
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/include/ "t1040si-post.dtsi"
+
+&soc {
+ l2switch@800000 {
+ status = "disabled";
+ };
+};
diff --git a/arch/powerpc/boot/dts/fsl/t104xsi-pre.dtsi b/arch/powerpc/boot/dts/fsl/t104xsi-pre.dtsi
new file mode 100644
index 0000000..5cd8cc3
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/t104xsi-pre.dtsi
@@ -0,0 +1,109 @@
+/*
+ * T1040/T1042 Silicon/SoC Device Tree Source (pre include)
+ *
+ * Copyright 2013 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ * names of its contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/dts-v1/;
+
+/include/ "e5500_power_isa.dtsi"
+
+/ {
+ compatible = "fsl,T104x";
+ #address-cells = <2>;
+ #size-cells = <2>;
+ interrupt-parent = <&mpic>;
+
+ aliases {
+ ccsr = &soc;
+ dcsr = &dcsr;
+
+ serial0 = &serial0;
+ serial1 = &serial1;
+ serial2 = &serial2;
+ serial3 = &serial3;
+ pci0 = &pci0;
+ pci1 = &pci1;
+ pci2 = &pci2;
+ pci3 = &pci3;
+ usb0 = &usb0;
+ usb1 = &usb1;
+ sdhc = &sdhc;
+
+ crypto = &crypto;
+
+ };
+
+ cpus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ cpu0: PowerPC,e5500@0 {
+ device_type = "cpu";
+ reg = <0>;
+ clocks = <&mux0>;
+ next-level-cache = <&L2_1>;
+ L2_1: l2-cache {
+ next-level-cache = <&cpc>;
+ };
+ };
+ cpu1: PowerPC,e5500@1 {
+ device_type = "cpu";
+ reg = <1>;
+ clocks = <&mux1>;
+ next-level-cache = <&L2_2>;
+ L2_2: l2-cache {
+ next-level-cache = <&cpc>;
+ };
+
+ };
+ cpu2: PowerPC,e5500@2 {
+ device_type = "cpu";
+ reg = <2>;
+ clocks = <&mux2>;
+ next-level-cache = <&L2_3>;
+ L2_3: l2-cache {
+ next-level-cache = <&cpc>;
+ };
+
+ };
+ cpu3: PowerPC,e5500@3 {
+ device_type = "cpu";
+ reg = <3>;
+ clocks = <&mux3>;
+ next-level-cache = <&L2_4>;
+ L2_4: l2-cache {
+ next-level-cache = <&cpc>;
+ };
+ };
+
+ };
+};
--
1.7.9.5
* Re: [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space.
From: Anshuman Khandual @ 2013-09-19 8:41 UTC (permalink / raw)
To: Sukadev Bhattiprolu
Cc: linuxppc-dev, Michael Ellerman, Paul Mackerras, linux-kernel,
Stephane Eranian
In-Reply-To: <1379119755-21025-9-git-send-email-sukadev@linux.vnet.ibm.com>
On 09/14/2013 06:19 AM, Sukadev Bhattiprolu wrote:
> +static void power7_get_mem_data_src(union perf_mem_data_src *dsrc,
> + struct pt_regs *regs)
> +{
> + u64 idx;
> + u64 mmcra = regs->dsisr;
> + u64 addr;
> + int ret;
> + unsigned int instr;
> +
> + if (mmcra & POWER7_MMCRA_DCACHE_MISS) {
> + idx = mmcra & POWER7_MMCRA_DCACHE_SRC_MASK;
> + idx >>= POWER7_MMCRA_DCACHE_SRC_SHIFT;
> +
> + dsrc->val |= dcache_src_map[idx];
> + return;
> + }
> +
> + instr = 0;
> + addr = perf_instruction_pointer(regs);
> +
> + if (is_kernel_addr(addr))
> + instr = *(unsigned int *)addr;
> + else {
> + pagefault_disable();
> + ret = __get_user_inatomic(instr, (unsigned int __user *)addr);
> + pagefault_enable();
> + if (ret)
> + instr = 0;
> + }
> + if (instr && instr_is_load_store(&instr))
I wonder whether
"(mmcra & POWER7_MMCRA_DCACHE_SRC_MASK) >> POWER7_MMCRA_DCACHE_SRC_SHIFT"
can ever be non-zero when the marked instruction did not have the
MMCRA[POWER7_MMCRA_DCACHE_MISS] bit set. In that case we should still compute
dsrc->val as in the previous case. I ran a couple of experiments on a P7 box,
but could not find a marked instruction with the
MMCRA[POWER7_MMCRA_DCACHE_MISS] bit clear and a non-zero
POWER7_MMCRA_DCACHE_SRC field.
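
The condition in question can be written as a small predicate. This is only a sketch: the EX_* bit positions below are hypothetical placeholders chosen for illustration, not the real POWER7 MMCRA layout, and the ex_* helper names do not exist in the patch.

```c
#include <stdint.h>

/*
 * Placeholder field layout (assumption, for illustration only).
 * The real values are the POWER7_MMCRA_* definitions in the patch.
 */
#define EX_MMCRA_DCACHE_MISS		(1ULL << 55)
#define EX_MMCRA_DCACHE_SRC_SHIFT	51
#define EX_MMCRA_DCACHE_SRC_MASK	(0xfULL << EX_MMCRA_DCACHE_SRC_SHIFT)

/* Extract the data-cache source field, regardless of the miss bit. */
static unsigned int ex_dcache_src(uint64_t mmcra)
{
	return (unsigned int)((mmcra & EX_MMCRA_DCACHE_SRC_MASK) >>
			      EX_MMCRA_DCACHE_SRC_SHIFT);
}

/*
 * Return 1 for the case being asked about: the miss bit is clear
 * but the source field is still non-zero.
 */
static int ex_src_without_miss(uint64_t mmcra)
{
	return !(mmcra & EX_MMCRA_DCACHE_MISS) && ex_dcache_src(mmcra) != 0;
}
```

Counting how often such a predicate fires over a set of samples would be one way to look for the case experimentally.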
* [PATCH 7/7] vfio pci: Add vfio iommu implementation for FSL_PAMU
From: Bharat Bhushan @ 2013-09-19 7:29 UTC (permalink / raw)
To: alex.williamson, joro, benh, galak, linux-kernel, linuxppc-dev,
linux-pci, agraf, scottwood, iommu
Cc: Bharat Bhushan
In-Reply-To: <1379575763-2091-1-git-send-email-Bharat.Bhushan@freescale.com>
This patch adds vfio iommu support for Freescale IOMMU
(PAMU - Peripheral Access Management Unit).
The Freescale PAMU is an aperture-based IOMMU with the following
characteristics. Each device has an entry in a table in memory
describing the iova->phys mapping. The mapping has:
-an overall aperture that is power of 2 sized, and has a start iova that
is naturally aligned
-has 1 or more windows within the aperture
-number of windows must be power of 2, max is 256
-size of each window is determined by aperture size / # of windows
-iova of each window is determined by aperture start iova +
window index * window size
-the mapped region in each window can be different than
the window size...mapping must be a power of 2
-physical address of the mapping must be naturally aligned
with the mapping size
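
The geometry rules above can be sketched as plain arithmetic. This is a minimal illustration, not code from the patch: all ex_* names are hypothetical, and the window start is taken as aperture start plus window index times window size, which is how the patch's iova_to_win() derives the index.

```c
#include <stdint.h>

/* Hypothetical helpers; only the arithmetic follows the description. */

static int ex_is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Aperture size must be a power of 2, the start must be naturally
 * aligned to the size, and the window count must be a power of 2
 * no larger than 256.
 */
static int ex_geometry_valid(uint64_t start, uint64_t size, unsigned int nwin)
{
	return ex_is_pow2(size) && (start & (size - 1)) == 0 &&
	       ex_is_pow2(nwin) && nwin <= 256;
}

/* Size of each window: aperture size / number of windows. */
static uint64_t ex_window_size(uint64_t aperture_size, unsigned int nwin)
{
	return aperture_size / nwin;
}

/* Index of the window that contains iova. */
static unsigned int ex_iova_to_win(uint64_t aperture_start, uint64_t win_size,
				   uint64_t iova)
{
	return (unsigned int)((iova - aperture_start) / win_size);
}

/* Start iova of window number win. */
static uint64_t ex_window_start(uint64_t aperture_start, uint64_t win_size,
				unsigned int win)
{
	return aperture_start + (uint64_t)win * win_size;
}
```

For example, a 1 GiB aperture starting at 0x40000000 that is split into 256 windows gives 4 MiB windows, and iova 0x40800000 falls at the start of window 2.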
Some of the code is derived from the TYPE1 iommu (drivers/vfio/vfio_iommu_type1.c).
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
drivers/vfio/Kconfig | 6 +
drivers/vfio/Makefile | 1 +
drivers/vfio/vfio_iommu_fsl_pamu.c | 952 ++++++++++++++++++++++++++++++++++++
include/uapi/linux/vfio.h | 100 ++++
4 files changed, 1059 insertions(+), 0 deletions(-)
create mode 100644 drivers/vfio/vfio_iommu_fsl_pamu.c
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 26b3d9d..7d1da26 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -8,11 +8,17 @@ config VFIO_IOMMU_SPAPR_TCE
depends on VFIO && SPAPR_TCE_IOMMU
default n
+config VFIO_IOMMU_FSL_PAMU
+ tristate
+ depends on VFIO
+ default n
+
menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
select VFIO_IOMMU_TYPE1 if X86
select VFIO_IOMMU_SPAPR_TCE if (PPC_POWERNV || PPC_PSERIES)
+ select VFIO_IOMMU_FSL_PAMU if FSL_PAMU
help
VFIO provides a framework for secure userspace device drivers.
See Documentation/vfio.txt for more details.
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index c5792ec..7461350 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,4 +1,5 @@
obj-$(CONFIG_VFIO) += vfio.o
obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_common.o vfio_iommu_type1.o
obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_common.o vfio_iommu_spapr_tce.o
+obj-$(CONFIG_VFIO_IOMMU_FSL_PAMU) += vfio_iommu_common.o vfio_iommu_fsl_pamu.o
obj-$(CONFIG_VFIO_PCI) += pci/
diff --git a/drivers/vfio/vfio_iommu_fsl_pamu.c b/drivers/vfio/vfio_iommu_fsl_pamu.c
new file mode 100644
index 0000000..b29365f
--- /dev/null
+++ b/drivers/vfio/vfio_iommu_fsl_pamu.c
@@ -0,0 +1,952 @@
+/*
+ * VFIO: IOMMU DMA mapping support for FSL PAMU IOMMU
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * Copyright (C) 2013 Freescale Semiconductor, Inc.
+ *
+ * Author: Bharat Bhushan <bharat.bhushan@freescale.com>
+ *
+ * This file is derived from drivers/vfio/vfio_iommu_type1.c
+ *
+ * The Freescale PAMU is an aperture-based IOMMU with the following
+ * characteristics. Each device has an entry in a table in memory
+ * describing the iova->phys mapping. The mapping has:
+ * -an overall aperture that is power of 2 sized, and has a start iova that
+ * is naturally aligned
+ * -has 1 or more windows within the aperture
+ * -number of windows must be power of 2, max is 256
+ * -size of each window is determined by aperture size / # of windows
+ * -iova of each window is aperture start iova + window index * window size
+ * -the mapped region in each window can be different than
+ * the window size...mapping must be a power of 2
+ * -physical address of the mapping must be naturally aligned
+ * with the mapping size
+ */
+
+#include <linux/compat.h>
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/pci.h> /* pci_bus_type */
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/workqueue.h>
+#include <linux/hugetlb.h>
+#include <linux/msi.h>
+#include <asm/fsl_pamu_stash.h>
+
+#include "vfio_iommu_common.h"
+
+#define DRIVER_VERSION "0.1"
+#define DRIVER_AUTHOR "Bharat Bhushan <bharat.bhushan@freescale.com>"
+#define DRIVER_DESC "FSL PAMU IOMMU driver for VFIO"
+
+struct vfio_iommu {
+ struct iommu_domain *domain;
+ struct mutex lock;
+ dma_addr_t aperture_start;
+ dma_addr_t aperture_end;
+ dma_addr_t page_size; /* Maximum mapped Page size */
+ int nsubwindows; /* Number of subwindows */
+ struct rb_root dma_list;
+ struct list_head msi_dma_list;
+ struct list_head group_list;
+};
+
+struct vfio_dma {
+ struct rb_node node;
+ dma_addr_t iova; /* Device address */
+ unsigned long vaddr; /* Process virtual addr */
+ size_t size; /* Number of pages */
+ int prot; /* IOMMU_READ/WRITE */
+};
+
+struct vfio_msi_dma {
+ struct list_head next;
+ dma_addr_t iova; /* Device address */
+ int bank_id;
+ int prot; /* IOMMU_READ/WRITE */
+};
+
+struct vfio_group {
+ struct iommu_group *iommu_group;
+ struct list_head next;
+};
+
+static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
+ dma_addr_t start, size_t size)
+{
+ struct rb_node *node = iommu->dma_list.rb_node;
+
+ while (node) {
+ struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
+
+ if (start + size <= dma->iova)
+ node = node->rb_left;
+ else if (start >= dma->iova + dma->size)
+ node = node->rb_right;
+ else
+ return dma;
+ }
+
+ return NULL;
+}
+
+static void vfio_insert_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
+{
+ struct rb_node **link = &iommu->dma_list.rb_node, *parent = NULL;
+ struct vfio_dma *dma;
+
+ while (*link) {
+ parent = *link;
+ dma = rb_entry(parent, struct vfio_dma, node);
+
+ if (new->iova + new->size <= dma->iova)
+ link = &(*link)->rb_left;
+ else
+ link = &(*link)->rb_right;
+ }
+
+ rb_link_node(&new->node, parent, link);
+ rb_insert_color(&new->node, &iommu->dma_list);
+}
+
+static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
+{
+ rb_erase(&old->node, &iommu->dma_list);
+}
+
+static int iova_to_win(struct vfio_iommu *iommu, dma_addr_t iova)
+{
+ u64 offset = iova - iommu->aperture_start;
+ do_div(offset, iommu->page_size);
+ return (int) offset;
+}
+
+static int vfio_disable_iommu_domain(struct vfio_iommu *iommu)
+{
+ int enable = 0;
+ return iommu_domain_set_attr(iommu->domain,
+ DOMAIN_ATTR_FSL_PAMU_ENABLE, &enable);
+}
+
+static int vfio_enable_iommu_domain(struct vfio_iommu *iommu)
+{
+ int enable = 1;
+ return iommu_domain_set_attr(iommu->domain,
+ DOMAIN_ATTR_FSL_PAMU_ENABLE, &enable);
+}
+
+/* Unmap DMA region */
+static int vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
+ dma_addr_t iova, size_t *size)
+{
+ dma_addr_t start = iova;
+ int win, win_start, win_end;
+ long unlocked = 0;
+ unsigned int nr_pages;
+
+ nr_pages = iommu->page_size / PAGE_SIZE;
+ win_start = iova_to_win(iommu, iova);
+ win_end = iova_to_win(iommu, iova + *size - 1);
+
+ /* Release the pinned pages */
+ for (win = win_start; win <= win_end; iova += iommu->page_size, win++) {
+ unsigned long pfn;
+
+ pfn = iommu_iova_to_phys(iommu->domain, iova) >> PAGE_SHIFT;
+ if (!pfn)
+ continue;
+
+ iommu_domain_window_disable(iommu->domain, win);
+
+ unlocked += vfio_unpin_pages(pfn, nr_pages, dma->prot, 1);
+ }
+
+ vfio_lock_acct(-unlocked);
+ *size = iova - start;
+ return 0;
+}
+
+static int vfio_remove_dma_overlap(struct vfio_iommu *iommu, dma_addr_t start,
+ size_t *size, struct vfio_dma *dma)
+{
+ size_t offset, overlap, tmp;
+ struct vfio_dma *split;
+ int ret;
+
+ if (!*size)
+ return 0;
+
+ /*
+ * Existing dma region is completely covered, unmap all. This is
+ * the likely case since userspace tends to map and unmap buffers
+ * in one shot rather than multiple mappings within a buffer.
+ */
+ if (likely(start <= dma->iova &&
+ start + *size >= dma->iova + dma->size)) {
+ *size = dma->size;
+ ret = vfio_unmap_unpin(iommu, dma, dma->iova, size);
+ if (ret)
+ return ret;
+
+ /*
+ * Did we remove more than we have? Should never happen
+ * since a vfio_dma is contiguous in iova and vaddr.
+ */
+ WARN_ON(*size != dma->size);
+
+ vfio_remove_dma(iommu, dma);
+ kfree(dma);
+ return 0;
+ }
+
+ /* Overlap low address of existing range */
+ if (start <= dma->iova) {
+ overlap = start + *size - dma->iova;
+ ret = vfio_unmap_unpin(iommu, dma, dma->iova, &overlap);
+ if (ret)
+ return ret;
+
+ vfio_remove_dma(iommu, dma);
+
+ /*
+ * Check, we may have removed the whole vfio_dma. If not
+ * fixup and re-insert.
+ */
+ if (overlap < dma->size) {
+ dma->iova += overlap;
+ dma->vaddr += overlap;
+ dma->size -= overlap;
+ vfio_insert_dma(iommu, dma);
+ } else
+ kfree(dma);
+
+ *size = overlap;
+ return 0;
+ }
+
+ /* Overlap high address of existing range */
+ if (start + *size >= dma->iova + dma->size) {
+ offset = start - dma->iova;
+ overlap = dma->size - offset;
+
+ ret = vfio_unmap_unpin(iommu, dma, start, &overlap);
+ if (ret)
+ return ret;
+
+ dma->size -= overlap;
+ *size = overlap;
+ return 0;
+ }
+
+ /* Split existing */
+
+ /*
+ * Allocate our tracking structure early even though it may not
+ * be used. An Allocation failure later loses track of pages and
+ * is more difficult to unwind.
+ */
+ split = kzalloc(sizeof(*split), GFP_KERNEL);
+ if (!split)
+ return -ENOMEM;
+
+ offset = start - dma->iova;
+
+ ret = vfio_unmap_unpin(iommu, dma, start, size);
+ if (ret || !*size) {
+ kfree(split);
+ return ret;
+ }
+
+ tmp = dma->size;
+
+ /* Resize the lower vfio_dma in place, before the below insert */
+ dma->size = offset;
+
+ /* Insert new for remainder, assuming it didn't all get unmapped */
+ if (likely(offset + *size < tmp)) {
+ split->size = tmp - offset - *size;
+ split->iova = dma->iova + offset + *size;
+ split->vaddr = dma->vaddr + offset + *size;
+ split->prot = dma->prot;
+ vfio_insert_dma(iommu, split);
+ } else
+ kfree(split);
+
+ return 0;
+}
+
+/* Map DMA region */
+static int vfio_dma_map(struct vfio_iommu *iommu, dma_addr_t iova,
+ unsigned long vaddr, long npage, int prot)
+{
+ int ret = 0, i;
+ size_t size;
+ unsigned int win, nr_subwindows;
+ dma_addr_t iovamap;
+
+ /* total size to be mapped */
+ size = npage << PAGE_SHIFT;
+ do_div(size, iommu->page_size);
+ nr_subwindows = size;
+ size = npage << PAGE_SHIFT;
+ iovamap = iova;
+ for (i = 0; i < nr_subwindows; i++) {
+ unsigned long pfn;
+ unsigned long nr_pages;
+ dma_addr_t mapsize;
+ struct vfio_dma *dma = NULL;
+
+ win = iova_to_win(iommu, iovamap);
+ if (iovamap != iommu->aperture_start + iommu->page_size * win) {
+ pr_err("%s iova(%llx) unaligned to window size %llx\n",
+ __func__, iovamap, iommu->page_size);
+ ret = -EINVAL;
+ break;
+ }
+
+ mapsize = min(iova + size - iovamap, iommu->page_size);
+ /*
+ * FIXME: Currently only mappings whose size equals the
+ * subwindow size are supported.
+ */
+ if (mapsize < iommu->page_size) {
+ pr_err("%s iova (%llx) not aligned to window size %llx\n",
+ __func__, iovamap, iommu->page_size);
+ ret = -EINVAL;
+ break;
+ }
+
+ nr_pages = mapsize >> PAGE_SHIFT;
+
+ /* Pin a contiguous chunk of memory */
+ ret = vfio_pin_pages(vaddr, nr_pages, prot, &pfn);
+ if (ret != nr_pages) {
+ pr_err("%s unable to pin pages = %lx, pinned(%lx/%lx)\n",
+ __func__, vaddr, npage, nr_pages);
+ ret = -EINVAL;
+ break;
+ }
+
+ ret = iommu_domain_window_enable(iommu->domain, win,
+ (phys_addr_t)pfn << PAGE_SHIFT,
+ mapsize, prot);
+ if (ret) {
+ pr_err("%s unable to iommu_map()\n", __func__);
+ ret = -EINVAL;
+ break;
+ }
+
+ /*
+ * Check if we abut a region below - nothing below 0.
+ * This is the most likely case when mapping chunks of
+ * physically contiguous regions within a virtual address
+ * range. Update the abutting entry in place since iova
+ * doesn't change.
+ */
+ if (likely(iovamap)) {
+ struct vfio_dma *tmp;
+ tmp = vfio_find_dma(iommu, iovamap - 1, 1);
+ if (tmp && tmp->prot == prot &&
+ tmp->vaddr + tmp->size == vaddr) {
+ tmp->size += mapsize;
+ dma = tmp;
+ }
+ }
+
+ /*
+ * Check if we abut a region above - nothing above ~0 + 1.
+ * If we abut above and below, remove and free. If only
+ * abut above, remove, modify, reinsert.
+ */
+ if (likely(iovamap + mapsize)) {
+ struct vfio_dma *tmp;
+ tmp = vfio_find_dma(iommu, iovamap + mapsize, 1);
+ if (tmp && tmp->prot == prot &&
+ tmp->vaddr == vaddr + mapsize) {
+ vfio_remove_dma(iommu, tmp);
+ if (dma) {
+ dma->size += tmp->size;
+ kfree(tmp);
+ } else {
+ tmp->size += mapsize;
+ tmp->iova = iovamap;
+ tmp->vaddr = vaddr;
+ vfio_insert_dma(iommu, tmp);
+ dma = tmp;
+ }
+ }
+ }
+
+ if (!dma) {
+ dma = kzalloc(sizeof(*dma), GFP_KERNEL);
+ if (!dma) {
+ iommu_unmap(iommu->domain, iovamap, mapsize);
+ vfio_unpin_pages(pfn, npage, prot, true);
+ ret = -ENOMEM;
+ break;
+ }
+
+ dma->size = mapsize;
+ dma->iova = iovamap;
+ dma->vaddr = vaddr;
+ dma->prot = prot;
+ vfio_insert_dma(iommu, dma);
+ }
+
+ iovamap += mapsize;
+ vaddr += mapsize;
+ }
+
+ if (ret) {
+ struct vfio_dma *tmp;
+ while ((tmp = vfio_find_dma(iommu, iova, size))) {
+ int r = vfio_remove_dma_overlap(iommu, iova,
+ &size, tmp);
+ if (WARN_ON(r || !size))
+ break;
+ }
+ }
+
+ vfio_enable_iommu_domain(iommu);
+ return ret;
+}
+
+static int vfio_dma_do_map(struct vfio_iommu *iommu,
+ struct vfio_iommu_type1_dma_map *map)
+{
+ dma_addr_t iova = map->iova;
+ size_t size = map->size;
+ unsigned long vaddr = map->vaddr;
+ int ret = 0, prot = 0;
+ long npage;
+
+ /* READ/WRITE from device perspective */
+ if (map->flags & VFIO_DMA_MAP_FLAG_WRITE)
+ prot |= IOMMU_WRITE;
+ if (map->flags & VFIO_DMA_MAP_FLAG_READ)
+ prot |= IOMMU_READ;
+
+ if (!prot)
+ return -EINVAL; /* No READ/WRITE? */
+
+ /* Don't allow IOVA wrap */
+ if (iova + size && iova + size < iova)
+ return -EINVAL;
+
+ /* Don't allow virtual address wrap */
+ if (vaddr + size && vaddr + size < vaddr)
+ return -EINVAL;
+
+ /*
+ * FIXME: Currently only mappings whose size equals the
+ * subwindow size are supported.
+ */
+ if (size < iommu->page_size)
+ return -EINVAL;
+
+ npage = size >> PAGE_SHIFT;
+ if (!npage)
+ return -EINVAL;
+
+ mutex_lock(&iommu->lock);
+
+ if (vfio_find_dma(iommu, iova, size)) {
+ ret = -EEXIST;
+ goto out_lock;
+ }
+
+ ret = vfio_dma_map(iommu, iova, vaddr, npage, prot);
+
+out_lock:
+ mutex_unlock(&iommu->lock);
+ return ret;
+}
+
+static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
+ struct vfio_iommu_type1_dma_unmap *unmap)
+{
+ struct vfio_dma *dma;
+ size_t unmapped = 0, size;
+ int ret = 0;
+
+ mutex_lock(&iommu->lock);
+
+ while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
+ size = unmap->size;
+ ret = vfio_remove_dma_overlap(iommu, unmap->iova, &size, dma);
+ if (ret || !size)
+ break;
+ unmapped += size;
+ }
+
+ mutex_unlock(&iommu->lock);
+
+ /*
+ * We may unmap more than requested, update the unmap struct so
+ * userspace can know.
+ */
+ unmap->size = unmapped;
+
+ return ret;
+}
+
+static int vfio_handle_get_attr(struct vfio_iommu *iommu,
+ struct vfio_pamu_attr *pamu_attr)
+{
+ switch (pamu_attr->attribute) {
+ case VFIO_ATTR_GEOMETRY: {
+ struct iommu_domain_geometry geom;
+ if (iommu_domain_get_attr(iommu->domain,
+ DOMAIN_ATTR_GEOMETRY, &geom)) {
+ pr_err("%s Error getting domain geometry\n",
+ __func__);
+ return -EFAULT;
+ }
+
+ pamu_attr->attr_info.attr.aperture_start = geom.aperture_start;
+ pamu_attr->attr_info.attr.aperture_end = geom.aperture_end;
+ break;
+ }
+ case VFIO_ATTR_WINDOWS: {
+ u32 count;
+ if (iommu_domain_get_attr(iommu->domain,
+ DOMAIN_ATTR_WINDOWS, &count)) {
+ pr_err("%s Error getting domain windows\n",
+ __func__);
+ return -EFAULT;
+ }
+
+ pamu_attr->attr_info.windows = count;
+ break;
+ }
+ case VFIO_ATTR_PAMU_STASH: {
+ struct pamu_stash_attribute stash;
+ if (iommu_domain_get_attr(iommu->domain,
+ DOMAIN_ATTR_FSL_PAMU_STASH, &stash)) {
+ pr_err("%s Error getting domain stash attribute\n",
+ __func__);
+ return -EFAULT;
+ }
+
+ pamu_attr->attr_info.stash.cpu = stash.cpu;
+ pamu_attr->attr_info.stash.cache = stash.cache;
+ break;
+ }
+
+ default:
+ pr_err("%s Error: Invalid attribute (%d)\n",
+ __func__, pamu_attr->attribute);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int vfio_handle_set_attr(struct vfio_iommu *iommu,
+ struct vfio_pamu_attr *pamu_attr)
+{
+ switch (pamu_attr->attribute) {
+ case VFIO_ATTR_GEOMETRY: {
+ struct iommu_domain_geometry geom;
+
+ geom.aperture_start = pamu_attr->attr_info.attr.aperture_start;
+ geom.aperture_end = pamu_attr->attr_info.attr.aperture_end;
+ iommu->aperture_start = geom.aperture_start;
+ iommu->aperture_end = geom.aperture_end;
+ geom.force_aperture = 1;
+ if (iommu_domain_set_attr(iommu->domain,
+ DOMAIN_ATTR_GEOMETRY, &geom)) {
+ pr_err("%s Error setting domain geometry\n", __func__);
+ return -EFAULT;
+ }
+
+ break;
+ }
+ case VFIO_ATTR_WINDOWS: {
+ u32 count = pamu_attr->attr_info.windows;
+ u64 size;
+ if (count > 256) {
+ pr_err("Number of subwindows requested (%d) exceeds 256\n",
+ count);
+ return -EINVAL;
+ }
+ iommu->nsubwindows = pamu_attr->attr_info.windows;
+ size = iommu->aperture_end - iommu->aperture_start + 1;
+ do_div(size, count);
+ iommu->page_size = size;
+ if (iommu_domain_set_attr(iommu->domain,
+ DOMAIN_ATTR_WINDOWS, &count)) {
+ pr_err("%s Error setting domain windows\n",
+ __func__);
+ return -EFAULT;
+ }
+
+ break;
+ }
+ case VFIO_ATTR_PAMU_STASH: {
+ struct pamu_stash_attribute stash;
+
+ stash.cpu = pamu_attr->attr_info.stash.cpu;
+ stash.cache = pamu_attr->attr_info.stash.cache;
+ if (iommu_domain_set_attr(iommu->domain,
+ DOMAIN_ATTR_FSL_PAMU_STASH, &stash)) {
+ pr_err("%s Error setting domain stash attribute\n",
+ __func__);
+ return -EFAULT;
+ }
+ break;
+ }
+
+ default:
+ pr_err("%s Error: Invalid attribute (%d)\n",
+ __func__, pamu_attr->attribute);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int vfio_msi_map(struct vfio_iommu *iommu,
+ struct vfio_pamu_msi_bank_map *msi_map, int prot)
+{
+ struct msi_region region;
+ int window;
+ int ret;
+
+ ret = msi_get_region(msi_map->msi_bank_index, &region);
+ if (ret) {
+ pr_err("%s MSI region (%d) not found\n", __func__,
+ msi_map->msi_bank_index);
+ return ret;
+ }
+
+ window = iova_to_win(iommu, msi_map->iova);
+ ret = iommu_domain_window_enable(iommu->domain, window, region.addr,
+ region.size, prot);
+ if (ret) {
+ pr_err("%s Error: unable to map msi region\n", __func__);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int vfio_do_msi_map(struct vfio_iommu *iommu,
+ struct vfio_pamu_msi_bank_map *msi_map)
+{
+ struct vfio_msi_dma *msi_dma;
+ int ret, prot = 0;
+
+ /* READ/WRITE from device perspective */
+ if (msi_map->flags & VFIO_DMA_MAP_FLAG_WRITE)
+ prot |= IOMMU_WRITE;
+ if (msi_map->flags & VFIO_DMA_MAP_FLAG_READ)
+ prot |= IOMMU_READ;
+
+ if (!prot)
+ return -EINVAL; /* No READ/WRITE? */
+
+ ret = vfio_msi_map(iommu, msi_map, prot);
+ if (ret)
+ return ret;
+
+ msi_dma = kzalloc(sizeof(*msi_dma), GFP_KERNEL);
+ if (!msi_dma)
+ return -ENOMEM;
+
+ msi_dma->iova = msi_map->iova;
+ msi_dma->bank_id = msi_map->msi_bank_index;
+ list_add(&msi_dma->next, &iommu->msi_dma_list);
+ return 0;
+}
+
+static void vfio_msi_unmap(struct vfio_iommu *iommu, dma_addr_t iova)
+{
+ int window;
+ window = iova_to_win(iommu, iova);
+ iommu_domain_window_disable(iommu->domain, window);
+}
+
+static int vfio_do_msi_unmap(struct vfio_iommu *iommu,
+ struct vfio_pamu_msi_bank_unmap *msi_unmap)
+{
+ struct vfio_msi_dma *mdma, *mdma_tmp;
+
+ list_for_each_entry_safe(mdma, mdma_tmp, &iommu->msi_dma_list, next) {
+ if (mdma->iova == msi_unmap->iova) {
+ vfio_msi_unmap(iommu, mdma->iova);
+ list_del(&mdma->next);
+ kfree(mdma);
+ return 0;
+ }
+ }
+
+ return -EINVAL;
+}
+static void *vfio_iommu_fsl_pamu_open(unsigned long arg)
+{
+ struct vfio_iommu *iommu;
+
+ if (arg != VFIO_FSL_PAMU_IOMMU)
+ return ERR_PTR(-EINVAL);
+
+ iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
+ if (!iommu)
+ return ERR_PTR(-ENOMEM);
+
+ INIT_LIST_HEAD(&iommu->group_list);
+ iommu->dma_list = RB_ROOT;
+ INIT_LIST_HEAD(&iommu->msi_dma_list);
+ mutex_init(&iommu->lock);
+
+ /*
+ * Wish we didn't have to know about bus_type here.
+ */
+ iommu->domain = iommu_domain_alloc(&pci_bus_type);
+ if (!iommu->domain) {
+ kfree(iommu);
+ return ERR_PTR(-EIO);
+ }
+
+ return iommu;
+}
+
+static void vfio_iommu_fsl_pamu_release(void *iommu_data)
+{
+ struct vfio_iommu *iommu = iommu_data;
+ struct vfio_group *group, *group_tmp;
+ struct vfio_msi_dma *mdma, *mdma_tmp;
+ struct rb_node *node;
+
+ list_for_each_entry_safe(group, group_tmp, &iommu->group_list, next) {
+ iommu_detach_group(iommu->domain, group->iommu_group);
+ list_del(&group->next);
+ kfree(group);
+ }
+
+ while ((node = rb_first(&iommu->dma_list))) {
+ struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
+ size_t size = dma->size;
+ vfio_remove_dma_overlap(iommu, dma->iova, &size, dma);
+ if (WARN_ON(!size))
+ break;
+ }
+
+ list_for_each_entry_safe(mdma, mdma_tmp, &iommu->msi_dma_list, next) {
+ vfio_msi_unmap(iommu, mdma->iova);
+ list_del(&mdma->next);
+ kfree(mdma);
+ }
+
+ iommu_domain_free(iommu->domain);
+ iommu->domain = NULL;
+ kfree(iommu);
+}
+
+static long vfio_iommu_fsl_pamu_ioctl(void *iommu_data,
+ unsigned int cmd, unsigned long arg)
+{
+ struct vfio_iommu *iommu = iommu_data;
+ unsigned long minsz;
+
+ if (cmd == VFIO_CHECK_EXTENSION) {
+ switch (arg) {
+ case VFIO_FSL_PAMU_IOMMU:
+ return 1;
+ default:
+ return 0;
+ }
+ } else if (cmd == VFIO_IOMMU_MAP_DMA) {
+ struct vfio_iommu_type1_dma_map map;
+ uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
+ VFIO_DMA_MAP_FLAG_WRITE;
+
+ minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
+
+ if (copy_from_user(&map, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (map.argsz < minsz || map.flags & ~mask)
+ return -EINVAL;
+
+ return vfio_dma_do_map(iommu, &map);
+
+ } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
+ struct vfio_iommu_type1_dma_unmap unmap;
+ long ret;
+
+ minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
+
+ if (copy_from_user(&unmap, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (unmap.argsz < minsz || unmap.flags)
+ return -EINVAL;
+
+ ret = vfio_dma_do_unmap(iommu, &unmap);
+ if (ret)
+ return ret;
+
+ return copy_to_user((void __user *)arg, &unmap, minsz);
+ } else if (cmd == VFIO_IOMMU_PAMU_GET_ATTR) {
+ struct vfio_pamu_attr pamu_attr;
+
+ minsz = offsetofend(struct vfio_pamu_attr, attr_info);
+ if (copy_from_user(&pamu_attr, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (pamu_attr.argsz < minsz)
+ return -EINVAL;
+
+ vfio_handle_get_attr(iommu, &pamu_attr);
+
+ copy_to_user((void __user *)arg, &pamu_attr, minsz);
+ return 0;
+ } else if (cmd == VFIO_IOMMU_PAMU_SET_ATTR) {
+ struct vfio_pamu_attr pamu_attr;
+
+ minsz = offsetofend(struct vfio_pamu_attr, attr_info);
+ if (copy_from_user(&pamu_attr, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (pamu_attr.argsz < minsz)
+ return -EINVAL;
+
+ vfio_handle_set_attr(iommu, &pamu_attr);
+ return 0;
+ } else if (cmd == VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT) {
+ return msi_get_region_count();
+ } else if (cmd == VFIO_IOMMU_PAMU_MAP_MSI_BANK) {
+ struct vfio_pamu_msi_bank_map msi_map;
+
+ minsz = offsetofend(struct vfio_pamu_msi_bank_map, iova);
+ if (copy_from_user(&msi_map, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (msi_map.argsz < minsz)
+ return -EINVAL;
+
+ vfio_do_msi_map(iommu, &msi_map);
+ return 0;
+ } else if (cmd == VFIO_IOMMU_PAMU_UNMAP_MSI_BANK) {
+ struct vfio_pamu_msi_bank_unmap msi_unmap;
+
+ minsz = offsetofend(struct vfio_pamu_msi_bank_unmap, iova);
+ if (copy_from_user(&msi_unmap, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (msi_unmap.argsz < minsz)
+ return -EINVAL;
+
+ vfio_do_msi_unmap(iommu, &msi_unmap);
+ return 0;
+
+ }
+
+ return -ENOTTY;
+}
+
+static int vfio_iommu_fsl_pamu_attach_group(void *iommu_data,
+ struct iommu_group *iommu_group)
+{
+ struct vfio_iommu *iommu = iommu_data;
+ struct vfio_group *group, *tmp;
+ int ret;
+
+ group = kzalloc(sizeof(*group), GFP_KERNEL);
+ if (!group)
+ return -ENOMEM;
+
+ mutex_lock(&iommu->lock);
+
+ list_for_each_entry(tmp, &iommu->group_list, next) {
+ if (tmp->iommu_group == iommu_group) {
+ mutex_unlock(&iommu->lock);
+ kfree(group);
+ return -EINVAL;
+ }
+ }
+
+ ret = iommu_attach_group(iommu->domain, iommu_group);
+ if (ret) {
+ mutex_unlock(&iommu->lock);
+ kfree(group);
+ return ret;
+ }
+
+ group->iommu_group = iommu_group;
+ list_add(&group->next, &iommu->group_list);
+
+ mutex_unlock(&iommu->lock);
+
+ return 0;
+}
+
+static void vfio_iommu_fsl_pamu_detach_group(void *iommu_data,
+ struct iommu_group *iommu_group)
+{
+ struct vfio_iommu *iommu = iommu_data;
+ struct vfio_group *group;
+
+ mutex_lock(&iommu->lock);
+
+ list_for_each_entry(group, &iommu->group_list, next) {
+ if (group->iommu_group == iommu_group) {
+ iommu_detach_group(iommu->domain, iommu_group);
+ list_del(&group->next);
+ kfree(group);
+ break;
+ }
+ }
+
+ mutex_unlock(&iommu->lock);
+}
+
+static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_fsl_pamu = {
+ .name = "vfio-iommu-fsl_pamu",
+ .owner = THIS_MODULE,
+ .open = vfio_iommu_fsl_pamu_open,
+ .release = vfio_iommu_fsl_pamu_release,
+ .ioctl = vfio_iommu_fsl_pamu_ioctl,
+ .attach_group = vfio_iommu_fsl_pamu_attach_group,
+ .detach_group = vfio_iommu_fsl_pamu_detach_group,
+};
+
+static int __init vfio_iommu_fsl_pamu_init(void)
+{
+ if (!iommu_present(&pci_bus_type))
+ return -ENODEV;
+
+ return vfio_register_iommu_driver(&vfio_iommu_driver_ops_fsl_pamu);
+}
+
+static void __exit vfio_iommu_fsl_pamu_cleanup(void)
+{
+ vfio_unregister_iommu_driver(&vfio_iommu_driver_ops_fsl_pamu);
+}
+
+module_init(vfio_iommu_fsl_pamu_init);
+module_exit(vfio_iommu_fsl_pamu_cleanup);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 0fd47f5..d359055 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -23,6 +23,7 @@
#define VFIO_TYPE1_IOMMU 1
#define VFIO_SPAPR_TCE_IOMMU 2
+#define VFIO_FSL_PAMU_IOMMU 3
/*
* The IOCTL interface is designed for extensibility by embedding the
@@ -451,4 +452,103 @@ struct vfio_iommu_spapr_tce_info {
/* ***************************************************************** */
+/*********** APIs for VFIO_PAMU type only ****************/
+/*
+ * VFIO_IOMMU_PAMU_GET_ATTR - _IO(VFIO_TYPE, VFIO_BASE + 17,
+ * struct vfio_pamu_attr)
+ *
+ * Gets the iommu attributes for the current vfio container.
+ * Caller sets argsz and attribute. The ioctl fills in
+ * the provided struct vfio_pamu_attr based on the attribute
+ * value that was set.
+ * Return: 0 on success, -errno on failure
+ */
+struct vfio_pamu_attr {
+ __u32 argsz;
+ __u32 flags; /* no flags currently */
+#define VFIO_ATTR_GEOMETRY 0
+#define VFIO_ATTR_WINDOWS 1
+#define VFIO_ATTR_PAMU_STASH 2
+ __u32 attribute;
+
+ union {
+ /* VFIO_ATTR_GEOMETRY */
+ struct {
+ /* first addr that can be mapped */
+ __u64 aperture_start;
+ /* last addr that can be mapped */
+ __u64 aperture_end;
+ } attr;
+
+ /* VFIO_ATTR_WINDOWS */
+ __u32 windows; /* number of windows in the aperture
+ * initially this will be the max number
+ * of windows that can be set
+ */
+ /* VFIO_ATTR_PAMU_STASH */
+ struct {
+ __u32 cpu; /* CPU number for stashing */
+ __u32 cache; /* cache ID for stashing */
+ } stash;
+ } attr_info;
+};
+#define VFIO_IOMMU_PAMU_GET_ATTR _IO(VFIO_TYPE, VFIO_BASE + 17)
+
+/*
+ * VFIO_IOMMU_PAMU_SET_ATTR - _IO(VFIO_TYPE, VFIO_BASE + 18,
+ * struct vfio_pamu_attr)
+ *
+ * Sets the iommu attributes for the current vfio container.
+ * Caller sets struct vfio_pamu_attr, including argsz and attribute, and
+ * any fields that are valid for the attribute.
+ * Return: 0 on success, -errno on failure
+ */
+#define VFIO_IOMMU_PAMU_SET_ATTR _IO(VFIO_TYPE, VFIO_BASE + 18)
+
+/*
+ * VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT - _IO(VFIO_TYPE, VFIO_BASE + 19, __u32)
+ *
+ * Returns the number of MSI banks for this platform. This tells user space
+ * how many aperture windows should be reserved for MSI banks when setting
+ * the PAMU geometry and window count.
+ * Return: __u32 bank count on success, -errno on failure
+ */
+#define VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT _IO(VFIO_TYPE, VFIO_BASE + 19)
+
+/*
+ * VFIO_IOMMU_PAMU_MAP_MSI_BANK - _IO(VFIO_TYPE, VFIO_BASE + 20,
+ * struct vfio_pamu_msi_bank_map)
+ *
+ * Maps the MSI bank at the specified index and iova. User space must
+ * call this ioctl once for each MSI bank (count of banks is returned by
+ * VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT).
+ * Caller provides struct vfio_pamu_msi_bank_map with all fields set.
+ * Return: 0 on success, -errno on failure
+ */
+
+struct vfio_pamu_msi_bank_map {
+ __u32 argsz;
+ __u32 flags; /* no flags currently */
+ __u32 msi_bank_index; /* the index of the MSI bank */
+ __u64 iova; /* the iova the bank is to be mapped to */
+};
+#define VFIO_IOMMU_PAMU_MAP_MSI_BANK _IO(VFIO_TYPE, VFIO_BASE + 20)
+
+/*
+ * VFIO_IOMMU_PAMU_UNMAP_MSI_BANK - _IO(VFIO_TYPE, VFIO_BASE + 21,
+ * struct vfio_pamu_msi_bank_unmap)
+ *
+ * Unmaps the MSI bank at the specified iova.
+ * Caller provides struct vfio_pamu_msi_bank_unmap with all fields set.
+ * Operates on VFIO file descriptor (/dev/vfio/vfio).
+ * Return: 0 on success, -errno on failure
+ */
+
+struct vfio_pamu_msi_bank_unmap {
+ __u32 argsz;
+ __u32 flags; /* no flags currently */
+ __u64 iova; /* the iova to be unmapped */
+};
+#define VFIO_IOMMU_PAMU_UNMAP_MSI_BANK _IO(VFIO_TYPE, VFIO_BASE + 21)
+
#endif /* _UAPIVFIO_H */
--
1.7.0.4
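The argsz/minsz validation repeated in the ioctl handlers of this patch can be sketched in user space as follows. This is only an illustration under stated assumptions: sketch_offsetofend(), struct sketch_msi_bank_map and sketch_fetch_map() are hypothetical names, memcpy() stands in for copy_from_user(), and the flags check follows the VFIO_IOMMU_UNMAP_DMA handler rather than the MSI bank handlers, which only check argsz.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset of the first byte after MEMBER; modelled on the kernel's offsetofend(). */
#define sketch_offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical copy of the layout of struct vfio_pamu_msi_bank_map. */
struct sketch_msi_bank_map {
	uint32_t argsz;
	uint32_t flags;
	uint32_t msi_bank_index;
	uint64_t iova;
};

/*
 * Copy the fixed part of a request from 'user' and validate it.
 * Returns 0 on success or a negative errno value.
 */
static int sketch_fetch_map(struct sketch_msi_bank_map *out,
			    const void *user, size_t user_len)
{
	size_t minsz = sketch_offsetofend(struct sketch_msi_bank_map, iova);

	if (user_len < minsz)
		return -EFAULT;		/* models a failing copy_from_user() */
	memcpy(out, user, minsz);

	/* Reject callers built against a smaller struct, and unknown flags. */
	if (out->argsz < minsz || out->flags != 0)
		return -EINVAL;
	return 0;
}
```

Only the first minsz bytes are copied, so a newer caller passing a larger argsz is still accepted, while an older caller with a shorter struct is rejected.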
* [PATCH 6/7] vfio: moving some functions in common file
From: Bharat Bhushan @ 2013-09-19 7:29 UTC (permalink / raw)
To: alex.williamson, joro, benh, galak, linux-kernel, linuxppc-dev,
linux-pci, agraf, scottwood, iommu
Cc: Bharat Bhushan
In-Reply-To: <1379575763-2091-1-git-send-email-Bharat.Bhushan@freescale.com>
Some functions defined in vfio_iommu_type1.c are generic, and we want to
reuse them for the FSL IOMMU (PAMU) and iommu-none drivers, so move
them to vfio_iommu_common.c.
More code could be shared this way, but we will take it step by step.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
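The core of vfio_pin_pages(), which this patch moves into the common file, is a scan for one physically contiguous run of pages that stops at the locked-memory limit. A minimal user-space sketch of that scan, assuming the pfns are already known and given as an array (count_contiguous_run() is a hypothetical name, and unlike the kernel function it returns 0 instead of -ENOMEM when even the first page exceeds the limit):

```c
/*
 * Count how many entries at the start of pfns[] form one physically
 * contiguous run, stopping early if pinning one more page would push
 * the locked-page total past 'limit'.
 */
static long count_contiguous_run(const unsigned long *pfns, long npage,
				 unsigned long locked, unsigned long limit)
{
	long i;

	if (npage <= 0 || locked + 1 > limit)
		return 0;

	for (i = 1; i < npage; i++) {
		if (pfns[i] != pfns[0] + i)
			break;		/* run is no longer contiguous */
		if (locked + i + 1 > limit)
			break;		/* would exceed the lock limit */
	}
	return i;
}
```

The caller would then map the returned number of pages as one chunk and call again for the remainder, which is why only the base pfn has to be tracked.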
drivers/vfio/Makefile | 4 +-
drivers/vfio/vfio_iommu_common.c | 235 ++++++++++++++++++++++++++++++++++++++
drivers/vfio/vfio_iommu_common.h | 30 +++++
drivers/vfio/vfio_iommu_type1.c | 206 +---------------------------------
4 files changed, 268 insertions(+), 207 deletions(-)
create mode 100644 drivers/vfio/vfio_iommu_common.c
create mode 100644 drivers/vfio/vfio_iommu_common.h
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 72bfabc..c5792ec 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,4 +1,4 @@
obj-$(CONFIG_VFIO) += vfio.o
-obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
-obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
+obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_common.o vfio_iommu_type1.o
+obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_common.o vfio_iommu_spapr_tce.o
obj-$(CONFIG_VFIO_PCI) += pci/
diff --git a/drivers/vfio/vfio_iommu_common.c b/drivers/vfio/vfio_iommu_common.c
new file mode 100644
index 0000000..8bdc0ea
--- /dev/null
+++ b/drivers/vfio/vfio_iommu_common.c
@@ -0,0 +1,235 @@
+/*
+ * VFIO: Common code for vfio IOMMU support
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
+ * Author: Alex Williamson <alex.williamson@redhat.com>
+ * Author: Bharat Bhushan <bharat.bhushan@freescale.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc. All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+#include <linux/compat.h>
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/pci.h> /* pci_bus_type */
+#include <linux/rbtree.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/workqueue.h>
+
+static bool disable_hugepages;
+module_param_named(disable_hugepages,
+ disable_hugepages, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(disable_hugepages,
+ "Disable VFIO IOMMU support for IOMMU hugepages.");
+
+struct vwork {
+ struct mm_struct *mm;
+ long npage;
+ struct work_struct work;
+};
+
+/* delayed decrement/increment for locked_vm */
+void vfio_lock_acct_bg(struct work_struct *work)
+{
+ struct vwork *vwork = container_of(work, struct vwork, work);
+ struct mm_struct *mm;
+
+ mm = vwork->mm;
+ down_write(&mm->mmap_sem);
+ mm->locked_vm += vwork->npage;
+ up_write(&mm->mmap_sem);
+ mmput(mm);
+ kfree(vwork);
+}
+
+void vfio_lock_acct(long npage)
+{
+ struct vwork *vwork;
+ struct mm_struct *mm;
+
+ if (!current->mm || !npage)
+ return; /* process exited or nothing to do */
+
+ if (down_write_trylock(&current->mm->mmap_sem)) {
+ current->mm->locked_vm += npage;
+ up_write(&current->mm->mmap_sem);
+ return;
+ }
+
+ /*
+ * Couldn't get mmap_sem lock, so must setup to update
+ * mm->locked_vm later. If locked_vm were atomic, we
+ * wouldn't need this silliness
+ */
+ vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
+ if (!vwork)
+ return;
+ mm = get_task_mm(current);
+ if (!mm) {
+ kfree(vwork);
+ return;
+ }
+ INIT_WORK(&vwork->work, vfio_lock_acct_bg);
+ vwork->mm = mm;
+ vwork->npage = npage;
+ schedule_work(&vwork->work);
+}
+
+/*
+ * Some mappings aren't backed by a struct page, for example an mmap'd
+ * MMIO range for our own or another device. These use a different
+ * pfn conversion and shouldn't be tracked as locked pages.
+ */
+bool is_invalid_reserved_pfn(unsigned long pfn)
+{
+ if (pfn_valid(pfn)) {
+ bool reserved;
+ struct page *tail = pfn_to_page(pfn);
+ struct page *head = compound_trans_head(tail);
+ reserved = !!(PageReserved(head));
+ if (head != tail) {
+ /*
+ * "head" is not a dangling pointer
+ * (compound_trans_head takes care of that)
+ * but the hugepage may have been split
+ * from under us (and we may not hold a
+ * reference count on the head page so it can
+ * be reused before we run PageReferenced), so
+ * we've to check PageTail before returning
+ * what we just read.
+ */
+ smp_rmb();
+ if (PageTail(tail))
+ return reserved;
+ }
+ return PageReserved(tail);
+ }
+
+ return true;
+}
+
+int put_pfn(unsigned long pfn, int prot)
+{
+ if (!is_invalid_reserved_pfn(pfn)) {
+ struct page *page = pfn_to_page(pfn);
+ if (prot & IOMMU_WRITE)
+ SetPageDirty(page);
+ put_page(page);
+ return 1;
+ }
+ return 0;
+}
+
+static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
+{
+ struct page *page[1];
+ struct vm_area_struct *vma;
+ int ret = -EFAULT;
+
+ if (get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE), page) == 1) {
+ *pfn = page_to_pfn(page[0]);
+ return 0;
+ }
+
+ pr_debug("via vma\n");
+ down_read(&current->mm->mmap_sem);
+
+ vma = find_vma_intersection(current->mm, vaddr, vaddr + 1);
+
+ if (vma && vma->vm_flags & VM_PFNMAP) {
+ *pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+ if (is_invalid_reserved_pfn(*pfn))
+ ret = 0;
+ }
+
+ up_read(&current->mm->mmap_sem);
+
+ return ret;
+}
+
+/*
+ * Attempt to pin pages. We really don't want to track all the pfns and
+ * the iommu can only map chunks of consecutive pfns anyway, so get the
+ * first page and all consecutive pages with the same locking.
+ */
+long vfio_pin_pages(unsigned long vaddr, long npage,
+ int prot, unsigned long *pfn_base)
+{
+ unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+ bool lock_cap = capable(CAP_IPC_LOCK);
+ long ret, i;
+
+ if (!current->mm)
+ return -ENODEV;
+
+ ret = vaddr_get_pfn(vaddr, prot, pfn_base);
+ if (ret)
+ return ret;
+
+ if (is_invalid_reserved_pfn(*pfn_base))
+ return 1;
+
+ if (!lock_cap && current->mm->locked_vm + 1 > limit) {
+ put_pfn(*pfn_base, prot);
+ pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n", __func__,
+ limit << PAGE_SHIFT);
+ return -ENOMEM;
+ }
+
+ if (unlikely(disable_hugepages)) {
+ vfio_lock_acct(1);
+ return 1;
+ }
+
+ /* Lock all the consecutive pages from pfn_base */
+ for (i = 1, vaddr += PAGE_SIZE; i < npage; i++, vaddr += PAGE_SIZE) {
+ unsigned long pfn = 0;
+
+ ret = vaddr_get_pfn(vaddr, prot, &pfn);
+ if (ret)
+ break;
+
+ if (pfn != *pfn_base + i || is_invalid_reserved_pfn(pfn)) {
+ put_pfn(pfn, prot);
+ break;
+ }
+
+ if (!lock_cap && current->mm->locked_vm + i + 1 > limit) {
+ put_pfn(pfn, prot);
+ pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
+ __func__, limit << PAGE_SHIFT);
+ break;
+ }
+ }
+
+ vfio_lock_acct(i);
+
+ return i;
+}
+
+long vfio_unpin_pages(unsigned long pfn, long npage,
+ int prot, bool do_accounting)
+{
+ unsigned long unlocked = 0;
+ long i;
+
+ for (i = 0; i < npage; i++)
+ unlocked += put_pfn(pfn++, prot);
+
+ if (do_accounting)
+ vfio_lock_acct(-unlocked);
+
+ return unlocked;
+}
diff --git a/drivers/vfio/vfio_iommu_common.h b/drivers/vfio/vfio_iommu_common.h
new file mode 100644
index 0000000..4738391
--- /dev/null
+++ b/drivers/vfio/vfio_iommu_common.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2013 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef _VFIO_IOMMU_COMMON_H
+#define _VFIO_IOMMU_COMMON_H
+
+void vfio_lock_acct_bg(struct work_struct *work);
+void vfio_lock_acct(long npage);
+bool is_invalid_reserved_pfn(unsigned long pfn);
+int put_pfn(unsigned long pfn, int prot);
+long vfio_pin_pages(unsigned long vaddr, long npage, int prot,
+ unsigned long *pfn_base);
+long vfio_unpin_pages(unsigned long pfn, long npage,
+ int prot, bool do_accounting);
+#endif
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a9807de..e9a58fa 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,6 +37,7 @@
#include <linux/uaccess.h>
#include <linux/vfio.h>
#include <linux/workqueue.h>
+#include "vfio_iommu_common.h"
#define DRIVER_VERSION "0.2"
#define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
@@ -48,12 +49,6 @@ module_param_named(allow_unsafe_interrupts,
MODULE_PARM_DESC(allow_unsafe_interrupts,
"Enable VFIO IOMMU support for on platforms without interrupt remapping support.");
-static bool disable_hugepages;
-module_param_named(disable_hugepages,
- disable_hugepages, bool, S_IRUGO | S_IWUSR);
-MODULE_PARM_DESC(disable_hugepages,
- "Disable VFIO IOMMU support for IOMMU hugepages.");
-
struct vfio_iommu {
struct iommu_domain *domain;
struct mutex lock;
@@ -123,205 +118,6 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
rb_erase(&old->node, &iommu->dma_list);
}
-struct vwork {
- struct mm_struct *mm;
- long npage;
- struct work_struct work;
-};
-
-/* delayed decrement/increment for locked_vm */
-static void vfio_lock_acct_bg(struct work_struct *work)
-{
- struct vwork *vwork = container_of(work, struct vwork, work);
- struct mm_struct *mm;
-
- mm = vwork->mm;
- down_write(&mm->mmap_sem);
- mm->locked_vm += vwork->npage;
- up_write(&mm->mmap_sem);
- mmput(mm);
- kfree(vwork);
-}
-
-static void vfio_lock_acct(long npage)
-{
- struct vwork *vwork;
- struct mm_struct *mm;
-
- if (!current->mm || !npage)
- return; /* process exited or nothing to do */
-
- if (down_write_trylock(&current->mm->mmap_sem)) {
- current->mm->locked_vm += npage;
- up_write(&current->mm->mmap_sem);
- return;
- }
-
- /*
- * Couldn't get mmap_sem lock, so must setup to update
- * mm->locked_vm later. If locked_vm were atomic, we
- * wouldn't need this silliness
- */
- vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
- if (!vwork)
- return;
- mm = get_task_mm(current);
- if (!mm) {
- kfree(vwork);
- return;
- }
- INIT_WORK(&vwork->work, vfio_lock_acct_bg);
- vwork->mm = mm;
- vwork->npage = npage;
- schedule_work(&vwork->work);
-}
-
-/*
- * Some mappings aren't backed by a struct page, for example an mmap'd
- * MMIO range for our own or another device. These use a different
- * pfn conversion and shouldn't be tracked as locked pages.
- */
-static bool is_invalid_reserved_pfn(unsigned long pfn)
-{
- if (pfn_valid(pfn)) {
- bool reserved;
- struct page *tail = pfn_to_page(pfn);
- struct page *head = compound_trans_head(tail);
- reserved = !!(PageReserved(head));
- if (head != tail) {
- /*
- * "head" is not a dangling pointer
- * (compound_trans_head takes care of that)
- * but the hugepage may have been split
- * from under us (and we may not hold a
- * reference count on the head page so it can
- * be reused before we run PageReferenced), so
- * we've to check PageTail before returning
- * what we just read.
- */
- smp_rmb();
- if (PageTail(tail))
- return reserved;
- }
- return PageReserved(tail);
- }
-
- return true;
-}
-
-static int put_pfn(unsigned long pfn, int prot)
-{
- if (!is_invalid_reserved_pfn(pfn)) {
- struct page *page = pfn_to_page(pfn);
- if (prot & IOMMU_WRITE)
- SetPageDirty(page);
- put_page(page);
- return 1;
- }
- return 0;
-}
-
-static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
-{
- struct page *page[1];
- struct vm_area_struct *vma;
- int ret = -EFAULT;
-
- if (get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE), page) == 1) {
- *pfn = page_to_pfn(page[0]);
- return 0;
- }
-
- down_read(&current->mm->mmap_sem);
-
- vma = find_vma_intersection(current->mm, vaddr, vaddr + 1);
-
- if (vma && vma->vm_flags & VM_PFNMAP) {
- *pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
- if (is_invalid_reserved_pfn(*pfn))
- ret = 0;
- }
-
- up_read(&current->mm->mmap_sem);
-
- return ret;
-}
-
-/*
- * Attempt to pin pages. We really don't want to track all the pfns and
- * the iommu can only map chunks of consecutive pfns anyway, so get the
- * first page and all consecutive pages with the same locking.
- */
-static long vfio_pin_pages(unsigned long vaddr, long npage,
- int prot, unsigned long *pfn_base)
-{
- unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
- bool lock_cap = capable(CAP_IPC_LOCK);
- long ret, i;
-
- if (!current->mm)
- return -ENODEV;
-
- ret = vaddr_get_pfn(vaddr, prot, pfn_base);
- if (ret)
- return ret;
-
- if (is_invalid_reserved_pfn(*pfn_base))
- return 1;
-
- if (!lock_cap && current->mm->locked_vm + 1 > limit) {
- put_pfn(*pfn_base, prot);
- pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n", __func__,
- limit << PAGE_SHIFT);
- return -ENOMEM;
- }
-
- if (unlikely(disable_hugepages)) {
- vfio_lock_acct(1);
- return 1;
- }
-
- /* Lock all the consecutive pages from pfn_base */
- for (i = 1, vaddr += PAGE_SIZE; i < npage; i++, vaddr += PAGE_SIZE) {
- unsigned long pfn = 0;
-
- ret = vaddr_get_pfn(vaddr, prot, &pfn);
- if (ret)
- break;
-
- if (pfn != *pfn_base + i || is_invalid_reserved_pfn(pfn)) {
- put_pfn(pfn, prot);
- break;
- }
-
- if (!lock_cap && current->mm->locked_vm + i + 1 > limit) {
- put_pfn(pfn, prot);
- pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
- __func__, limit << PAGE_SHIFT);
- break;
- }
- }
-
- vfio_lock_acct(i);
-
- return i;
-}
-
-static long vfio_unpin_pages(unsigned long pfn, long npage,
- int prot, bool do_accounting)
-{
- unsigned long unlocked = 0;
- long i;
-
- for (i = 0; i < npage; i++)
- unlocked += put_pfn(pfn++, prot);
-
- if (do_accounting)
- vfio_lock_acct(-unlocked);
-
- return unlocked;
-}
-
static int vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
dma_addr_t iova, size_t *size)
{
--
1.7.0.4
* [PATCH 5/7] iommu: supress loff_t compilation error on powerpc
From: Bharat Bhushan @ 2013-09-19 7:29 UTC (permalink / raw)
To: alex.williamson, joro, benh, galak, linux-kernel, linuxppc-dev,
linux-pci, agraf, scottwood, iommu
Cc: Bharat Bhushan
In-Reply-To: <1379575763-2091-1-git-send-email-Bharat.Bhushan@freescale.com>
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
drivers/vfio/pci/vfio_pci_rdwr.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 210db24..8a8156a 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -181,7 +181,8 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite)
{
int ret;
- loff_t off, pos = *ppos & VFIO_PCI_OFFSET_MASK;
+ loff_t off;
+ u64 pos = (u64)(*ppos & VFIO_PCI_OFFSET_MASK);
void __iomem *iomem = NULL;
unsigned int rsrc;
bool is_ioport;
--
1.7.0.4
* [PATCH 4/7] powerpc: translate msi addr to iova if iommu is in use
From: Bharat Bhushan @ 2013-09-19 7:29 UTC (permalink / raw)
To: alex.williamson, joro, benh, galak, linux-kernel, linuxppc-dev,
linux-pci, agraf, scottwood, iommu
Cc: Bharat Bhushan
In-Reply-To: <1379575763-2091-1-git-send-email-Bharat.Bhushan@freescale.com>
If the device is attached to an iommu domain, set the MSI address
to the iova configured in PAMU.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
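The lookup done by fsl_iommu_get_iova() in this patch walks the windows of the domain and returns the iova whose translation matches the MSI page. A minimal user-space sketch of the same idea, assuming a 4 KiB page and a small table in place of iommu_iova_to_phys() (struct window_map, SKETCH_PAGE_SIZE and msi_phys_to_iova() are hypothetical names):

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size for this sketch */

/* Hypothetical stand-in for iommu_iova_to_phys(): one entry per window. */
struct window_map {
	uint64_t iova;	/* start iova of the window */
	uint64_t phys;	/* physical page mapped at that iova */
};

/*
 * Return the iova that translates to msi_phys, or 0 if no window maps
 * the page containing it.
 */
static uint64_t msi_phys_to_iova(const struct window_map *win,
				 unsigned int nwin, uint64_t msi_phys)
{
	uint64_t page = msi_phys & ~(SKETCH_PAGE_SIZE - 1);
	uint64_t offset = msi_phys & (SKETCH_PAGE_SIZE - 1);
	unsigned int i;

	for (i = 0; i < nwin; i++) {
		if (win[i].phys == page)
			return win[i].iova + offset;
	}
	return 0;
}
```

As in the patch, 0 doubles as the "not found" value, so a mapping at iova 0 cannot be distinguished from a failed lookup.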
arch/powerpc/sysdev/fsl_msi.c | 56 +++++++++++++++++++++++++++++++++++++++-
1 files changed, 54 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index ed045cb..c7cf018 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -18,6 +18,7 @@
#include <linux/pci.h>
#include <linux/slab.h>
#include <linux/of_platform.h>
+#include <linux/iommu.h>
#include <sysdev/fsl_soc.h>
#include <asm/prom.h>
#include <asm/hw_irq.h>
@@ -150,7 +151,40 @@ static void fsl_teardown_msi_irqs(struct pci_dev *pdev)
return;
}
-static void fsl_compose_msi_msg(struct pci_dev *pdev, int hwirq,
+static uint64_t fsl_iommu_get_iova(struct pci_dev *pdev, dma_addr_t msi_phys)
+{
+ struct iommu_domain *domain;
+ struct iommu_domain_geometry geometry;
+ u32 wins = 0;
+ uint64_t iova, size;
+ int ret, i;
+
+ domain = iommu_get_dev_domain(&pdev->dev);
+ if (!domain)
+ return 0;
+
+ ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_WINDOWS, &wins);
+ if (ret)
+ return 0;
+
+ ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_GEOMETRY, &geometry);
+ if (ret)
+ return 0;
+
+ iova = geometry.aperture_start;
+ size = geometry.aperture_end - geometry.aperture_start + 1;
+ do_div(size, wins);
+ for (i = 0; i < wins; i++) {
+ phys_addr_t phys;
+ phys = iommu_iova_to_phys(domain, iova);
+ if (phys == (msi_phys & ~(PAGE_SIZE - 1)))
+ return (iova + (msi_phys & (PAGE_SIZE - 1)));
+ iova += size;
+ }
+ return 0;
+}
+
+static int fsl_compose_msi_msg(struct pci_dev *pdev, int hwirq,
struct msi_msg *msg,
struct fsl_msi *fsl_msi_data)
{
@@ -168,6 +202,16 @@ static void fsl_compose_msi_msg(struct pci_dev *pdev, int hwirq,
address = fsl_pci_immrbar_base(hose) +
(msi_data->msiir & 0xfffff);
+ /*
+ * If the device is attached to an iommu domain, set the MSI address
+ * to the iova configured in PAMU.
+ */
+ if (iommu_get_dev_domain(&pdev->dev)) {
+ address = fsl_iommu_get_iova(pdev, msi_data->msiir);
+ if (!address)
+ return -ENODEV;
+ }
+
msg->address_lo = lower_32_bits(address);
msg->address_hi = upper_32_bits(address);
@@ -175,6 +219,8 @@ static void fsl_compose_msi_msg(struct pci_dev *pdev, int hwirq,
pr_debug("%s: allocated srs: %d, ibs: %d\n",
__func__, hwirq / IRQS_PER_MSI_REG, hwirq % IRQS_PER_MSI_REG);
+
+ return 0;
}
static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
@@ -244,7 +290,13 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
/* chip_data is msi_data via host->hostdata in host->map() */
irq_set_msi_desc(virq, entry);
- fsl_compose_msi_msg(pdev, hwirq, &msg, msi_data);
+ if (fsl_compose_msi_msg(pdev, hwirq, &msg, msi_data)) {
+ dev_err(&pdev->dev, "Failed to set MSI for hwirq %i\n",
+ hwirq);
+ msi_bitmap_free_hwirqs(&msi_data->bitmap, hwirq, 1);
+ rc = -ENODEV;
+ goto out_free;
+ }
write_msi_msg(virq, &msg);
}
return 0;
--
1.7.0.4
* [PATCH 0/7] vfio-pci: add support for Freescale IOMMU (PAMU)
From: Bharat Bhushan @ 2013-09-19 7:29 UTC (permalink / raw)
To: alex.williamson, joro, benh, galak, linux-kernel, linuxppc-dev,
linux-pci, agraf, scottwood, iommu
Cc: Bharat Bhushan
From: Bharat Bhushan <bharat.bhushan@freescale.com>
This patchset adds vfio-pci support for the Freescale
IOMMU (PAMU, the Peripheral Access Management Unit).
The Freescale PAMU is an aperture-based IOMMU with the following
characteristics. Each device has an entry in a table in memory
describing the iova->phys mapping. The mapping has:
- an overall aperture that is power-of-2 sized, with a start iova that
is naturally aligned
- 1 or more windows within the aperture
- the number of windows must be a power of 2, max is 256
- the size of each window is aperture size / # of windows
- the iova of each window is aperture start iova + window index * window size
- the mapped region in each window can differ from the window size,
but the mapping size must be a power of 2
- the physical address of the mapping must be naturally aligned
with the mapping size
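The window arithmetic described above can be sketched as follows. This is a user-space illustration, not kernel code: struct pamu_geometry and both helpers are hypothetical names, and the window base is assumed to be aperture start + index * window size, which matches the loop in fsl_iommu_get_iova() from patch 4/7.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not a kernel structure. */
struct pamu_geometry {
	uint64_t aperture_start;	/* naturally aligned start iova */
	uint64_t aperture_size;		/* power of 2 */
	uint32_t windows;		/* power of 2, at most 256 */
};

static int is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Size of each window: aperture size / number of windows. */
static uint64_t pamu_window_size(const struct pamu_geometry *g)
{
	assert(is_pow2(g->aperture_size));
	assert(is_pow2(g->windows) && g->windows <= 256);
	assert((g->aperture_start & (g->aperture_size - 1)) == 0);
	return g->aperture_size / g->windows;
}

/* Start iova of window 'idx', assuming equally sized consecutive windows. */
static uint64_t pamu_window_iova(const struct pamu_geometry *g, uint32_t idx)
{
	assert(idx < g->windows);
	return g->aperture_start + (uint64_t)idx * pamu_window_size(g);
}
```

For example, a 1 GiB aperture starting at 0x80000000 with 4 windows gives 256 MiB windows, the last of which starts at 0xB0000000.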
Because of these limitations, the aperture has to be set up so that it also
has room for the MSI address mappings. We therefore create space for the
MSI windows just after the IOVA range (guest memory).
The first four patches in this patchset set up the MSI windows and program the
MSI address in the device accordingly.
The fifth patch resolves a compilation error.
The sixth patch moves some common functions into a separate file so that they
can be used by the FSL_PAMU implementation (the next patch uses them). They will
also be used later for the iommu-none implementation. I believe more code can be
shared this way, but we will take it step by step.
Finally, the seventh patch adds the actual support for FSL-PAMU :)
Bharat Bhushan (7):
powerpc: Add interface to get msi region information
iommu: add api to get iommu_domain of a device
fsl iommu: add get_dev_iommu_domain
powerpc: translate msi addr to iova if iommu is in use
iommu: supress loff_t compilation error on powerpc
vfio: moving some functions in common file
vfio pci: Add vfio iommu implementation for FSL_PAMU
arch/powerpc/include/asm/machdep.h | 8 +
arch/powerpc/include/asm/pci.h | 2 +
arch/powerpc/kernel/msi.c | 18 +
arch/powerpc/sysdev/fsl_msi.c | 95 ++++-
arch/powerpc/sysdev/fsl_msi.h | 11 +-
drivers/iommu/fsl_pamu_domain.c | 30 ++
drivers/iommu/iommu.c | 10 +
drivers/pci/msi.c | 26 +
drivers/vfio/Kconfig | 6 +
drivers/vfio/Makefile | 5 +-
drivers/vfio/pci/vfio_pci_rdwr.c | 3 +-
drivers/vfio/vfio_iommu_common.c | 235 +++++++++
drivers/vfio/vfio_iommu_common.h | 30 ++
drivers/vfio/vfio_iommu_fsl_pamu.c | 952 ++++++++++++++++++++++++++++++++++++
drivers/vfio/vfio_iommu_type1.c | 206 +--------
include/linux/iommu.h | 7 +
include/linux/msi.h | 8 +
include/linux/pci.h | 13 +
include/uapi/linux/vfio.h | 100 ++++
19 files changed, 1550 insertions(+), 215 deletions(-)
create mode 100644 drivers/vfio/vfio_iommu_common.c
create mode 100644 drivers/vfio/vfio_iommu_common.h
create mode 100644 drivers/vfio/vfio_iommu_fsl_pamu.c
* [PATCH 3/7] fsl iommu: add get_dev_iommu_domain
From: Bharat Bhushan @ 2013-09-19 7:29 UTC (permalink / raw)
To: alex.williamson, joro, benh, galak, linux-kernel, linuxppc-dev,
linux-pci, agraf, scottwood, iommu
Cc: Bharat Bhushan
In-Reply-To: <1379575763-2091-1-git-send-email-Bharat.Bhushan@freescale.com>
From: Bharat Bhushan <bharat.bhushan@freescale.com>
Return the iommu_domain of the requested device for FSL PAMU.
For PCI devices, use the dev struct of the PCI controller, because the
current LIODN scheme assigns the LIODN to the PCI controller rather than
to the PCI device. This will be corrected once a proper LIODN scheme is
in place.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
drivers/iommu/fsl_pamu_domain.c | 30 ++++++++++++++++++++++++++++++
1 files changed, 30 insertions(+), 0 deletions(-)
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 14d803a..1d0dfe3 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -1140,6 +1140,35 @@ static u32 fsl_pamu_get_windows(struct iommu_domain *domain)
return dma_domain->win_cnt;
}
+static struct iommu_domain *fsl_get_dev_domain(struct device *dev)
+{
+ struct pci_controller *pci_ctl;
+ struct device_domain_info *info;
+ struct pci_dev *pdev;
+
+ /*
+ * Use the PCI controller dev struct for PCI devices, as the current
+ * LIODN scheme assigns the LIODN to the PCI controller, not the PCI
+ * device. This should be corrected with a proper LIODN scheme.
+ */
+ if (dev->bus == &pci_bus_type) {
+ pdev = to_pci_dev(dev);
+ pci_ctl = pci_bus_to_host(pdev->bus);
+ /*
+ * make dev point to pci controller device
+ * so we can get the LIODN programmed by
+ * u-boot.
+ */
+ dev = pci_ctl->parent;
+ }
+
+ info = dev->archdata.iommu_domain;
+ if (info && info->domain)
+ return info->domain->iommu_domain;
+
+ return NULL;
+}
+
static struct iommu_ops fsl_pamu_ops = {
.domain_init = fsl_pamu_domain_init,
.domain_destroy = fsl_pamu_domain_destroy,
@@ -1155,6 +1184,7 @@ static struct iommu_ops fsl_pamu_ops = {
.domain_get_attr = fsl_pamu_get_domain_attr,
.add_device = fsl_pamu_add_device,
.remove_device = fsl_pamu_remove_device,
+ .get_dev_iommu_domain = fsl_get_dev_domain,
};
int pamu_domain_init()
--
1.7.0.4
* [PATCH 2/7] iommu: add api to get iommu_domain of a device
From: Bharat Bhushan @ 2013-09-19 7:29 UTC (permalink / raw)
To: alex.williamson, joro, benh, galak, linux-kernel, linuxppc-dev,
linux-pci, agraf, scottwood, iommu
Cc: Bharat Bhushan
In-Reply-To: <1379575763-2091-1-git-send-email-Bharat.Bhushan@freescale.com>
This API returns the iommu domain to which the device is attached.
The iommu_domain is required for making IOMMU-related API calls.
Follow-up patches use this API to look up the IOMMU mapping.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
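The dispatch pattern used by iommu_get_dev_domain() in this patch (an optional per-driver callback guarded by NULL checks) can be sketched in user space as below. All sketch_* names and example_get_domain() are hypothetical and only model the shape of struct iommu_ops; they are not kernel APIs.

```c
#include <stddef.h>

struct sketch_domain { int id; };
struct sketch_device;

struct sketch_iommu_ops {
	/* Optional: drivers that cannot report a domain leave this NULL. */
	struct sketch_domain *(*get_dev_domain)(struct sketch_device *dev);
};

struct sketch_device {
	const struct sketch_iommu_ops *ops;	/* NULL when no IOMMU driver is bound */
	struct sketch_domain *domain;
};

/* Return the device's domain, or NULL if the driver cannot report one. */
static struct sketch_domain *sketch_get_dev_domain(struct sketch_device *dev)
{
	const struct sketch_iommu_ops *ops = dev->ops;

	if (ops == NULL || ops->get_dev_domain == NULL)
		return NULL;

	return ops->get_dev_domain(dev);
}

/* Example driver callback: report whatever domain the device records. */
static struct sketch_domain *example_get_domain(struct sketch_device *dev)
{
	return dev->domain;
}
```

Callers have to treat NULL as "no domain known", which covers both a device without an IOMMU and a driver that does not implement the callback.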
drivers/iommu/iommu.c | 10 ++++++++++
include/linux/iommu.h | 7 +++++++
2 files changed, 17 insertions(+), 0 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index fbe9ca7..6ac5f50 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -696,6 +696,16 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
}
EXPORT_SYMBOL_GPL(iommu_detach_device);
+struct iommu_domain *iommu_get_dev_domain(struct device *dev)
+{
+ struct iommu_ops *ops = dev->bus->iommu_ops;
+
+ if (unlikely(ops == NULL || ops->get_dev_iommu_domain == NULL))
+ return NULL;
+
+ return ops->get_dev_iommu_domain(dev);
+}
+EXPORT_SYMBOL_GPL(iommu_get_dev_domain);
/*
* IOMMU groups are really the natrual working unit of the IOMMU, but
* the IOMMU API works on domains and devices. Bridge that gap by
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7ea319e..fa046bd 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -127,6 +127,7 @@ struct iommu_ops {
int (*domain_set_windows)(struct iommu_domain *domain, u32 w_count);
/* Get the numer of window per domain */
u32 (*domain_get_windows)(struct iommu_domain *domain);
+ struct iommu_domain *(*get_dev_iommu_domain)(struct device *dev);
unsigned long pgsize_bitmap;
};
@@ -190,6 +191,7 @@ extern int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr,
phys_addr_t offset, u64 size,
int prot);
extern void iommu_domain_window_disable(struct iommu_domain *domain, u32 wnd_nr);
+extern struct iommu_domain *iommu_get_dev_domain(struct device *dev);
/**
* report_iommu_fault() - report about an IOMMU fault to the IOMMU framework
* @domain: the iommu domain where the fault has happened
@@ -284,6 +286,11 @@ static inline void iommu_domain_window_disable(struct iommu_domain *domain,
{
}
+static inline struct iommu_domain *iommu_get_dev_domain(struct device *dev)
+{
+ return NULL;
+}
+
static inline phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
{
return 0;
--
1.7.0.4
* [PATCH 1/7] powerpc: Add interface to get msi region information
From: Bharat Bhushan @ 2013-09-19 7:29 UTC (permalink / raw)
To: alex.williamson, joro, benh, galak, linux-kernel, linuxppc-dev,
linux-pci, agraf, scottwood, iommu
Cc: Bharat Bhushan
In-Reply-To: <1379575763-2091-1-git-send-email-Bharat.Bhushan@freescale.com>
This patch adds an interface to get the following information:
- Number of MSI regions (which is the number of MSI banks on powerpc).
- The address range of a region: the physical page which holds the
address/addresses used for generating MSI interrupts,
and the size of that page.
These are required to create IOMMU (Freescale PAMU) mappings for
devices which are directly assigned using VFIO.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
arch/powerpc/include/asm/machdep.h | 8 +++++++
arch/powerpc/include/asm/pci.h | 2 +
arch/powerpc/kernel/msi.c | 18 ++++++++++++++++
arch/powerpc/sysdev/fsl_msi.c | 39 +++++++++++++++++++++++++++++++++--
arch/powerpc/sysdev/fsl_msi.h | 11 ++++++++-
drivers/pci/msi.c | 26 ++++++++++++++++++++++++
include/linux/msi.h | 8 +++++++
include/linux/pci.h | 13 ++++++++++++
8 files changed, 120 insertions(+), 5 deletions(-)
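The two queries described above (count the regions, then fetch each
region's page) can be sketched in userspace roughly as follows. This is
only a model: the bank addresses are invented, the 4 KiB page size is an
assumption, and the model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

struct msi_region {
	int region_num;
	uint64_t addr;		/* dma_addr_t in the patch */
	size_t size;
};

/* Invented MSIIR addresses, one per bank, indexed by bank_index. */
static const uint64_t model_msiir[] = {
	0x001000140ull,
	0x002003140ull,
	0xf00003140ull,		/* above 4 GiB, to exercise the mask width */
};

static int model_get_region_count(void)
{
	return (int)(sizeof(model_msiir) / sizeof(model_msiir[0]));
}

static int model_get_region(int region_num, struct msi_region *region)
{
	assert(region != NULL);

	if (region_num < 0 || region_num >= model_get_region_count())
		return -ENODEV;

	region->region_num = region_num;
	region->size = MODEL_PAGE_SIZE;
	/*
	 * Round the register address down to the start of its page.
	 * The mask is widened to 64 bits before it is inverted, so the
	 * upper address bits survive even where size_t is 32 bits wide.
	 */
	region->addr = model_msiir[region_num] &
		       ~((uint64_t)region->size - 1);
	return 0;
}
```

The explicit cast is a choice made for this sketch. In C, inverting a
32-bit value and then widening it zero-extends, which would clear the upper
half of a 64-bit address, so the width of the mask is worth checking on
configurations where the address type is wider than size_t.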
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 8b48090..8d1b787 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -30,6 +30,7 @@ struct file;
struct pci_controller;
struct kimage;
struct pci_host_bridge;
+struct msi_region;
struct machdep_calls {
char *name;
@@ -124,6 +125,13 @@ struct machdep_calls {
int (*setup_msi_irqs)(struct pci_dev *dev,
int nvec, int type);
void (*teardown_msi_irqs)(struct pci_dev *dev);
+
+ /* Returns the number of MSI regions (banks) */
+ int (*msi_get_region_count)(void);
+
+ /* Returns the requested region's address and size */
+ int (*msi_get_region)(int region_num,
+ struct msi_region *region);
#endif
void (*restart)(char *cmd);
diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 6653f27..e575349 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -117,6 +117,8 @@ extern int pci_proc_domain(struct pci_bus *bus);
#define arch_setup_msi_irqs arch_setup_msi_irqs
#define arch_teardown_msi_irqs arch_teardown_msi_irqs
#define arch_msi_check_device arch_msi_check_device
+#define arch_msi_get_region_count arch_msi_get_region_count
+#define arch_msi_get_region arch_msi_get_region
struct vm_area_struct;
/* Map a range of PCI memory or I/O space for a device into user space */
diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c
index 8bbc12d..1a67787 100644
--- a/arch/powerpc/kernel/msi.c
+++ b/arch/powerpc/kernel/msi.c
@@ -13,6 +13,24 @@
#include <asm/machdep.h>
+int arch_msi_get_region_count(void)
+{
+ if (ppc_md.msi_get_region_count) {
+ pr_debug("msi: Using platform get_region_count routine.\n");
+ return ppc_md.msi_get_region_count();
+ }
+ return 0;
+}
+
+int arch_msi_get_region(int region_num, struct msi_region *region)
+{
+ if (ppc_md.msi_get_region) {
+ pr_debug("msi: Using platform get_region routine.\n");
+ return ppc_md.msi_get_region(region_num, region);
+ }
+ return 0;
+}
+
int arch_msi_check_device(struct pci_dev* dev, int nvec, int type)
{
if (!ppc_md.setup_msi_irqs || !ppc_md.teardown_msi_irqs) {
diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index ab02db3..ed045cb 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -96,6 +96,34 @@ static int fsl_msi_init_allocator(struct fsl_msi *msi_data)
return 0;
}
+static int fsl_msi_get_region_count(void)
+{
+ int count = 0;
+ struct fsl_msi *msi_data;
+
+ list_for_each_entry(msi_data, &msi_head, list)
+ count++;
+
+ return count;
+}
+
+static int fsl_msi_get_region(int region_num, struct msi_region *region)
+{
+ struct fsl_msi *msi_data;
+
+ list_for_each_entry(msi_data, &msi_head, list) {
+ if (msi_data->bank_index == region_num) {
+ region->region_num = msi_data->bank_index;
+ /* Setting PAGE_SIZE as MSIIR is a 4 byte register */
+ region->size = PAGE_SIZE;
+ region->addr = msi_data->msiir & ~(region->size - 1);
+ return 0;
+ }
+ }
+
+ return -ENODEV;
+}
+
static int fsl_msi_check_device(struct pci_dev *pdev, int nvec, int type)
{
if (type == PCI_CAP_ID_MSIX)
@@ -137,7 +165,8 @@ static void fsl_compose_msi_msg(struct pci_dev *pdev, int hwirq,
if (reg && (len == sizeof(u64)))
address = be64_to_cpup(reg);
else
- address = fsl_pci_immrbar_base(hose) + msi_data->msiir_offset;
+ address = fsl_pci_immrbar_base(hose) +
+ (msi_data->msiir & 0xfffff);
msg->address_lo = lower_32_bits(address);
msg->address_hi = upper_32_bits(address);
@@ -376,6 +405,7 @@ static int fsl_of_msi_probe(struct platform_device *dev)
int len;
u32 offset;
static const u32 all_avail[] = { 0, NR_MSI_IRQS };
+ static int bank_index;
match = of_match_device(fsl_of_msi_ids, &dev->dev);
if (!match)
@@ -419,8 +449,8 @@ static int fsl_of_msi_probe(struct platform_device *dev)
dev->dev.of_node->full_name);
goto error_out;
}
- msi->msiir_offset =
- features->msiir_offset + (res.start & 0xfffff);
+ msi->msiir = res.start + features->msiir_offset;
+ pr_debug("msi->msiir = %llx\n", (unsigned long long)msi->msiir);
}
msi->feature = features->fsl_pic_ip;
@@ -470,6 +500,7 @@ static int fsl_of_msi_probe(struct platform_device *dev)
}
}
+ msi->bank_index = bank_index++;
list_add_tail(&msi->list, &msi_head);
/* The multiple setting ppc_md.setup_msi_irqs will not harm things */
@@ -477,6 +508,8 @@ static int fsl_of_msi_probe(struct platform_device *dev)
ppc_md.setup_msi_irqs = fsl_setup_msi_irqs;
ppc_md.teardown_msi_irqs = fsl_teardown_msi_irqs;
ppc_md.msi_check_device = fsl_msi_check_device;
+ ppc_md.msi_get_region_count = fsl_msi_get_region_count;
+ ppc_md.msi_get_region = fsl_msi_get_region;
} else if (ppc_md.setup_msi_irqs != fsl_setup_msi_irqs) {
dev_err(&dev->dev, "Different MSI driver already installed!\n");
err = -ENODEV;
diff --git a/arch/powerpc/sysdev/fsl_msi.h b/arch/powerpc/sysdev/fsl_msi.h
index 8225f86..6bd5cfc 100644
--- a/arch/powerpc/sysdev/fsl_msi.h
+++ b/arch/powerpc/sysdev/fsl_msi.h
@@ -29,12 +29,19 @@ struct fsl_msi {
struct irq_domain *irqhost;
unsigned long cascade_irq;
-
- u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */
+ dma_addr_t msiir; /* MSIIR Address in CCSR */
void __iomem *msi_regs;
u32 feature;
int msi_virqs[NR_MSI_REG];
+ /*
+ * During probe each bank is assigned an index number.
+ * Index numbers start at 0 and increase by one per bank.
+ * Example MSI bank 1 = 0
+ * MSI bank 2 = 1, and so on.
+ */
+ int bank_index;
+
struct msi_bitmap bitmap;
struct list_head list; /* support multiple MSI banks */
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index aca7578..6d85c15 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -30,6 +30,20 @@ static int pci_msi_enable = 1;
/* Arch hooks */
+#ifndef arch_msi_get_region_count
+int arch_msi_get_region_count(void)
+{
+ return 0;
+}
+#endif
+
+#ifndef arch_msi_get_region
+int arch_msi_get_region(int region_num, struct msi_region *region)
+{
+ return 0;
+}
+#endif
+
#ifndef arch_msi_check_device
int arch_msi_check_device(struct pci_dev *dev, int nvec, int type)
{
@@ -903,6 +917,18 @@ void pci_disable_msi(struct pci_dev *dev)
}
EXPORT_SYMBOL(pci_disable_msi);
+int msi_get_region_count(void)
+{
+ return arch_msi_get_region_count();
+}
+EXPORT_SYMBOL(msi_get_region_count);
+
+int msi_get_region(int region_num, struct msi_region *region)
+{
+ return arch_msi_get_region(region_num, region);
+}
+EXPORT_SYMBOL(msi_get_region);
+
/**
* pci_msix_table_size - return the number of device's MSI-X table entries
* @dev: pointer to the pci_dev data structure of MSI-X device function
diff --git a/include/linux/msi.h b/include/linux/msi.h
index ee66f3a..ae32601 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -50,6 +50,12 @@ struct msi_desc {
struct kobject kobj;
};
+struct msi_region {
+ int region_num;
+ dma_addr_t addr;
+ size_t size;
+};
+
/*
* The arch hook for setup up msi irqs
*/
@@ -58,5 +64,7 @@ void arch_teardown_msi_irq(unsigned int irq);
int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
void arch_teardown_msi_irqs(struct pci_dev *dev);
int arch_msi_check_device(struct pci_dev* dev, int nvec, int type);
+int arch_msi_get_region_count(void);
+int arch_msi_get_region(int region_num, struct msi_region *region);
#endif /* LINUX_MSI_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 186540d..2b26a59 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1126,6 +1126,7 @@ struct msix_entry {
u16 entry; /* driver uses to specify entry, OS writes */
};
+struct msi_region;
#ifndef CONFIG_PCI_MSI
static inline int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
@@ -1168,6 +1169,16 @@ static inline int pci_msi_enabled(void)
{
return 0;
}
+
+static inline int msi_get_region_count(void)
+{
+ return 0;
+}
+
+static inline int msi_get_region(int region_num, struct msi_region *region)
+{
+ return 0;
+}
#else
int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec);
int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec);
@@ -1180,6 +1191,8 @@ void pci_disable_msix(struct pci_dev *dev);
void msi_remove_pci_irq_vectors(struct pci_dev *dev);
void pci_restore_msi_state(struct pci_dev *dev);
int pci_msi_enabled(void);
+int msi_get_region_count(void);
+int msi_get_region(int region_num, struct msi_region *region);
#endif
#ifdef CONFIG_PCIEPORTBUS
--
1.7.0.4
* [PATCH 6/6 v5] kvm: powerpc: use caching attributes as per linux pte
From: Bharat Bhushan @ 2013-09-19 6:02 UTC (permalink / raw)
To: benh, agraf, paulus, kvm, kvm-ppc, linuxppc-dev, scottwood; +Cc: Bharat Bhushan
In-Reply-To: <1379570566-3715-1-git-send-email-Bharat.Bhushan@freescale.com>
KVM uses the same WIM TLB attributes as the corresponding QEMU pte.
For this we now search the Linux pte for the requested page and
get the caching/coherency attributes from that pte.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
v4->v5
- No change
arch/powerpc/include/asm/kvm_host.h | 2 +-
arch/powerpc/kvm/booke.c | 2 +-
arch/powerpc/kvm/e500.h | 8 ++++--
arch/powerpc/kvm/e500_mmu_host.c | 38 ++++++++++++++++++++--------------
4 files changed, 29 insertions(+), 21 deletions(-)
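As a rough illustration of the flags layout this patch moves to, the sketch
below packs the state bits at the top of a 32-bit word and the MAS2
attribute bits in the low seven bits, then pulls the attributes back out
when building the shadow MAS2 value. It is a userspace model: the MODEL_*
masks are assumed values chosen for the example and should be checked
against the kernel headers, and the unsigned "1u" shifts are this sketch's
choice rather than what the patch writes.

```c
#include <assert.h>
#include <stdint.h>

/* State flags, moved to the top of the word by the patch. */
#define E500_TLB_VALID		(1u << 31)
#define E500_TLB_BITMAP		(1u << 30)
#define E500_TLB_TLB0		(1u << 29)
/* Low seven bits: X0, X1 and WIMGE, laid out as in MAS2. */
#define E500_TLB_MAS2_ATTR	0x7fu

/* Assumed values for this sketch only. */
#define MODEL_MAS2_WIMGE_MASK	0x1fu	/* W, I, M, G, E */
#define MODEL_GUEST_ATTR_MASK	0x60u	/* hypothetical: X0 | X1 */

/* Combine the valid bit, guest-supplied bits and host-derived WIMGE. */
static uint32_t ref_flags_pack(uint32_t guest_mas2, uint32_t host_wimge)
{
	assert((host_wimge & ~MODEL_MAS2_WIMGE_MASK) == 0);

	return E500_TLB_VALID |
	       (guest_mas2 & MODEL_GUEST_ATTR_MASK) |
	       host_wimge;
}

/* Build the shadow MAS2 value from the page address and stored flags. */
static uint32_t shadow_mas2(uint32_t epn, uint32_t flags)
{
	return epn | (flags & E500_TLB_MAS2_ATTR);
}
```

Keeping the state bits clear of the low seven bits is what lets
"flags & E500_TLB_MAS2_ATTR" be used directly as the attribute field
without any shifting.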
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 9741bf0..775f0e8 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -538,6 +538,7 @@ struct kvm_vcpu_arch {
#endif
gpa_t paddr_accessed;
gva_t vaddr_accessed;
+ pgd_t *pgdir;
u8 io_gpr; /* GPR used as IO source/target */
u8 mmio_is_bigendian;
@@ -595,7 +596,6 @@ struct kvm_vcpu_arch {
struct list_head run_list;
struct task_struct *run_task;
struct kvm_run *kvm_run;
- pgd_t *pgdir;
spinlock_t vpa_update_lock;
struct kvmppc_vpa vpa;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 17722d8..4171c7d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -695,7 +695,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
kvmppc_load_guest_fp(vcpu);
#endif
-
+ vcpu->arch.pgdir = current->mm->pgd;
kvmppc_fix_ee_before_entry();
ret = __kvmppc_vcpu_run(kvm_run, vcpu);
diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index 4fd9650..a326178 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -31,11 +31,13 @@ enum vcpu_ftr {
#define E500_TLB_NUM 2
/* entry is mapped somewhere in host TLB */
-#define E500_TLB_VALID (1 << 0)
+#define E500_TLB_VALID (1 << 31)
/* TLB1 entry is mapped by host TLB1, tracked by bitmaps */
-#define E500_TLB_BITMAP (1 << 1)
+#define E500_TLB_BITMAP (1 << 30)
/* TLB1 entry is mapped by host TLB0 */
-#define E500_TLB_TLB0 (1 << 2)
+#define E500_TLB_TLB0 (1 << 29)
+/* bits [6-5] MAS2_X1 and MAS2_X0 and [4-0] bits for WIMGE */
+#define E500_TLB_MAS2_ATTR (0x7f)
struct tlbe_ref {
pfn_t pfn; /* valid only for TLB0, except briefly */
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 60f5a3c..654c368 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,15 +64,6 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
return mas3;
}
-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
-{
-#ifdef CONFIG_SMP
- return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
-#else
- return mas2 & MAS2_ATTRIB_MASK;
-#endif
-}
-
/*
* writing shadow tlb entry to host TLB
*/
@@ -250,10 +241,12 @@ static inline int tlbe_is_writable(struct kvm_book3e_206_tlb_entry *tlbe)
static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
struct kvm_book3e_206_tlb_entry *gtlbe,
- pfn_t pfn)
+ pfn_t pfn, unsigned int wimg)
{
ref->pfn = pfn;
ref->flags = E500_TLB_VALID;
+ /* Use guest supplied MAS2_G and MAS2_E */
+ ref->flags |= (gtlbe->mas2 & MAS2_ATTRIB_MASK) | wimg;
if (tlbe_is_writable(gtlbe))
kvm_set_pfn_dirty(pfn);
@@ -314,8 +307,7 @@ static void kvmppc_e500_setup_stlbe(
/* Force IPROT=0 for all guest mappings. */
stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
- stlbe->mas2 = (gvaddr & MAS2_EPN) |
- e500_shadow_mas2_attrib(gtlbe->mas2, pr);
+ stlbe->mas2 = (gvaddr & MAS2_EPN) | (ref->flags & E500_TLB_MAS2_ATTR);
stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
@@ -334,6 +326,10 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
unsigned long hva;
int pfnmap = 0;
int tsize = BOOK3E_PAGESZ_4K;
+ unsigned long tsize_pages = 0;
+ pte_t *ptep;
+ unsigned int wimg = 0;
+ pgd_t *pgdir;
/*
* Translate guest physical to true physical, acquiring
@@ -396,7 +392,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
*/
for (; tsize > BOOK3E_PAGESZ_4K; tsize -= 2) {
- unsigned long gfn_start, gfn_end, tsize_pages;
+ unsigned long gfn_start, gfn_end;
tsize_pages = 1 << (tsize - 2);
gfn_start = gfn & ~(tsize_pages - 1);
@@ -438,9 +434,10 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
}
if (likely(!pfnmap)) {
- unsigned long tsize_pages = 1 << (tsize + 10 - PAGE_SHIFT);
+ tsize_pages = 1 << (tsize + 10 - PAGE_SHIFT);
+
pfn = gfn_to_pfn_memslot(slot, gfn);
- if (is_error_noslot_pfn(pfn)) {
+ if (is_error_noslot_pfn(pfn) && printk_ratelimit()) {
printk(KERN_ERR "Couldn't get real page for gfn %lx!\n",
(long)gfn);
return -EINVAL;
@@ -451,7 +448,16 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1);
}
- kvmppc_e500_ref_setup(ref, gtlbe, pfn);
+ pgdir = vcpu_e500->vcpu.arch.pgdir;
+ ptep = lookup_linux_pte(pgdir, hva, &tsize_pages);
+ if (pte_present(*ptep)) {
+ wimg = (pte_val(*ptep) >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
+ } else if (printk_ratelimit()) {
+ printk(KERN_ERR "%s: pte not present: gfn %lx, pfn %lx\n",
+ __func__, (long)gfn, pfn);
+ return -EINVAL;
+ }
+ kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
ref, gvaddr, stlbe);
--
1.7.0.4
* [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest tlb invalidation
From: Bharat Bhushan @ 2013-09-19 6:02 UTC (permalink / raw)
To: benh, agraf, paulus, kvm, kvm-ppc, linuxppc-dev, scottwood; +Cc: Bharat Bhushan
In-Reply-To: <1379570566-3715-1-git-send-email-Bharat.Bhushan@freescale.com>
On booke, "struct tlbe_ref" contains host TLB mapping information
(pfn: the guest-pfn to host-pfn translation, flags: attributes associated
with this mapping) for a guest TLB entry. When a guest creates a TLB entry,
"struct tlbe_ref" is set to point to a valid "pfn" and the attributes are
set in the "flags" field of that structure. When a guest TLB entry is
invalidated, the "flags" field of the corresponding "struct tlbe_ref" is
updated to mark it as no longer valid, and we also selectively clear
some other attribute bits. For example, if E500_TLB_BITMAP was set we clear
E500_TLB_BITMAP, and if E500_TLB_TLB0 was set we clear that.
Ideally we should clear "flags" completely, as the entry is invalid and has
nothing worth reusing. The other part of the problem is that when we use
the same entry again we do not clear "flags" either (we only OR in new bits).
So far this worked because the selective clearing mentioned above
happens to clear everything in "flags" that was set during TLB mapping.
The problem appears once we add more attributes: each of them would then
need to be selectively cleared as well, which is unnecessary.
This patch does both:
- Clear "flags" when invalidating;
- Clear "flags" when reusing the same entry later.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
v3-> v5
- New patch (found this issue when doing vfio-pci development)
arch/powerpc/kvm/e500_mmu_host.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..60f5a3c 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -217,7 +217,8 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 *vcpu_e500, int tlbsel,
}
mb();
vcpu_e500->g2h_tlb1_map[esel] = 0;
- ref->flags &= ~(E500_TLB_BITMAP | E500_TLB_VALID);
+ /* Clear flags as TLB is not backed by the host anymore */
+ ref->flags = 0;
local_irq_restore(flags);
}
@@ -227,7 +228,8 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 *vcpu_e500, int tlbsel,
* rarely and is not worth optimizing. Invalidate everything.
*/
kvmppc_e500_tlbil_all(vcpu_e500);
- ref->flags &= ~(E500_TLB_TLB0 | E500_TLB_VALID);
+ /* Clear flags as TLB is not backed by the host anymore */
+ ref->flags = 0;
}
/* Already invalidated in between */
@@ -237,8 +239,8 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 *vcpu_e500, int tlbsel,
/* Guest tlbe is backed by at most one host tlbe per shadow pid. */
kvmppc_e500_tlbil_one(vcpu_e500, gtlbe);
- /* Mark the TLB as not backed by the host anymore */
- ref->flags &= ~E500_TLB_VALID;
+ /* Clear flags as TLB is not backed by the host anymore */
+ ref->flags = 0;
}
static inline int tlbe_is_writable(struct kvm_book3e_206_tlb_entry *tlbe)
@@ -251,7 +253,7 @@ static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
pfn_t pfn)
{
ref->pfn = pfn;
- ref->flags |= E500_TLB_VALID;
+ ref->flags = E500_TLB_VALID;
if (tlbe_is_writable(gtlbe))
kvm_set_pfn_dirty(pfn);
--
1.7.0.4