* [PATCH 1/7] powerpc/52xx: define FSL_SOC
From: Eric Millbrandt @ 2012-09-13 21:43 UTC (permalink / raw)
To: Grant Likely, Liam Girdwood, Mark Brown, Anatolij Gustschin
Cc: alsa-devel, linuxppc-dev, Eric Millbrandt
In-Reply-To: <1347572596-37540-1-git-send-email-emillbrandt@dekaresearch.com>
mpc52xx socs need to have FSL_SOC defined to build their drivers (i2c-mpc, ASoC)
Signed-off-by: Eric Millbrandt <emillbrandt@dekaresearch.com>
diff --git a/arch/powerpc/platforms/52xx/Kconfig b/arch/powerpc/platforms/52xx/Kconfig
index 90f4496..fb35944 100644
--- a/arch/powerpc/platforms/52xx/Kconfig
+++ b/arch/powerpc/platforms/52xx/Kconfig
@@ -1,6 +1,7 @@
config PPC_MPC52xx
bool "52xx-based boards"
depends on 6xx
+ select FSL_SOC
select PPC_CLOCK
select PPC_PCI_CHOICE
--
1.7.2.5
^ permalink raw reply related
* [PATCH 3/7] ASoC: fsl: mpc5200 add missing information to snd_soc_dai_driver
From: Eric Millbrandt @ 2012-09-13 21:43 UTC (permalink / raw)
To: Grant Likely, Liam Girdwood, Mark Brown, Anatolij Gustschin
Cc: alsa-devel, linuxppc-dev, Eric Millbrandt
In-Reply-To: <1347572596-37540-1-git-send-email-emillbrandt@dekaresearch.com>
Add missing dai_driver information to avoid these runtime errors
[ 16.433788] asoc: error - multiple DAI f0002c00.i2s registered with no name
[ 16.453551] Failed to register DAI
[ 16.461222] mpc5200-psc-i2s: probe of f0002c00.i2s failed with error -22
[ 16.475242] asoc: error - multiple DAI f0002000.ac97 registered with no name
[ 16.488087] mpc5200-psc-ac97 f0002000.ac97: Failed to register DAI
[ 16.502222] mpc5200-psc-ac97: probe of f0002000.ac97 failed with error -22
Signed-off-by: Eric Millbrandt <emillbrandt@dekaresearch.com>
diff --git a/sound/soc/fsl/mpc5200_psc_ac97.c b/sound/soc/fsl/mpc5200_psc_ac97.c
index 9a09453..a313c0a 100644
--- a/sound/soc/fsl/mpc5200_psc_ac97.c
+++ b/sound/soc/fsl/mpc5200_psc_ac97.c
@@ -237,15 +237,18 @@ static const struct snd_soc_dai_ops psc_ac97_digital_ops = {
static struct snd_soc_dai_driver psc_ac97_dai[] = {
{
+ .name = "mpc5200-psc-ac97.0",
.ac97_control = 1,
.probe = psc_ac97_probe,
.playback = {
+ .stream_name = "AC97 Playback",
.channels_min = 1,
.channels_max = 6,
.rates = SNDRV_PCM_RATE_8000_48000,
.formats = SNDRV_PCM_FMTBIT_S32_BE,
},
.capture = {
+ .stream_name = "AC97 Capture",
.channels_min = 1,
.channels_max = 2,
.rates = SNDRV_PCM_RATE_8000_48000,
@@ -254,8 +257,10 @@ static struct snd_soc_dai_driver psc_ac97_dai[] = {
.ops = &psc_ac97_analog_ops,
},
{
+ .name = "mpc5200-psc-ac97.1",
.ac97_control = 1,
.playback = {
+ .stream_name = "AC97 SPDIF",
.channels_min = 1,
.channels_max = 2,
.rates = SNDRV_PCM_RATE_32000 | \
diff --git a/sound/soc/fsl/mpc5200_psc_i2s.c b/sound/soc/fsl/mpc5200_psc_i2s.c
index c0b7a23..ba1f0a6 100644
--- a/sound/soc/fsl/mpc5200_psc_i2s.c
+++ b/sound/soc/fsl/mpc5200_psc_i2s.c
@@ -130,13 +130,16 @@ static const struct snd_soc_dai_ops psc_i2s_dai_ops = {
};
static struct snd_soc_dai_driver psc_i2s_dai[] = {{
+ .name = "mpc5200-psc-i2s.0",
.playback = {
+ .stream_name = "I2S Playback",
.channels_min = 2,
.channels_max = 2,
.rates = PSC_I2S_RATES,
.formats = PSC_I2S_FORMATS,
},
.capture = {
+ .stream_name = "I2S Capture",
.channels_min = 2,
.channels_max = 2,
.rates = PSC_I2S_RATES,
--
1.7.2.5
^ permalink raw reply related
* [RFC 0/7] mpc5200 ASoC and pcm030 board fixes
From: Eric Millbrandt @ 2012-09-13 21:43 UTC (permalink / raw)
To: Grant Likely, Liam Girdwood, Mark Brown, Anatolij Gustschin
Cc: alsa-devel, linuxppc-dev, Eric Millbrandt
This series is a respin of "mpc5200 ASoC fixups"
The mpc5200 ASoC and pcm030 board code compiled, but did not run after the
multi-codec patches. This series add the missing pieces into the mpc5200 ASoC
drivers and updates the pcm030 board to the current api.
Eric Millbrandt (7):
powerpc/52xx: define FSL_SOC
ASoC: fsl: mpc5200 combine psc_dma platform data
ASoC: fsl: mpc5200 add missing information to snd_soc_dai_driver
ASoC: fsl: cleanup headers in pcm030-audio-fabric
ASoC: fsl: convert pcm030-audio-fabric to a platform-driver
ASoC: fsl: convert pcm030-audio-fabric to use snd_soc_register_card
ASoC: fsl: register the wm9712-codec
arch/powerpc/platforms/52xx/Kconfig | 1 +
sound/soc/fsl/mpc5200_dma.c | 24 ++-------
sound/soc/fsl/mpc5200_dma.h | 3 +
sound/soc/fsl/mpc5200_psc_ac97.c | 10 ++++
sound/soc/fsl/mpc5200_psc_i2s.c | 8 +++
sound/soc/fsl/pcm030-audio-fabric.c | 100 +++++++++++++++++++++++++---------
6 files changed, 99 insertions(+), 47 deletions(-)
--
1.7.2.5
^ permalink raw reply
* [PATCH 4/7] ASoC: fsl: cleanup headers in pcm030-audio-fabric
From: Eric Millbrandt @ 2012-09-13 21:43 UTC (permalink / raw)
To: Grant Likely, Liam Girdwood, Mark Brown, Anatolij Gustschin
Cc: alsa-devel, linuxppc-dev, Eric Millbrandt
In-Reply-To: <1347572596-37540-1-git-send-email-emillbrandt@dekaresearch.com>
Remove unreferenced header files.
Signed-off-by: Eric Millbrandt <emillbrandt@dekaresearch.com>
diff --git a/sound/soc/fsl/pcm030-audio-fabric.c b/sound/soc/fsl/pcm030-audio-fabric.c
index b3af55d..1353e8f 100644
--- a/sound/soc/fsl/pcm030-audio-fabric.c
+++ b/sound/soc/fsl/pcm030-audio-fabric.c
@@ -12,22 +12,13 @@
#include <linux/init.h>
#include <linux/module.h>
-#include <linux/interrupt.h>
#include <linux/device.h>
-#include <linux/delay.h>
#include <linux/of_device.h>
#include <linux/of_platform.h>
-#include <linux/dma-mapping.h>
-#include <sound/core.h>
-#include <sound/pcm.h>
-#include <sound/pcm_params.h>
-#include <sound/initval.h>
#include <sound/soc.h>
#include "mpc5200_dma.h"
-#include "mpc5200_psc_ac97.h"
-#include "../codecs/wm9712.h"
#define DRV_NAME "pcm030-audio-fabric"
--
1.7.2.5
^ permalink raw reply related
* [PATCH 2/7] ASoC: fsl: mpc5200 combine psc_dma platform data
From: Eric Millbrandt @ 2012-09-13 21:43 UTC (permalink / raw)
To: Grant Likely, Liam Girdwood, Mark Brown, Anatolij Gustschin
Cc: alsa-devel, linuxppc-dev, Eric Millbrandt
In-Reply-To: <1347572596-37540-1-git-send-email-emillbrandt@dekaresearch.com>
The mpc5200_psc_ac97 and mpc5200_psc_i2s modules rely on shared platform data
with mpc5200_dma.
Signed-off-by: Eric Millbrandt <emillbrandt@dekaresearch.com>
diff --git a/sound/soc/fsl/mpc5200_dma.c b/sound/soc/fsl/mpc5200_dma.c
index 9a3f7c5..9997c03 100644
--- a/sound/soc/fsl/mpc5200_dma.c
+++ b/sound/soc/fsl/mpc5200_dma.c
@@ -370,7 +370,7 @@ static struct snd_soc_platform_driver mpc5200_audio_dma_platform = {
.pcm_free = &psc_dma_free,
};
-static int mpc5200_hpcd_probe(struct platform_device *op)
+int mpc5200_audio_dma_create(struct platform_device *op)
{
phys_addr_t fifo;
struct psc_dma *psc_dma;
@@ -487,8 +487,9 @@ out_unmap:
iounmap(regs);
return ret;
}
+EXPORT_SYMBOL_GPL(mpc5200_audio_dma_create);
-static int mpc5200_hpcd_remove(struct platform_device *op)
+int mpc5200_audio_dma_destroy(struct platform_device *op)
{
struct psc_dma *psc_dma = dev_get_drvdata(&op->dev);
@@ -510,24 +511,7 @@ static int mpc5200_hpcd_remove(struct platform_device *op)
return 0;
}
-
-static struct of_device_id mpc5200_hpcd_match[] = {
- { .compatible = "fsl,mpc5200-pcm", },
- {}
-};
-MODULE_DEVICE_TABLE(of, mpc5200_hpcd_match);
-
-static struct platform_driver mpc5200_hpcd_of_driver = {
- .probe = mpc5200_hpcd_probe,
- .remove = mpc5200_hpcd_remove,
- .driver = {
- .owner = THIS_MODULE,
- .name = "mpc5200-pcm-audio",
- .of_match_table = mpc5200_hpcd_match,
- }
-};
-
-module_platform_driver(mpc5200_hpcd_of_driver);
+EXPORT_SYMBOL_GPL(mpc5200_audio_dma_destroy);
MODULE_AUTHOR("Grant Likely <grant.likely@secretlab.ca>");
MODULE_DESCRIPTION("Freescale MPC5200 PSC in DMA mode ASoC Driver");
diff --git a/sound/soc/fsl/mpc5200_dma.h b/sound/soc/fsl/mpc5200_dma.h
index a3c0cd5..dff253f 100644
--- a/sound/soc/fsl/mpc5200_dma.h
+++ b/sound/soc/fsl/mpc5200_dma.h
@@ -81,4 +81,7 @@ to_psc_dma_stream(struct snd_pcm_substream *substream, struct psc_dma *psc_dma)
return &psc_dma->playback;
}
+int mpc5200_audio_dma_create(struct platform_device *op);
+int mpc5200_audio_dma_destroy(struct platform_device *op);
+
#endif /* __SOUND_SOC_FSL_MPC5200_DMA_H__ */
diff --git a/sound/soc/fsl/mpc5200_psc_ac97.c b/sound/soc/fsl/mpc5200_psc_ac97.c
index ffa00a2..9a09453 100644
--- a/sound/soc/fsl/mpc5200_psc_ac97.c
+++ b/sound/soc/fsl/mpc5200_psc_ac97.c
@@ -278,6 +278,10 @@ static int __devinit psc_ac97_of_probe(struct platform_device *op)
struct snd_ac97 ac97;
struct mpc52xx_psc __iomem *regs;
+ rc = mpc5200_audio_dma_create(op);
+ if (rc != 0)
+ return rc;
+
rc = snd_soc_register_dais(&op->dev, psc_ac97_dai, ARRAY_SIZE(psc_ac97_dai));
if (rc != 0) {
dev_err(&op->dev, "Failed to register DAI\n");
@@ -303,6 +307,7 @@ static int __devinit psc_ac97_of_probe(struct platform_device *op)
static int __devexit psc_ac97_of_remove(struct platform_device *op)
{
+ mpc5200_audio_dma_destroy(op);
snd_soc_unregister_dais(&op->dev, ARRAY_SIZE(psc_ac97_dai));
return 0;
}
diff --git a/sound/soc/fsl/mpc5200_psc_i2s.c b/sound/soc/fsl/mpc5200_psc_i2s.c
index 7b53032..c0b7a23 100644
--- a/sound/soc/fsl/mpc5200_psc_i2s.c
+++ b/sound/soc/fsl/mpc5200_psc_i2s.c
@@ -156,6 +156,10 @@ static int __devinit psc_i2s_of_probe(struct platform_device *op)
struct psc_dma *psc_dma;
struct mpc52xx_psc __iomem *regs;
+ rc = mpc5200_audio_dma_create(op);
+ if (rc != 0)
+ return rc;
+
rc = snd_soc_register_dais(&op->dev, psc_i2s_dai, ARRAY_SIZE(psc_i2s_dai));
if (rc != 0) {
pr_err("Failed to register DAI\n");
@@ -200,6 +204,7 @@ static int __devinit psc_i2s_of_probe(struct platform_device *op)
static int __devexit psc_i2s_of_remove(struct platform_device *op)
{
+ mpc5200_audio_dma_destroy(op);
snd_soc_unregister_dais(&op->dev, ARRAY_SIZE(psc_i2s_dai));
return 0;
}
--
1.7.2.5
^ permalink raw reply related
* [PATCH 7/7] ASoC: fsl: register the wm9712-codec
From: Eric Millbrandt @ 2012-09-13 21:43 UTC (permalink / raw)
To: Grant Likely, Liam Girdwood, Mark Brown, Anatolij Gustschin
Cc: alsa-devel, linuxppc-dev, Eric Millbrandt
In-Reply-To: <1347572596-37540-1-git-send-email-emillbrandt@dekaresearch.com>
The mpc5200-psc-ac97 driver does not enumerate attached ac97 devices, so
register the device here.
Signed-off-by: Eric Millbrandt <emillbrandt@dekaresearch.com>
diff --git a/sound/soc/fsl/pcm030-audio-fabric.c b/sound/soc/fsl/pcm030-audio-fabric.c
index 893e240..4b63ec8 100644
--- a/sound/soc/fsl/pcm030-audio-fabric.c
+++ b/sound/soc/fsl/pcm030-audio-fabric.c
@@ -22,6 +22,11 @@
#define DRV_NAME "pcm030-audio-fabric"
+struct pcm030_audio_data {
+ struct snd_soc_card *card;
+ struct platform_device *codec_device;
+};
+
static struct snd_soc_dai_link pcm030_fabric_dai[] = {
{
.name = "AC97",
@@ -51,14 +56,22 @@ static int __init pcm030_fabric_probe(struct platform_device *op)
struct device_node *np = op->dev.of_node;
struct device_node *platform_np;
struct snd_soc_card *card = &pcm030_card;
+ struct pcm030_audio_data *pdata;
int ret;
int i;
if (!of_machine_is_compatible("phytec,pcm030"))
return -ENODEV;
+ pdata = devm_kzalloc(&op->dev, sizeof(struct pcm030_audio_data),
+ GFP_KERNEL);
+ if (!pdata)
+ return -ENOMEM;
+
card->dev = &op->dev;
- platform_set_drvdata(op, card);
+ platform_set_drvdata(op, pdata);
+
+ pdata->card = card;
platform_np = of_parse_phandle(np, "asoc-platform", 0);
if (!platform_np) {
@@ -69,6 +82,18 @@ static int __init pcm030_fabric_probe(struct platform_device *op)
for (i = 0; i < card->num_links; i++)
card->dai_link[i].platform_of_node = platform_np;
+ ret = request_module("snd-soc-wm9712");
+ if (ret)
+ dev_err(&op->dev, "request_module returned: %d\n", ret);
+
+ pdata->codec_device = platform_device_alloc("wm9712-codec", -1);
+ if (!pdata->codec_device)
+ dev_err(&op->dev, "platform_device_alloc() failed\n");
+
+ ret = platform_device_add(pdata->codec_device);
+ if (ret)
+ dev_err(&op->dev, "platform_device_add() failed: %d\n", ret);
+
ret = snd_soc_register_card(card);
if (ret)
dev_err(&op->dev, "snd_soc_register_card() failed: %d\n", ret);
@@ -78,10 +103,11 @@ static int __init pcm030_fabric_probe(struct platform_device *op)
static int __devexit pcm030_fabric_remove(struct platform_device *op)
{
- struct snd_soc_card *card = platform_get_drvdata(op);
+ struct pcm030_audio_data *pdata = platform_get_drvdata(op);
int ret;
- ret = snd_soc_unregister_card(card);
+ ret = snd_soc_unregister_card(pdata->card);
+ platform_device_unregister(pdata->codec_device);
return ret;
}
--
1.7.2.5
^ permalink raw reply related
* [PATCH 5/7] ASoC: fsl: convert pcm030-audio-fabric to a platform-driver
From: Eric Millbrandt @ 2012-09-13 21:43 UTC (permalink / raw)
To: Grant Likely, Liam Girdwood, Mark Brown, Anatolij Gustschin
Cc: alsa-devel, linuxppc-dev, Eric Millbrandt
In-Reply-To: <1347572596-37540-1-git-send-email-emillbrandt@dekaresearch.com>
This patch converts the pcm030-audio-fabric driver to a platform-driver and
adds a remove function.
Signed-off-by: Eric Millbrandt <emillbrandt@dekaresearch.com>
diff --git a/sound/soc/fsl/pcm030-audio-fabric.c b/sound/soc/fsl/pcm030-audio-fabric.c
index 1353e8f..5c8e2d6 100644
--- a/sound/soc/fsl/pcm030-audio-fabric.c
+++ b/sound/soc/fsl/pcm030-audio-fabric.c
@@ -48,7 +48,7 @@ static struct snd_soc_card card = {
.num_links = ARRAY_SIZE(pcm030_fabric_dai),
};
-static __init int pcm030_fabric_init(void)
+static int __init pcm030_fabric_probe(struct platform_device *op)
{
struct platform_device *pdev;
int rc;
@@ -62,6 +62,7 @@ static __init int pcm030_fabric_init(void)
return -ENODEV;
}
+ platform_set_drvdata(op, pdev);
platform_set_drvdata(pdev, &card);
rc = platform_device_add(pdev);
@@ -73,7 +74,32 @@ static __init int pcm030_fabric_init(void)
return 0;
}
-module_init(pcm030_fabric_init);
+static int __devexit pcm030_fabric_remove(struct platform_device *op)
+{
+ struct platform_device *pdev = platform_get_drvdata(op);
+
+ platform_device_unregister(pdev);
+
+ return 0;
+}
+
+static struct of_device_id pcm030_audio_match[] = {
+ { .compatible = "phytec,pcm030-audio-fabric", },
+ {}
+};
+MODULE_DEVICE_TABLE(of, pcm030_audio_match);
+
+static struct platform_driver pcm030_fabric_driver = {
+ .probe = pcm030_fabric_probe,
+ .remove = __devexit_p(pcm030_fabric_remove),
+ .driver = {
+ .name = DRV_NAME,
+ .owner = THIS_MODULE,
+ .of_match_table = pcm030_audio_match,
+ },
+};
+
+module_platform_driver(pcm030_fabric_driver);
MODULE_AUTHOR("Jon Smirl <jonsmirl@gmail.com>");
--
1.7.2.5
^ permalink raw reply related
* [PATCH 6/7] ASoC: fsl: convert pcm030-audio-fabric to use snd_soc_register_card
From: Eric Millbrandt @ 2012-09-13 21:43 UTC (permalink / raw)
To: Grant Likely, Liam Girdwood, Mark Brown, Anatolij Gustschin
Cc: alsa-devel, linuxppc-dev, Eric Millbrandt
In-Reply-To: <1347572596-37540-1-git-send-email-emillbrandt@dekaresearch.com>
This patch converts pcm030-audio-fabric to use the new snd_soc_register_card
ASoC api instead of the older method of registering a separate platform
device. It also creates the dai_link to the mpc5200_psc_ac97 platform using
the device tree.
Signed-off-by: Eric Millbrandt <emillbrandt@dekaresearch.com>
diff --git a/sound/soc/fsl/pcm030-audio-fabric.c b/sound/soc/fsl/pcm030-audio-fabric.c
index 5c8e2d6..893e240 100644
--- a/sound/soc/fsl/pcm030-audio-fabric.c
+++ b/sound/soc/fsl/pcm030-audio-fabric.c
@@ -28,7 +28,6 @@ static struct snd_soc_dai_link pcm030_fabric_dai[] = {
.stream_name = "AC97 Analog",
.codec_dai_name = "wm9712-hifi",
.cpu_dai_name = "mpc5200-psc-ac97.0",
- .platform_name = "mpc5200-pcm-audio",
.codec_name = "wm9712-codec",
},
{
@@ -36,12 +35,11 @@ static struct snd_soc_dai_link pcm030_fabric_dai[] = {
.stream_name = "AC97 IEC958",
.codec_dai_name = "wm9712-aux",
.cpu_dai_name = "mpc5200-psc-ac97.1",
- .platform_name = "mpc5200-pcm-audio",
.codec_name = "wm9712-codec",
},
};
-static struct snd_soc_card card = {
+static struct snd_soc_card pcm030_card = {
.name = "pcm030",
.owner = THIS_MODULE,
.dai_link = pcm030_fabric_dai,
@@ -50,37 +48,42 @@ static struct snd_soc_card card = {
static int __init pcm030_fabric_probe(struct platform_device *op)
{
- struct platform_device *pdev;
- int rc;
+ struct device_node *np = op->dev.of_node;
+ struct device_node *platform_np;
+ struct snd_soc_card *card = &pcm030_card;
+ int ret;
+ int i;
if (!of_machine_is_compatible("phytec,pcm030"))
return -ENODEV;
- pdev = platform_device_alloc("soc-audio", 1);
- if (!pdev) {
- pr_err("pcm030_fabric_init: platform_device_alloc() failed\n");
+ card->dev = &op->dev;
+ platform_set_drvdata(op, card);
+
+ platform_np = of_parse_phandle(np, "asoc-platform", 0);
+ if (!platform_np) {
+ dev_err(&op->dev, "ac97 not registered\n");
return -ENODEV;
}
- platform_set_drvdata(op, pdev);
- platform_set_drvdata(pdev, &card);
+ for (i = 0; i < card->num_links; i++)
+ card->dai_link[i].platform_of_node = platform_np;
- rc = platform_device_add(pdev);
- if (rc) {
- pr_err("pcm030_fabric_init: platform_device_add() failed\n");
- platform_device_put(pdev);
- return -ENODEV;
- }
- return 0;
+ ret = snd_soc_register_card(card);
+ if (ret)
+ dev_err(&op->dev, "snd_soc_register_card() failed: %d\n", ret);
+
+ return ret;
}
static int __devexit pcm030_fabric_remove(struct platform_device *op)
{
- struct platform_device *pdev = platform_get_drvdata(op);
+ struct snd_soc_card *card = platform_get_drvdata(op);
+ int ret;
- platform_device_unregister(pdev);
+ ret = snd_soc_unregister_card(card);
- return 0;
+ return ret;
}
static struct of_device_id pcm030_audio_match[] = {
--
1.7.2.5
^ permalink raw reply related
* Re: [PATCH] powerpc/smp: Do not disable IPI interrupts during suspend
From: Benjamin Herrenschmidt @ 2012-09-13 22:11 UTC (permalink / raw)
To: Kumar Gala
Cc: linuxppc-dev@lists.ozlabs.org list, Zhao Chenhui,
linux-kernel@vger.kernel.org list
In-Reply-To: <287921FF-96B1-4BBC-B9F9-3F8D24244061@kernel.crashing.org>
On Thu, 2012-09-13 at 13:16 -0500, Kumar Gala wrote:
> >> Yes, we disabled all non-boot CPUs on suspend by calling
> disable_nonboot_cpus().
> >> The disable_nonboot_cpus() needs IPIs to work. But prior to
> >> calling disable_nonboot_cpus(), the IPIs are disabled in
> dpm_suspend_noirq().
Sure, no biggie on this one, forgot to ack it, just stick my ack in and
put it in your tree.
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH] vfio: enabled and supported on power (v7)
From: Alex Williamson @ 2012-09-13 22:34 UTC (permalink / raw)
To: Alexey Kardashevskiy; +Cc: Paul Mackerras, linuxppc-dev, David Gibson
In-Reply-To: <504EF638.3090507@ozlabs.ru>
On Tue, 2012-09-11 at 18:28 +1000, Alexey Kardashevskiy wrote:
> On 11/09/12 02:02, Alex Williamson wrote:
> > On Tue, 2012-09-04 at 17:33 +1000, Alexey Kardashevskiy wrote:
> >> Cc: David Gibson <david@gibson.dropbear.id.au>
> >> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> >> Cc: Paul Mackerras <paulus@samba.org>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >
> > Please at least cc kvm@vger as well since we list that as the devel list
> > for vfio.
> >
> >> arch/powerpc/include/asm/iommu.h | 3 +
> >
> > I'll need an ack from Ben or Paul for this change.
> >
> >> drivers/iommu/Kconfig | 8 +
> >> drivers/vfio/Kconfig | 6 +
> >> drivers/vfio/Makefile | 1 +
> >> drivers/vfio/vfio_iommu_spapr_tce.c | 440 +++++++++++++++++++++++++++++++++++
> >> include/linux/vfio.h | 29 +++
> >> 6 files changed, 487 insertions(+)
> >> create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
> >>
> >> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >> index 957a83f..c64bce7 100644
> >> --- a/arch/powerpc/include/asm/iommu.h
> >> +++ b/arch/powerpc/include/asm/iommu.h
> >> @@ -66,6 +66,9 @@ struct iommu_table {
> >> unsigned long it_halfpoint; /* Breaking point for small/large allocs */
> >> spinlock_t it_lock; /* Protects it_map */
> >> unsigned long *it_map; /* A simple allocation bitmap for now */
> >> +#ifdef CONFIG_IOMMU_API
> >> + struct iommu_group *it_group;
> >> +#endif
> >> };
> >
> > This seems to only be valid when vfio_iommu_spapr_tce is loaded, which
> > is a bit misleading.
> >
> >>
> >> struct scatterlist;
> >> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> >> index 3bd9fff..19cf2d9 100644
> >> --- a/drivers/iommu/Kconfig
> >> +++ b/drivers/iommu/Kconfig
> >> @@ -162,4 +162,12 @@ config TEGRA_IOMMU_SMMU
> >> space through the SMMU (System Memory Management Unit)
> >> hardware included on Tegra SoCs.
> >>
> >> +config SPAPR_TCE_IOMMU
> >> + bool "sPAPR TCE IOMMU Support"
> >> + depends on PPC_PSERIES
> >> + select IOMMU_API
> >> + help
> >> + Enables bits of IOMMU API required by VFIO. The iommu_ops is
> >> + still not implemented.
> >> +
> >> endif # IOMMU_SUPPORT
> >> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> >> index 7cd5dec..b464687 100644
> >> --- a/drivers/vfio/Kconfig
> >> +++ b/drivers/vfio/Kconfig
> >> @@ -3,10 +3,16 @@ config VFIO_IOMMU_TYPE1
> >> depends on VFIO
> >> default n
> >>
> >> +config VFIO_IOMMU_SPAPR_TCE
> >> + tristate
> >> + depends on VFIO && SPAPR_TCE_IOMMU
> >> + default n
> >> +
> >> menuconfig VFIO
> >> tristate "VFIO Non-Privileged userspace driver framework"
> >> depends on IOMMU_API
> >> select VFIO_IOMMU_TYPE1 if X86
> >> + select VFIO_IOMMU_SPAPR_TCE if PPC_POWERNV
> >> help
> >> VFIO provides a framework for secure userspace device drivers.
> >> See Documentation/vfio.txt for more details.
> >> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> >> index 2398d4a..72bfabc 100644
> >> --- a/drivers/vfio/Makefile
> >> +++ b/drivers/vfio/Makefile
> >> @@ -1,3 +1,4 @@
> >> obj-$(CONFIG_VFIO) += vfio.o
> >> obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
> >> +obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
> >> obj-$(CONFIG_VFIO_PCI) += pci/
> >> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> new file mode 100644
> >> index 0000000..21f1909
> >> --- /dev/null
> >> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >> @@ -0,0 +1,440 @@
> >> +/*
> >> + * VFIO: IOMMU DMA mapping support for TCE on POWER
> >> + *
> >> + * Copyright (C) 2012 IBM Corp. All rights reserved.
> >> + * Author: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License version 2 as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * Derived from original vfio_iommu_x86.c:
> >
> > Should this be _type1? Only the mail archives are going to remember
> > there was a _x86, so the renamed version is probably a better reference.
> >
> >> + * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
> >> + * Author: Alex Williamson <alex.williamson@redhat.com>
> >> + */
> >> +
> >> +#include <linux/module.h>
> >> +#include <linux/pci.h>
> >> +#include <linux/slab.h>
> >> +#include <linux/uaccess.h>
> >> +#include <linux/err.h>
> >> +#include <linux/vfio.h>
> >> +#include <linux/spinlock.h>
> >> +#include <asm/iommu.h>
> >> +
> >> +#define DRIVER_VERSION "0.1"
> >> +#define DRIVER_AUTHOR "aik@ozlabs.ru"
> >> +#define DRIVER_DESC "VFIO IOMMU SPAPR TCE"
> >> +
> >> +
> >> +/*
> >> + * SPAPR TCE API
> >> + */
> >> +static void tce_free(struct iommu_table *tbl, unsigned long entry,
> >> + unsigned long tce)
> >> +{
> >> + struct page *page = pfn_to_page(tce >> PAGE_SHIFT);
> >> +
> >> + WARN_ON(!page);
> >> + if (page) {
> >> + if (tce & VFIO_SPAPR_TCE_WRITE)
> >> + SetPageDirty(page);
> >> + put_page(page);
> >> + }
> >> + ppc_md.tce_free(tbl, entry, 1);
> >> +}
> >> +
> >> +static long tce_put(struct iommu_table *tbl,
> >> + unsigned long entry, uint64_t tce, uint32_t flags)
> >> +{
> >> + int ret;
> >> + unsigned long oldtce, kva, offset;
> >> + struct page *page = NULL;
> >> + enum dma_data_direction direction = DMA_NONE;
> >> +
> >> + switch (flags & VFIO_SPAPR_TCE_PUT_MASK) {
> >> + case VFIO_SPAPR_TCE_READ:
> >> + direction = DMA_TO_DEVICE;
> >> + break;
> >> + case VFIO_SPAPR_TCE_WRITE:
> >> + direction = DMA_FROM_DEVICE;
> >> + break;
> >> + case VFIO_SPAPR_TCE_BIDIRECTIONAL:
> >> + direction = DMA_BIDIRECTIONAL;
> >> + break;
> >> + }
> >> +
> >> + oldtce = ppc_md.tce_get(tbl, entry);
> >> +
> >> + /* Free page if still allocated */
> >> + if (oldtce & VFIO_SPAPR_TCE_PUT_MASK)
> >> + tce_free(tbl, entry, oldtce);
> >> +
> >> + /* Map new TCE */
> >> + if (direction != DMA_NONE) {
> >> + offset = (tce & IOMMU_PAGE_MASK) - (tce & PAGE_MASK);
> >> + ret = get_user_pages_fast(tce & PAGE_MASK, 1,
> >> + direction != DMA_TO_DEVICE, &page);
> >> + BUG_ON(ret > 1);
> >
> > Can this happen?
> >
> >> + if (ret < 1) {
> >> + printk(KERN_ERR "tce_vfio: get_user_pages_fast failed "
> >> + "tce=%llx ioba=%lx ret=%d\n",
> >> + tce, entry << IOMMU_PAGE_SHIFT, ret);
> >> + if (!ret)
> >> + ret = -EFAULT;
> >> + goto unlock_exit;
> >> + }
> >> +
> >> + kva = (unsigned long) page_address(page);
> >> + kva += offset;
> >> + BUG_ON(!kva);
> >
> > Same here, can it happen? If so, should it BUG or catch the below
> > EINVAL?
> >
> >> + if (WARN_ON(kva & ~IOMMU_PAGE_MASK))
> >> + return -EINVAL;
> >
> > Page leak? Don't we want to do a put_page(), which means we probably
> > want a goto exit here.
> >
> >> +
> >> + /* Preserve access bits */
> >> + kva |= flags & VFIO_SPAPR_TCE_PUT_MASK;
> >> +
> >> + /* tce_build receives a virtual address */
> >> + entry += tbl->it_offset; /* Offset into real TCE table */
> >> + ret = ppc_md.tce_build(tbl, entry, 1, kva, direction, NULL);
> >> +
> >> + /* tce_build() only returns non-zero for transient errors */
> >> + if (unlikely(ret)) {
> >> + printk(KERN_ERR "tce_vfio: Failed to add TCE\n");
> >> + ret = -EIO;
> >> + goto unlock_exit;
> >> + }
> >> + }
> >> + /* Flush/invalidate TLB caches if necessary */
> >> + if (ppc_md.tce_flush)
> >> + ppc_md.tce_flush(tbl);
> >> +
> >> + /* Make sure updates are seen by hardware */
> >> + mb();
> >> +
> >> +unlock_exit:
> >
> > unlock seems wrong here, I had to go re-read the code looking for the
> > lock.
> >
> >> + if (ret && page)
> >> + put_page(page);
> >> +
> >> + if (ret)
> >> + printk(KERN_ERR "tce_vfio: tce_put failed on tce=%llx "
> >> + "ioba=%lx kva=%lx\n", tce,
> >> + entry << IOMMU_PAGE_SHIFT, kva);
> >> + return ret;
> >> +}
> >> +
> >> +/*
> >> + * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
> >> + */
> >> +
> >> +/*
> >> + * The container descriptor supports only a single group per container.
> >> + * Required by the API as the container is not supplied with the IOMMU group
> >> + * at the moment of initialization.
> >> + */
> >> +struct tce_container {
> >> + struct iommu_table *tbl;
> >> +};
> >> +
> >> +static void *tce_iommu_open(unsigned long arg)
> >> +{
> >> + struct tce_container *container;
> >> +
> >> + if (arg != VFIO_SPAPR_TCE_IOMMU) {
> >> + printk(KERN_ERR "tce_vfio: Wrong IOMMU type\n");
> >> + return ERR_PTR(-EINVAL);
> >> + }
> >> +
> >> + container = kzalloc(sizeof(*container), GFP_KERNEL);
> >> + if (!container)
> >> + return ERR_PTR(-ENOMEM);
> >> +
> >> + return container;
> >> +}
> >> +
> >> +static void tce_iommu_release(void *iommu_data)
> >> +{
> >> + struct tce_container *container = iommu_data;
> >> + struct iommu_table *tbl = container->tbl;
> >> + unsigned long i, tce;
> >> +
> >
> > This will segfault if releasing a container that never had an a device
> > attached.
> >
> >> + /* Unmap leftovers */
> >> + spin_lock_irq(&tbl->it_lock);
> >> + for (i = tbl->it_offset; i < tbl->it_offset + tbl->it_size; ++i) {
> >> + tce = ppc_md.tce_get(tbl, i);
> >> + if (tce & VFIO_SPAPR_TCE_PUT_MASK)
> >> + tce_free(tbl, i, tce);
> >> + }
> >> + /* Flush/invalidate TLB caches if necessary */
> >> + if (ppc_md.tce_flush)
> >> + ppc_md.tce_flush(tbl);
> >> +
> >> + /* Make sure updates are seen by hardware */
> >> + mb();
> >> +
> >> + spin_unlock_irq(&tbl->it_lock);
> >> +
> >> + kfree(container);
> >> +}
> >> +
> >> +static long tce_iommu_ioctl(void *iommu_data,
> >> + unsigned int cmd, unsigned long arg)
> >> +{
> >> + struct tce_container *container = iommu_data;
> >> + unsigned long minsz;
> >> + long ret;
> >> +
> >> + switch (cmd) {
> >> + case VFIO_CHECK_EXTENSION: {
> >> + return (arg == VFIO_SPAPR_TCE_IOMMU) ? 1 : 0;
> >> + }
> >> + case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
> >> + struct vfio_iommu_spapr_tce_info info;
> >> + struct iommu_table *tbl = container->tbl;
> >> +
> >> + minsz = offsetofend(struct vfio_iommu_spapr_tce_info,
> >> + dma64_window_size);
> >> +
> >> + if (copy_from_user(&info, (void __user *)arg, minsz))
> >> + return -EFAULT;
> >> +
> >> + if (info.argsz < minsz)
> >> + return -EINVAL;
> >> +
> >> + if (!tbl)
> >> + return -ENXIO;
> >
> > nit: why not check this earlier?
> >
> >> +
> >> + info.dma32_window_start = tbl->it_offset << IOMMU_PAGE_SHIFT;
> >> + info.dma32_window_size = tbl->it_size << IOMMU_PAGE_SHIFT;
> >> + info.dma64_window_start = 0;
> >> + info.dma64_window_size = 0;
> >> + info.flags = 0;
> >> +
> >> + return copy_to_user((void __user *)arg, &info, minsz);
> >> + }
> >> + case VFIO_IOMMU_SPAPR_TCE_PUT: {
> >> + struct vfio_iommu_spapr_tce_put par;
> >> + struct iommu_table *tbl = container->tbl;
> >> +
> >> + minsz = offsetofend(struct vfio_iommu_spapr_tce_put, tce);
> >> +
> >> + if (copy_from_user(&par, (void __user *)arg, minsz))
> >> + return -EFAULT;
> >> +
> >> + if (par.argsz < minsz)
> >> + return -EINVAL;
> >> +
> >> + if (!tbl) {
> >> + return -ENXIO;
> >> + }
> >
> > Same, plus drop the braces.
> >
> >> +
> >> + spin_lock_irq(&tbl->it_lock);
> >> + ret = tce_put(tbl, par.ioba >> IOMMU_PAGE_SHIFT,
> >> + par.tce, par.flags);
> >> + spin_unlock_irq(&tbl->it_lock);
> >> +
> >> + return ret;
> >> + }
> >
> > Is "PUT" really the name we want for this?
>
>
> Yes, it is a single H_PUT_TCE hypercall from POWER architecture spec.
Ok, if it makes sense on your arch, I won't complain (too much) about
it.
> >> + default:
> >> + printk(KERN_WARNING "tce_vfio: unexpected cmd %x\n", cmd);
> >> + }
> >> +
> >> + return -ENOTTY;
> >> +}
> >> +
> >> +static int tce_iommu_attach_group(void *iommu_data,
> >> + struct iommu_group *iommu_group)
> >> +{
> >> + struct tce_container *container = iommu_data;
> >> + struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
> >> +
> >> + printk(KERN_DEBUG "tce_vfio: Attaching group #%u to iommu %p\n",
> >> + iommu_group_id(iommu_group), iommu_group);
> >
> > Let's use pr_debug() and friends throughout.
> >
> >> + if (container->tbl) {
> >> + printk(KERN_WARNING "tce_vfio: Only one group per IOMMU "
> >> + "container is allowed, "
> >> + "existing id=%d, attaching id=%d\n",
> >> + iommu_group_id(container->tbl->it_group),
> >> + iommu_group_id(iommu_group));
> >> + return -EBUSY;
> >> + }
> >> +
> >
> > _type1 has a lock to avoid races here, I think you might need one too.
> >
> >> + container->tbl = tbl;
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static void tce_iommu_detach_group(void *iommu_data,
> >> + struct iommu_group *iommu_group)
> >> +{
> >> + struct tce_container *container = iommu_data;
> >> + struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
> >> +
> >> + BUG_ON(!tbl);
> >
> > Needed? If so, why is there no check on attach?
>
> Added to attach() :)
>
>
> >
> >> + if (tbl != container->tbl) {
> >> + printk(KERN_WARNING "tce_vfio: detaching group #%u, expected "
> >> + "group is #%u\n", iommu_group_id(iommu_group),
> >> + iommu_group_id(tbl->it_group));
> >> + return;
> >> + }
> >> + printk(KERN_DEBUG "tce_vfio: detaching group #%u from iommu %p\n",
> >> + iommu_group_id(iommu_group), iommu_group);
> >
> > container->tbl = NULL?
>
>
> Then I won't be able to release pages in tce_iommu_release().
> Releasing pages in tce_iommu_detach_group() caused some other problems,
> cannot recall now which ones.
What happens if you hot unplug a group from one VM and add it to
another? ie. we've detached it from one container and add it to another
in a different instance. Does it cause problems here?
> >> +}
> >> +
> >> +const struct vfio_iommu_driver_ops tce_iommu_driver_ops = {
> >> + .name = "iommu-vfio-powerpc",
> >> + .owner = THIS_MODULE,
> >> + .open = tce_iommu_open,
> >> + .release = tce_iommu_release,
> >> + .ioctl = tce_iommu_ioctl,
> >> + .attach_group = tce_iommu_attach_group,
> >> + .detach_group = tce_iommu_detach_group,
> >> +};
> >> +
> >> +/*
> >> + * Add/delete devices support (hotplug, module_init, module_exit)
> >> + */
> >> +static int add_device(struct device *dev)
> >> +{
> >> + struct iommu_table *tbl;
> >> + int ret = 0;
> >> +
> >> + if (dev->iommu_group) {
> >> + printk(KERN_WARNING "tce_vfio: device %s is already in iommu "
> >> + "group %d, skipping\n", dev->kobj.name,
> >
> > Watch line wrapping on strings.
>
> Pardon?
Just suggesting that you try to wrap lines so that strings are
searchable. For instance, can I search cscope for "is already in iommu
group". It's generally accepted that printks can break 80 cols for
this.
> >> + iommu_group_id(dev->iommu_group));
> >> + return -EBUSY;
> >> + }
> >> +
> >> + tbl = get_iommu_table_base(dev);
> >> + if (!tbl) {
> >> + printk(KERN_DEBUG "tce_vfio: skipping device %s with no tbl\n",
> >> + dev->kobj.name);
> >> + return 0;
> >> + }
> >> +
> >> + printk(KERN_DEBUG "tce_vfio: adding %s to iommu group %d\n",
> >> + dev->kobj.name, iommu_group_id(tbl->it_group));
> >> +
> >> + ret = iommu_group_add_device(tbl->it_group, dev);
> >> + if (ret < 0)
> >> + printk(KERN_ERR "tce_vfio: %s has not been added, ret=%d\n",
> >> + dev->kobj.name, ret);
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static void del_device(struct device *dev)
> >> +{
> >> + iommu_group_remove_device(dev);
> >> +}
> >> +
> >> +static int iommu_bus_notifier(struct notifier_block *nb,
> >> + unsigned long action, void *data)
> >> +{
> >> + struct device *dev = data;
> >> +
> >> + switch (action) {
> >> + case BUS_NOTIFY_ADD_DEVICE:
> >> + return add_device(dev);
> >> + case BUS_NOTIFY_DEL_DEVICE:
> >> + del_device(dev);
> >> + return 0;
> >> + default:
> >> + return 0;
> >> + }
> >> +}
> >> +
> >> +static struct notifier_block tce_iommu_bus_nb = {
> >> + .notifier_call = iommu_bus_notifier,
> >> +};
> >> +
> >> +void group_release(void *iommu_data)
> >> +{
> >> + struct iommu_table *tbl = iommu_data;
> >> + tbl->it_group = NULL;
> >> +}
> >> +
> >> +static int __init tce_iommu_init(void)
> >> +{
> >> + struct pci_dev *pdev = NULL;
> >> + struct iommu_table *tbl;
> >> + struct iommu_group *grp;
> >> +
> >> + /* If the current platform does not support tce_get
> >> + we are unable to clean TCE table properly and
> >> + therefore it is better not to touch it at all */
> >> + if (!ppc_md.tce_get) {
> >> + printk(KERN_ERR "tce_vfio: ppc_md.tce_get isn't implemented\n");
> >> + return -EOPNOTSUPP;
> >> + }
> >> +
> >> + bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb);
> >> +
> >> + /* Allocate and initialize VFIO groups */
> >
> > s/VFIO groups/IOMMU groups/
> >
> >> + for_each_pci_dev(pdev) {
> >> + tbl = get_iommu_table_base(&pdev->dev);
> >> + if (!tbl)
> >> + continue;
> >> +
> >> + /* Skip already initialized */
> >> + if (tbl->it_group)
> >> + continue;
> >> +
> >> + grp = iommu_group_alloc();
> >> + if (IS_ERR(grp)) {
> >> + printk(KERN_INFO "tce_vfio: cannot create "
> >> + "new IOMMU group, ret=%ld\n",
> >> + PTR_ERR(grp));
> >> + return -EFAULT;
> >> + }
> >> + tbl->it_group = grp;
> >> + iommu_group_set_iommudata(grp, tbl, group_release);
> >> + }
> >> +
> >> + /* Add PCI devices to VFIO groups */
> >> + for_each_pci_dev(pdev)
> >> + add_device(&pdev->dev);
> >> +
> >> + return vfio_register_iommu_driver(&tce_iommu_driver_ops);
> >> +}
> >> +
> >> +static void __exit tce_iommu_cleanup(void)
> >> +{
> >> + struct pci_dev *pdev = NULL;
> >> + struct iommu_table *tbl;
> >> + struct iommu_group *grp = NULL;
> >> +
> >> + bus_unregister_notifier(&pci_bus_type, &tce_iommu_bus_nb);
> >> +
> >> + /* Delete PCI devices from VFIO groups */
> >> + for_each_pci_dev(pdev)
> >> + del_device(&pdev->dev);
> >> +
> >> + /* Release VFIO groups */
> >> + for_each_pci_dev(pdev) {
> >> + tbl = get_iommu_table_base(&pdev->dev);
> >> + if (!tbl)
> >> + continue;
> >> + grp = tbl->it_group;
> >> +
> >> + /* Skip (already) uninitialized */
> >> + if (!grp)
> >> + continue;
> >> +
> >> + /* Do actual release, group_release() is expected to work */
> >> + iommu_group_put(grp);
> >> + BUG_ON(tbl->it_group);
> >> + }
> >> +
> >
> >
> > It troubles me a bit that you're using the vfio driver to initialize and
> > tear down IOMMU groups on your platform.
>
>
> I am not following you here. Could you please explain a bit?
IOMMU groups are theoretically not just for VFIO. They expose DMA
dependencies between devices for anyone who cares to know about it.
VFIO happens to care very much about that, but is hopefully not the only
consumer. So it's a little bit like writing a driver for a device on a
new bus and incorporating the bus topology handling code into the device
driver. IOMMU groups should be created and managed independent of VFIO.
> > VFIO makes use of IOMMU groups
> > and is the only user so far, but they're hopefully useful beyond this.
> > In fact, VFIO used to manage assembling all groups from data provided by
> > the IOMMU but David wanted to see IOMMU groups be a more universally
> > available feature, so it's odd to see POWER implementing it this way.
>
>
> David, help! :)
>
>
> >> + vfio_unregister_iommu_driver(&tce_iommu_driver_ops);
> >> +}
> >> +
> >> +module_init(tce_iommu_init);
> >> +module_exit(tce_iommu_cleanup);
> >> +
> >> +MODULE_VERSION(DRIVER_VERSION);
> >> +MODULE_LICENSE("GPL v2");
> >> +MODULE_AUTHOR(DRIVER_AUTHOR);
> >> +MODULE_DESCRIPTION(DRIVER_DESC);
> >> +
> >> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> >> index 0a4f180..2c0a927 100644
> >> --- a/include/linux/vfio.h
> >> +++ b/include/linux/vfio.h
> >> @@ -99,6 +99,7 @@ extern void vfio_unregister_iommu_driver(
> >> /* Extensions */
> >>
> >> #define VFIO_TYPE1_IOMMU 1
> >> +#define VFIO_SPAPR_TCE_IOMMU 2
> >>
> >> /*
> >> * The IOCTL interface is designed for extensibility by embedding the
> >> @@ -442,4 +443,32 @@ struct vfio_iommu_type1_dma_unmap {
> >>
> >> #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
> >>
> >> +/* -------- API for SPAPR TCE (Server POWERPC) IOMMU -------- */
> >> +
> >> +struct vfio_iommu_spapr_tce_info {
> >> + __u32 argsz;
> >> + __u32 flags;
> >> + __u32 dma32_window_start;
> >> + __u32 dma32_window_size;
> >> + __u64 dma64_window_start;
> >> + __u64 dma64_window_size;
> >> +};
> >> +
> >> +#define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> >> +
> >> +struct vfio_iommu_spapr_tce_put {
> >> + __u32 argsz;
> >> + __u32 flags;
> >> +#define VFIO_SPAPR_TCE_READ 1
> >> +#define VFIO_SPAPR_TCE_WRITE 2
> >> +#define VFIO_SPAPR_TCE_BIDIRECTIONAL (VFIO_SPAPR_TCE_READ|VFIO_SPAPR_TCE_WRITE)
> >> +#define VFIO_SPAPR_TCE_PUT_MASK VFIO_SPAPR_TCE_BIDIRECTIONAL
> >> + __u64 ioba;
> >> + __u64 tce;
> >> +};
> >
> > Ok, so if READ & WRITE are both clear and ioba is set, that's an
> > "unmap"? This is exactly why _type1 has a MAP and UNMAP, to make it
> > clear which fields are necessary for which call. I think we should
> > probably do the same here. Besides, _put makes me think there should be
> > a _get; do these have some unique meaning in POWER?
>
>
> It is a single H_PUT_TCE for putting a record into TCE table. The guest
> calls H_PUT_TCE, QEMU replaces the address and simply forwards the call to
> the host. Calling them map/unmap makes it less clear for powerpc people :)
In the unmap case we take an ioba and lookup a tce to clear, in the map
case we take an ioba and tce and insert them into the table. It's valid
to document this and use a single ioctl, but I've opted on x86 to have
separate ioctls because the documentation falls out cleaner when there
aren't fields that are only used in certain conditions. Do you really
want any userspace driver making use of this to know about powerpc
H_PUT_TCE or would it make more sense to have a MAP and UNMAP call? I
think it would be better for the VFIO API if we had some consistency in
the mapping ioctls where possible.
> >
> >> +
> >> +#define VFIO_IOMMU_SPAPR_TCE_PUT _IO(VFIO_TYPE, VFIO_BASE + 13)
> >> +
> >
> > Please document what all of the above means. Thanks,
>
>
> Something like this?
> /*
> * The VFIO_IOMMU_SPAPR_TCE_PUT is implemented as the H_PUT_TCE hypercall.
> * ioba - I/O Bus Address for indexing into TCE table
> * tce - logical address of storage
> *
> * The non-zero flags means adding new page into the table.
> * The zero flags means releasing the existing page and clearing the
> * TCE table entry.
> */
Do you only want VFIO drivers to work on POWER if they're written by
POWER people? Ideally there are a few simple concepts: a) devices have
an I/O virtual address space. On x86 we call this the iova and it's
effectively a zero-based, 64bit (not really, but close enough) address
space. You seem to have two smaller windows, one in 32bit space,
another in 64bit space (maybe we could name these more consistently).
b) Userspace has a buffer that they want to map and unmap to an iova,
potentially with some access flags. That's all you need to know to use
the x86 _type1 VFIO IOMMU API. Why do I need to know about H_PUT_TCE to
use this interface? Let's assume there might be some VFIO drivers some
day that aren't written by POWER people. Thanks,
Alex
^ permalink raw reply
* Re: [PATCH] vfio: enabled and supported on power (v7)
From: Scott Wood @ 2012-09-13 22:41 UTC (permalink / raw)
To: Alex Williamson
Cc: Alexey Kardashevskiy, linuxppc-dev, Paul Mackerras, David Gibson
In-Reply-To: <1347575699.24938.344.camel@ul30vt.home>
On Thu, Sep 13, 2012 at 04:34:59PM -0600, Alex Williamson wrote:
> Do you only want VFIO drivers to work on POWER if they're written by
> POWER people? Ideally there are a few simple concepts: a) devices have
> an I/O virtual address space. On x86 we call this the iova and it's
> effectively a zero-based, 64bit (not really, but close enough) address
> space. You seem to have two smaller windows, one in 32bit space,
> another in 64bit space (maybe we could name these more consistently).
> b) Userspace has a buffer that they want to map and unmap to an iova,
> potentially with some access flags. That's all you need to know to use
> the x86 _type1 VFIO IOMMU API. Why do I need to know about H_PUT_TCE to
> use this interface? Let's assume there might be some VFIO drivers some
> day that aren't written by POWER people. Thanks,
I'm not familiar with the POWER IOMMU, but certainly with our chips it
would help allow generic drivers to work if there were a type of mapping
operation where the IOMMU driver decides the IOVA and returns it, instead
of the driver trying to choose the IOVA with no knowledge of the IOMMU's
constraints.
-Scott
^ permalink raw reply
* Re: [PATCH] vfio: enabled and supported on power (v7)
From: Alex Williamson @ 2012-09-13 22:55 UTC (permalink / raw)
To: Scott Wood
Cc: Alexey Kardashevskiy, linuxppc-dev, Paul Mackerras, David Gibson
In-Reply-To: <20120913224127.GA31437@buserror.net>
On Thu, 2012-09-13 at 17:41 -0500, Scott Wood wrote:
> On Thu, Sep 13, 2012 at 04:34:59PM -0600, Alex Williamson wrote:
> > Do you only want VFIO drivers to work on POWER if they're written by
> > POWER people? Ideally there are a few simple concepts: a) devices have
> > an I/O virtual address space. On x86 we call this the iova and it's
> > effectively a zero-based, 64bit (not really, but close enough) address
> > space. You seem to have two smaller windows, one in 32bit space,
> > another in 64bit space (maybe we could name these more consistently).
> > b) Userspace has a buffer that they want to map and unmap to an iova,
> > potentially with some access flags. That's all you need to know to use
> > the x86 _type1 VFIO IOMMU API. Why do I need to know about H_PUT_TCE to
> > use this interface? Let's assume there might be some VFIO drivers some
> > day that aren't written by POWER people. Thanks,
>
> I'm not familiar with the POWER IOMMU, but certainly with our chips it
> would help allow generic drivers to work if there were a type of mapping
> operation where the IOMMU driver decides the IOVA and returns it, instead
> of the driver trying to choose the IOVA with no knowledge of the IOMMU's
> constraints.
That sounds reasonable to me. If we had IOMMU API support for that in
the kernel on x86, it would only take an ALLOC_IOVA flag in the MAP
ioctl, returning the iova in the mapping structure, and the addition of
a capability to let userspace know it's there. Thanks,
Alex
^ permalink raw reply
* Re: [PATCH v4 0/8] Avoid cache trashing on clearing huge/gigantic page
From: Andrew Morton @ 2012-09-13 23:05 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrea Arcangeli, linux-mips, Andi Kleen, Alex Shi,
Robert Richter, linuxppc-dev, x86, Hugh Dickins, linux-kernel,
Jan Beulich, Andy Lutomirski, Johannes Weiner, linux-mm, linux-sh,
Ingo Molnar, Mel Gorman, H. Peter Anvin, sparclinux,
Thomas Gleixner, Tim Chen, KAMEZAWA Hiroyuki
In-Reply-To: <1345470757-12005-1-git-send-email-kirill.shutemov@linux.intel.com>
On Mon, 20 Aug 2012 16:52:29 +0300
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> Clearing a 2MB huge page will typically blow away several levels of CPU
> caches. To avoid this only cache clear the 4K area around the fault
> address and use a cache avoiding clears for the rest of the 2MB area.
>
> This patchset implements cache avoiding version of clear_page only for
> x86. If an architecture wants to provide cache avoiding version of
> clear_page it should to define ARCH_HAS_USER_NOCACHE to 1 and implement
> clear_page_nocache() and clear_user_highpage_nocache().
Patchset looks nice to me, but the changelogs are terribly short of
performance measurements. For this sort of change I do think it is
important that pretty exhaustive testing be performed, and that the
results (or a readable summary of them) be shown. And that testing
should be designed to probe for slowdowns, not just the speedups!
^ permalink raw reply
* Re: [PATCH] edac/85xx: fix error handle of mpc85xx_mc_err_probe
From: Kim Phillips @ 2012-09-14 0:04 UTC (permalink / raw)
To: Shaohui Xie; +Cc: akpm, avorontsov, linuxppc-dev, linux-kernel, linux-edac
In-Reply-To: <1347533729-5893-1-git-send-email-Shaohui.Xie@freescale.com>
On Thu, 13 Sep 2012 18:55:29 +0800
Shaohui Xie <Shaohui.Xie@freescale.com> wrote:
> Error handle in case of DDR ECC off is wrong, sysfs entries have not been
> created, so edac_mc_free which frees a mci instance should not be called.
> Also, free mci's memory in this case.
>
> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
> ---
this fixes a 'WARNING: at lib/kobject.c:593' on a p2020, so:
Tested-by: Kim Phillips <kim.phillips@freescale.com>
Kim
^ permalink raw reply
* Re: [PATCH] vfio: enabled and supported on power (v7)
From: Alexey Kardashevskiy @ 2012-09-14 0:51 UTC (permalink / raw)
To: Alex Williamson; +Cc: Paul Mackerras, linuxppc-dev, David Gibson
In-Reply-To: <1347575699.24938.344.camel@ul30vt.home>
On 14/09/12 08:34, Alex Williamson wrote:
> On Tue, 2012-09-11 at 18:28 +1000, Alexey Kardashevskiy wrote:
>> On 11/09/12 02:02, Alex Williamson wrote:
>>> On Tue, 2012-09-04 at 17:33 +1000, Alexey Kardashevskiy wrote:
>>>> Cc: David Gibson <david@gibson.dropbear.id.au>
>>>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>>> Cc: Paul Mackerras <paulus@samba.org>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>
>>> Please at least cc kvm@vger as well since we list that as the devel list
>>> for vfio.
>>>
>>>> arch/powerpc/include/asm/iommu.h | 3 +
>>>
>>> I'll need an ack from Ben or Paul for this change.
>>>
>>>> drivers/iommu/Kconfig | 8 +
>>>> drivers/vfio/Kconfig | 6 +
>>>> drivers/vfio/Makefile | 1 +
>>>> drivers/vfio/vfio_iommu_spapr_tce.c | 440 +++++++++++++++++++++++++++++++++++
>>>> include/linux/vfio.h | 29 +++
>>>> 6 files changed, 487 insertions(+)
>>>> create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
>>>>
>>>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
>>>> index 957a83f..c64bce7 100644
>>>> --- a/arch/powerpc/include/asm/iommu.h
>>>> +++ b/arch/powerpc/include/asm/iommu.h
>>>> @@ -66,6 +66,9 @@ struct iommu_table {
>>>> unsigned long it_halfpoint; /* Breaking point for small/large allocs */
>>>> spinlock_t it_lock; /* Protects it_map */
>>>> unsigned long *it_map; /* A simple allocation bitmap for now */
>>>> +#ifdef CONFIG_IOMMU_API
>>>> + struct iommu_group *it_group;
>>>> +#endif
>>>> };
>>>
>>> This seems to only be valid when vfio_iommu_spapr_tce is loaded, which
>>> is a bit misleading.
>>>
>>>>
>>>> struct scatterlist;
>>>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>>>> index 3bd9fff..19cf2d9 100644
>>>> --- a/drivers/iommu/Kconfig
>>>> +++ b/drivers/iommu/Kconfig
>>>> @@ -162,4 +162,12 @@ config TEGRA_IOMMU_SMMU
>>>> space through the SMMU (System Memory Management Unit)
>>>> hardware included on Tegra SoCs.
>>>>
>>>> +config SPAPR_TCE_IOMMU
>>>> + bool "sPAPR TCE IOMMU Support"
>>>> + depends on PPC_PSERIES
>>>> + select IOMMU_API
>>>> + help
>>>> + Enables bits of IOMMU API required by VFIO. The iommu_ops is
>>>> + still not implemented.
>>>> +
>>>> endif # IOMMU_SUPPORT
>>>> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
>>>> index 7cd5dec..b464687 100644
>>>> --- a/drivers/vfio/Kconfig
>>>> +++ b/drivers/vfio/Kconfig
>>>> @@ -3,10 +3,16 @@ config VFIO_IOMMU_TYPE1
>>>> depends on VFIO
>>>> default n
>>>>
>>>> +config VFIO_IOMMU_SPAPR_TCE
>>>> + tristate
>>>> + depends on VFIO && SPAPR_TCE_IOMMU
>>>> + default n
>>>> +
>>>> menuconfig VFIO
>>>> tristate "VFIO Non-Privileged userspace driver framework"
>>>> depends on IOMMU_API
>>>> select VFIO_IOMMU_TYPE1 if X86
>>>> + select VFIO_IOMMU_SPAPR_TCE if PPC_POWERNV
>>>> help
>>>> VFIO provides a framework for secure userspace device drivers.
>>>> See Documentation/vfio.txt for more details.
>>>> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
>>>> index 2398d4a..72bfabc 100644
>>>> --- a/drivers/vfio/Makefile
>>>> +++ b/drivers/vfio/Makefile
>>>> @@ -1,3 +1,4 @@
>>>> obj-$(CONFIG_VFIO) += vfio.o
>>>> obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
>>>> +obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
>>>> obj-$(CONFIG_VFIO_PCI) += pci/
>>>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>>>> new file mode 100644
>>>> index 0000000..21f1909
>>>> --- /dev/null
>>>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>>>> @@ -0,0 +1,440 @@
>>>> +/*
>>>> + * VFIO: IOMMU DMA mapping support for TCE on POWER
>>>> + *
>>>> + * Copyright (C) 2012 IBM Corp. All rights reserved.
>>>> + * Author: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License version 2 as
>>>> + * published by the Free Software Foundation.
>>>> + *
>>>> + * Derived from original vfio_iommu_x86.c:
>>>
>>> Should this be _type1? Only the mail archives are going to remember
>>> there was a _x86, so the renamed version is probably a better reference.
>>>
>>>> + * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
>>>> + * Author: Alex Williamson <alex.williamson@redhat.com>
>>>> + */
>>>> +
>>>> +#include <linux/module.h>
>>>> +#include <linux/pci.h>
>>>> +#include <linux/slab.h>
>>>> +#include <linux/uaccess.h>
>>>> +#include <linux/err.h>
>>>> +#include <linux/vfio.h>
>>>> +#include <linux/spinlock.h>
>>>> +#include <asm/iommu.h>
>>>> +
>>>> +#define DRIVER_VERSION "0.1"
>>>> +#define DRIVER_AUTHOR "aik@ozlabs.ru"
>>>> +#define DRIVER_DESC "VFIO IOMMU SPAPR TCE"
>>>> +
>>>> +
>>>> +/*
>>>> + * SPAPR TCE API
>>>> + */
>>>> +static void tce_free(struct iommu_table *tbl, unsigned long entry,
>>>> + unsigned long tce)
>>>> +{
>>>> + struct page *page = pfn_to_page(tce >> PAGE_SHIFT);
>>>> +
>>>> + WARN_ON(!page);
>>>> + if (page) {
>>>> + if (tce & VFIO_SPAPR_TCE_WRITE)
>>>> + SetPageDirty(page);
>>>> + put_page(page);
>>>> + }
>>>> + ppc_md.tce_free(tbl, entry, 1);
>>>> +}
>>>> +
>>>> +static long tce_put(struct iommu_table *tbl,
>>>> + unsigned long entry, uint64_t tce, uint32_t flags)
>>>> +{
>>>> + int ret;
>>>> + unsigned long oldtce, kva, offset;
>>>> + struct page *page = NULL;
>>>> + enum dma_data_direction direction = DMA_NONE;
>>>> +
>>>> + switch (flags & VFIO_SPAPR_TCE_PUT_MASK) {
>>>> + case VFIO_SPAPR_TCE_READ:
>>>> + direction = DMA_TO_DEVICE;
>>>> + break;
>>>> + case VFIO_SPAPR_TCE_WRITE:
>>>> + direction = DMA_FROM_DEVICE;
>>>> + break;
>>>> + case VFIO_SPAPR_TCE_BIDIRECTIONAL:
>>>> + direction = DMA_BIDIRECTIONAL;
>>>> + break;
>>>> + }
>>>> +
>>>> + oldtce = ppc_md.tce_get(tbl, entry);
>>>> +
>>>> + /* Free page if still allocated */
>>>> + if (oldtce & VFIO_SPAPR_TCE_PUT_MASK)
>>>> + tce_free(tbl, entry, oldtce);
>>>> +
>>>> + /* Map new TCE */
>>>> + if (direction != DMA_NONE) {
>>>> + offset = (tce & IOMMU_PAGE_MASK) - (tce & PAGE_MASK);
>>>> + ret = get_user_pages_fast(tce & PAGE_MASK, 1,
>>>> + direction != DMA_TO_DEVICE, &page);
>>>> + BUG_ON(ret > 1);
>>>
>>> Can this happen?
>>>
>>>> + if (ret < 1) {
>>>> + printk(KERN_ERR "tce_vfio: get_user_pages_fast failed "
>>>> + "tce=%llx ioba=%lx ret=%d\n",
>>>> + tce, entry << IOMMU_PAGE_SHIFT, ret);
>>>> + if (!ret)
>>>> + ret = -EFAULT;
>>>> + goto unlock_exit;
>>>> + }
>>>> +
>>>> + kva = (unsigned long) page_address(page);
>>>> + kva += offset;
>>>> + BUG_ON(!kva);
>>>
>>> Same here, can it happen? If so, should it BUG or catch the below
>>> EINVAL?
>>>
>>>> + if (WARN_ON(kva & ~IOMMU_PAGE_MASK))
>>>> + return -EINVAL;
>>>
>>> Page leak? Don't we want to do a put_page(), which means we probably
>>> want a goto exit here.
>>>
>>>> +
>>>> + /* Preserve access bits */
>>>> + kva |= flags & VFIO_SPAPR_TCE_PUT_MASK;
>>>> +
>>>> + /* tce_build receives a virtual address */
>>>> + entry += tbl->it_offset; /* Offset into real TCE table */
>>>> + ret = ppc_md.tce_build(tbl, entry, 1, kva, direction, NULL);
>>>> +
>>>> + /* tce_build() only returns non-zero for transient errors */
>>>> + if (unlikely(ret)) {
>>>> + printk(KERN_ERR "tce_vfio: Failed to add TCE\n");
>>>> + ret = -EIO;
>>>> + goto unlock_exit;
>>>> + }
>>>> + }
>>>> + /* Flush/invalidate TLB caches if necessary */
>>>> + if (ppc_md.tce_flush)
>>>> + ppc_md.tce_flush(tbl);
>>>> +
>>>> + /* Make sure updates are seen by hardware */
>>>> + mb();
>>>> +
>>>> +unlock_exit:
>>>
>>> unlock seems wrong here, I had to go re-read the code looking for the
>>> lock.
>>>
>>>> + if (ret && page)
>>>> + put_page(page);
>>>> +
>>>> + if (ret)
>>>> + printk(KERN_ERR "tce_vfio: tce_put failed on tce=%llx "
>>>> + "ioba=%lx kva=%lx\n", tce,
>>>> + entry << IOMMU_PAGE_SHIFT, kva);
>>>> + return ret;
>>>> +}
>>>> +
>>>> +/*
>>>> + * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
>>>> + */
>>>> +
>>>> +/*
>>>> + * The container descriptor supports only a single group per container.
>>>> + * Required by the API as the container is not supplied with the IOMMU group
>>>> + * at the moment of initialization.
>>>> + */
>>>> +struct tce_container {
>>>> + struct iommu_table *tbl;
>>>> +};
>>>> +
>>>> +static void *tce_iommu_open(unsigned long arg)
>>>> +{
>>>> + struct tce_container *container;
>>>> +
>>>> + if (arg != VFIO_SPAPR_TCE_IOMMU) {
>>>> + printk(KERN_ERR "tce_vfio: Wrong IOMMU type\n");
>>>> + return ERR_PTR(-EINVAL);
>>>> + }
>>>> +
>>>> + container = kzalloc(sizeof(*container), GFP_KERNEL);
>>>> + if (!container)
>>>> + return ERR_PTR(-ENOMEM);
>>>> +
>>>> + return container;
>>>> +}
>>>> +
>>>> +static void tce_iommu_release(void *iommu_data)
>>>> +{
>>>> + struct tce_container *container = iommu_data;
>>>> + struct iommu_table *tbl = container->tbl;
>>>> + unsigned long i, tce;
>>>> +
>>>
>>> This will segfault if releasing a container that never had an a device
>>> attached.
>>>
>>>> + /* Unmap leftovers */
>>>> + spin_lock_irq(&tbl->it_lock);
>>>> + for (i = tbl->it_offset; i < tbl->it_offset + tbl->it_size; ++i) {
>>>> + tce = ppc_md.tce_get(tbl, i);
>>>> + if (tce & VFIO_SPAPR_TCE_PUT_MASK)
>>>> + tce_free(tbl, i, tce);
>>>> + }
>>>> + /* Flush/invalidate TLB caches if necessary */
>>>> + if (ppc_md.tce_flush)
>>>> + ppc_md.tce_flush(tbl);
>>>> +
>>>> + /* Make sure updates are seen by hardware */
>>>> + mb();
>>>> +
>>>> + spin_unlock_irq(&tbl->it_lock);
>>>> +
>>>> + kfree(container);
>>>> +}
>>>> +
>>>> +static long tce_iommu_ioctl(void *iommu_data,
>>>> + unsigned int cmd, unsigned long arg)
>>>> +{
>>>> + struct tce_container *container = iommu_data;
>>>> + unsigned long minsz;
>>>> + long ret;
>>>> +
>>>> + switch (cmd) {
>>>> + case VFIO_CHECK_EXTENSION: {
>>>> + return (arg == VFIO_SPAPR_TCE_IOMMU) ? 1 : 0;
>>>> + }
>>>> + case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
>>>> + struct vfio_iommu_spapr_tce_info info;
>>>> + struct iommu_table *tbl = container->tbl;
>>>> +
>>>> + minsz = offsetofend(struct vfio_iommu_spapr_tce_info,
>>>> + dma64_window_size);
>>>> +
>>>> + if (copy_from_user(&info, (void __user *)arg, minsz))
>>>> + return -EFAULT;
>>>> +
>>>> + if (info.argsz < minsz)
>>>> + return -EINVAL;
>>>> +
>>>> + if (!tbl)
>>>> + return -ENXIO;
>>>
>>> nit: why not check this earlier?
>>>
>>>> +
>>>> + info.dma32_window_start = tbl->it_offset << IOMMU_PAGE_SHIFT;
>>>> + info.dma32_window_size = tbl->it_size << IOMMU_PAGE_SHIFT;
>>>> + info.dma64_window_start = 0;
>>>> + info.dma64_window_size = 0;
>>>> + info.flags = 0;
>>>> +
>>>> + return copy_to_user((void __user *)arg, &info, minsz);
>>>> + }
>>>> + case VFIO_IOMMU_SPAPR_TCE_PUT: {
>>>> + struct vfio_iommu_spapr_tce_put par;
>>>> + struct iommu_table *tbl = container->tbl;
>>>> +
>>>> + minsz = offsetofend(struct vfio_iommu_spapr_tce_put, tce);
>>>> +
>>>> + if (copy_from_user(&par, (void __user *)arg, minsz))
>>>> + return -EFAULT;
>>>> +
>>>> + if (par.argsz < minsz)
>>>> + return -EINVAL;
>>>> +
>>>> + if (!tbl) {
>>>> + return -ENXIO;
>>>> + }
>>>
>>> Same, plus drop the braces.
>>>
>>>> +
>>>> + spin_lock_irq(&tbl->it_lock);
>>>> + ret = tce_put(tbl, par.ioba >> IOMMU_PAGE_SHIFT,
>>>> + par.tce, par.flags);
>>>> + spin_unlock_irq(&tbl->it_lock);
>>>> +
>>>> + return ret;
>>>> + }
>>>
>>> Is "PUT" really the name we want for this?
>>
>>
>> Yes, it is a single H_PUT_TCE hypercall from POWER architecture spec.
>
> Ok, if it makes sense on your arch, I won't complain (too much) about
> it.
>
>>>> + default:
>>>> + printk(KERN_WARNING "tce_vfio: unexpected cmd %x\n", cmd);
>>>> + }
>>>> +
>>>> + return -ENOTTY;
>>>> +}
>>>> +
>>>> +static int tce_iommu_attach_group(void *iommu_data,
>>>> + struct iommu_group *iommu_group)
>>>> +{
>>>> + struct tce_container *container = iommu_data;
>>>> + struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
>>>> +
>>>> + printk(KERN_DEBUG "tce_vfio: Attaching group #%u to iommu %p\n",
>>>> + iommu_group_id(iommu_group), iommu_group);
>>>
>>> Let's use pr_debug() and friends throughout.
>>>
>>>> + if (container->tbl) {
>>>> + printk(KERN_WARNING "tce_vfio: Only one group per IOMMU "
>>>> + "container is allowed, "
>>>> + "existing id=%d, attaching id=%d\n",
>>>> + iommu_group_id(container->tbl->it_group),
>>>> + iommu_group_id(iommu_group));
>>>> + return -EBUSY;
>>>> + }
>>>> +
>>>
>>> _type1 has a lock to avoid races here, I think you might need one too.
>>>
>>>> + container->tbl = tbl;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static void tce_iommu_detach_group(void *iommu_data,
>>>> + struct iommu_group *iommu_group)
>>>> +{
>>>> + struct tce_container *container = iommu_data;
>>>> + struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
>>>> +
>>>> + BUG_ON(!tbl);
>>>
>>> Needed? If so, why is there no check on attach?
>>
>> Added to attach() :)
>>
>>
>>>
>>>> + if (tbl != container->tbl) {
>>>> + printk(KERN_WARNING "tce_vfio: detaching group #%u, expected "
>>>> + "group is #%u\n", iommu_group_id(iommu_group),
>>>> + iommu_group_id(tbl->it_group));
>>>> + return;
>>>> + }
>>>> + printk(KERN_DEBUG "tce_vfio: detaching group #%u from iommu %p\n",
>>>> + iommu_group_id(iommu_group), iommu_group);
>>>
>>> container->tbl = NULL?
>>
>>
>> Then I won't be able to release pages in tce_iommu_release().
>> Releasing pages in tce_iommu_detach_group() caused some other problems,
>> cannot recall now which ones.
>
> What happens if you hot unplug a group from one VM and add it to
> another? ie. we've detached it from one container and add it to another
> in a different instance. Does it cause problems here?
Then the container will be released as just one group per container is
supported at the moment, no? Cannot check though as we do not support
hotplug yet.
>>>> +}
>>>> +
>>>> +const struct vfio_iommu_driver_ops tce_iommu_driver_ops = {
>>>> + .name = "iommu-vfio-powerpc",
>>>> + .owner = THIS_MODULE,
>>>> + .open = tce_iommu_open,
>>>> + .release = tce_iommu_release,
>>>> + .ioctl = tce_iommu_ioctl,
>>>> + .attach_group = tce_iommu_attach_group,
>>>> + .detach_group = tce_iommu_detach_group,
>>>> +};
>>>> +
>>>> +/*
>>>> + * Add/delete devices support (hotplug, module_init, module_exit)
>>>> + */
>>>> +static int add_device(struct device *dev)
>>>> +{
>>>> + struct iommu_table *tbl;
>>>> + int ret = 0;
>>>> +
>>>> + if (dev->iommu_group) {
>>>> + printk(KERN_WARNING "tce_vfio: device %s is already in iommu "
>>>> + "group %d, skipping\n", dev->kobj.name,
>>>
>>> Watch line wrapping on strings.
>>
>> Pardon?
>
> Just suggesting that you try to wrap lines so that strings are
> searchable. For instance, can I search cscope for "is already in iommu
> group". It's generally accepted that printks can break 80 cols for
> this.
Aaaa. Did not know that this is accepted but was always annoyed to wrap
this way, thanks :)
>>>> + iommu_group_id(dev->iommu_group));
>>>> + return -EBUSY;
>>>> + }
>>>> +
>>>> + tbl = get_iommu_table_base(dev);
>>>> + if (!tbl) {
>>>> + printk(KERN_DEBUG "tce_vfio: skipping device %s with no tbl\n",
>>>> + dev->kobj.name);
>>>> + return 0;
>>>> + }
>>>> +
>>>> + printk(KERN_DEBUG "tce_vfio: adding %s to iommu group %d\n",
>>>> + dev->kobj.name, iommu_group_id(tbl->it_group));
>>>> +
>>>> + ret = iommu_group_add_device(tbl->it_group, dev);
>>>> + if (ret < 0)
>>>> + printk(KERN_ERR "tce_vfio: %s has not been added, ret=%d\n",
>>>> + dev->kobj.name, ret);
>>>> +
>>>> + return ret;
>>>> +}
>>>> +
>>>> +static void del_device(struct device *dev)
>>>> +{
>>>> + iommu_group_remove_device(dev);
>>>> +}
>>>> +
>>>> +static int iommu_bus_notifier(struct notifier_block *nb,
>>>> + unsigned long action, void *data)
>>>> +{
>>>> + struct device *dev = data;
>>>> +
>>>> + switch (action) {
>>>> + case BUS_NOTIFY_ADD_DEVICE:
>>>> + return add_device(dev);
>>>> + case BUS_NOTIFY_DEL_DEVICE:
>>>> + del_device(dev);
>>>> + return 0;
>>>> + default:
>>>> + return 0;
>>>> + }
>>>> +}
>>>> +
>>>> +static struct notifier_block tce_iommu_bus_nb = {
>>>> + .notifier_call = iommu_bus_notifier,
>>>> +};
>>>> +
>>>> +void group_release(void *iommu_data)
>>>> +{
>>>> + struct iommu_table *tbl = iommu_data;
>>>> + tbl->it_group = NULL;
>>>> +}
>>>> +
>>>> +static int __init tce_iommu_init(void)
>>>> +{
>>>> + struct pci_dev *pdev = NULL;
>>>> + struct iommu_table *tbl;
>>>> + struct iommu_group *grp;
>>>> +
>>>> + /* If the current platform does not support tce_get
>>>> + we are unable to clean TCE table properly and
>>>> + therefore it is better not to touch it at all */
>>>> + if (!ppc_md.tce_get) {
>>>> + printk(KERN_ERR "tce_vfio: ppc_md.tce_get isn't implemented\n");
>>>> + return -EOPNOTSUPP;
>>>> + }
>>>> +
>>>> + bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb);
>>>> +
>>>> + /* Allocate and initialize VFIO groups */
>>>
>>> s/VFIO groups/IOMMU groups/
>>>
>>>> + for_each_pci_dev(pdev) {
>>>> + tbl = get_iommu_table_base(&pdev->dev);
>>>> + if (!tbl)
>>>> + continue;
>>>> +
>>>> + /* Skip already initialized */
>>>> + if (tbl->it_group)
>>>> + continue;
>>>> +
>>>> + grp = iommu_group_alloc();
>>>> + if (IS_ERR(grp)) {
>>>> + printk(KERN_INFO "tce_vfio: cannot create "
>>>> + "new IOMMU group, ret=%ld\n",
>>>> + PTR_ERR(grp));
>>>> + return -EFAULT;
>>>> + }
>>>> + tbl->it_group = grp;
>>>> + iommu_group_set_iommudata(grp, tbl, group_release);
>>>> + }
>>>> +
>>>> + /* Add PCI devices to VFIO groups */
>>>> + for_each_pci_dev(pdev)
>>>> + add_device(&pdev->dev);
>>>> +
>>>> + return vfio_register_iommu_driver(&tce_iommu_driver_ops);
>>>> +}
>>>> +
>>>> +static void __exit tce_iommu_cleanup(void)
>>>> +{
>>>> + struct pci_dev *pdev = NULL;
>>>> + struct iommu_table *tbl;
>>>> + struct iommu_group *grp = NULL;
>>>> +
>>>> + bus_unregister_notifier(&pci_bus_type, &tce_iommu_bus_nb);
>>>> +
>>>> + /* Delete PCI devices from VFIO groups */
>>>> + for_each_pci_dev(pdev)
>>>> + del_device(&pdev->dev);
>>>> +
>>>> + /* Release VFIO groups */
>>>> + for_each_pci_dev(pdev) {
>>>> + tbl = get_iommu_table_base(&pdev->dev);
>>>> + if (!tbl)
>>>> + continue;
>>>> + grp = tbl->it_group;
>>>> +
>>>> + /* Skip (already) uninitialized */
>>>> + if (!grp)
>>>> + continue;
>>>> +
>>>> + /* Do actual release, group_release() is expected to work */
>>>> + iommu_group_put(grp);
>>>> + BUG_ON(tbl->it_group);
>>>> + }
>>>> +
>>>
>>>
>>> It troubles me a bit that you're using the vfio driver to initialize and
>>> tear down IOMMU groups on your platform.
>>
>>
>> I am not following you here. Could you please explain a bit?
>
> IOMMU groups are theoretically not just for VFIO. They expose DMA
> dependencies between devices for anyone who cares to know about it.
> VFIO happens to care very much about that, but is hopefully not the only
> consumer. So it's a little bit like writing a driver for a device on a
> new bus and incorporating the bus topology handling code into the device
> driver. IOMMU groups should be created and managed independent of VFIO.
Do you mean that we create groups only for PCI devices? Well, moving groups
creation where the actual powerpc groups are allocated (pci scan) is
problematic right now as iommu_init() is called too late.
>>> VFIO makes use of IOMMU groups
>>> and is the only user so far, but they're hopefully useful beyond this.
>>> In fact, VFIO used to manage assembling all groups from data provided by
>>> the IOMMU but David wanted to see IOMMU groups be a more universally
>>> available feature, so it's odd to see POWER implementing it this way.
>>
>>
>> David, help! :)
>>
>>
>>>> + vfio_unregister_iommu_driver(&tce_iommu_driver_ops);
>>>> +}
>>>> +
>>>> +module_init(tce_iommu_init);
>>>> +module_exit(tce_iommu_cleanup);
>>>> +
>>>> +MODULE_VERSION(DRIVER_VERSION);
>>>> +MODULE_LICENSE("GPL v2");
>>>> +MODULE_AUTHOR(DRIVER_AUTHOR);
>>>> +MODULE_DESCRIPTION(DRIVER_DESC);
>>>> +
>>>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>>>> index 0a4f180..2c0a927 100644
>>>> --- a/include/linux/vfio.h
>>>> +++ b/include/linux/vfio.h
>>>> @@ -99,6 +99,7 @@ extern void vfio_unregister_iommu_driver(
>>>> /* Extensions */
>>>>
>>>> #define VFIO_TYPE1_IOMMU 1
>>>> +#define VFIO_SPAPR_TCE_IOMMU 2
>>>>
>>>> /*
>>>> * The IOCTL interface is designed for extensibility by embedding the
>>>> @@ -442,4 +443,32 @@ struct vfio_iommu_type1_dma_unmap {
>>>>
>>>> #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
>>>>
>>>> +/* -------- API for SPAPR TCE (Server POWERPC) IOMMU -------- */
>>>> +
>>>> +struct vfio_iommu_spapr_tce_info {
>>>> + __u32 argsz;
>>>> + __u32 flags;
>>>> + __u32 dma32_window_start;
>>>> + __u32 dma32_window_size;
>>>> + __u64 dma64_window_start;
>>>> + __u64 dma64_window_size;
>>>> +};
>>>> +
>>>> +#define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>>>> +
>>>> +struct vfio_iommu_spapr_tce_put {
>>>> + __u32 argsz;
>>>> + __u32 flags;
>>>> +#define VFIO_SPAPR_TCE_READ 1
>>>> +#define VFIO_SPAPR_TCE_WRITE 2
>>>> +#define VFIO_SPAPR_TCE_BIDIRECTIONAL (VFIO_SPAPR_TCE_READ|VFIO_SPAPR_TCE_WRITE)
>>>> +#define VFIO_SPAPR_TCE_PUT_MASK VFIO_SPAPR_TCE_BIDIRECTIONAL
>>>> + __u64 ioba;
>>>> + __u64 tce;
>>>> +};
>>>
>>> Ok, so if READ & WRITE are both clear and ioba is set, that's an
>>> "unmap"? This is exactly why _type1 has a MAP and UNMAP, to make it
>>> clear which fields are necessary for which call. I think we should
>>> probably do the same here. Besides, _put makes me think there should be
>>> a _get; do these have some unique meaning in POWER?
>>
>>
>> It is a single H_PUT_TCE for putting a record into TCE table. The guest
>> calls H_PUT_TCE, QEMU replaces the address and simply forwards the call to
>> the host. Calling them map/unmap makes it less clear for powerpc people :)
>
> In the unmap case we take an ioba and lookup a tce to clear, in the map
> case we take an ioba and tce and insert them into the table. It's valid
> to document this and use a single ioctl, but I've opted on x86 to have
> separate ioctls because the documentation falls out cleaner when there
> aren't fields that are only used in certain conditions. Do you really
> want any userspace driver making use of this to know about powerpc
> H_PUT_TCE or would it make more sense to have a MAP and UNMAP call? I
> think it would be better for the VFIO API if we had some consistency in
> the mapping ioctls where possible.
I would think that passing through "as is" as much as possible is the best
thing here as the aim is KVM. May be one day we will implement H_PUT_TCE in
the kernel, so splitting H_PUT_TCE to map+unmap and then combining it back
in the kernel (because we will have H_PUT_TCE handler) is a bit ugly.
>>>> +#define VFIO_IOMMU_SPAPR_TCE_PUT _IO(VFIO_TYPE, VFIO_BASE + 13)
>>>> +
>>>
>>> Please document what all of the above means. Thanks,
>>
>>
>> Something like this?
>> /*
>> * The VFIO_IOMMU_SPAPR_TCE_PUT is implemented as the H_PUT_TCE hypercall.
>> * ioba - I/O Bus Address for indexing into TCE table
>> * tce - logical address of storage
>> *
>> * The non-zero flags means adding new page into the table.
>> * The zero flags means releasing the existing page and clearing the
>> * TCE table entry.
>> */
>
> Do you only want VFIO drivers to work on POWER if they're written by
> POWER people? Ideally there are a few simple concepts: a) devices have
> an I/O virtual address space. On x86 we call this the iova and it's
> effectively a zero-based, 64bit (not really, but close enough) address
> space. You seem to have two smaller windows, one in 32bit space,
> another in 64bit space (maybe we could name these more consistently).
> b) Userspace has a buffer that they want to map and unmap to an iova,
> potentially with some access flags. That's all you need to know to use
> the x86 _type1 VFIO IOMMU API.
Do not you have to map entire RAM to PCI bus? You use listener which
purpose is not very clear. This is an extra knowledge beyond qemu-to-host
interface which the user space program should know.
> Why do I need to know about H_PUT_TCE to
> use this interface? Let's assume there might be some VFIO drivers some
> day that aren't written by POWER people. Thanks,
Example of such a driver? My imagination is weak :)
--
Alexey
^ permalink raw reply
* RE: [PATCH 2/3] powerpc/esdhc: add property to disable the CMD23
From: Huang Changming-R66093 @ 2012-09-14 2:02 UTC (permalink / raw)
To: Kumar Gala
Cc: linux-mmc@vger.kernel.org, Chris Ball,
linuxppc-dev@lists.ozlabs.org list, Anton Vorontsov
In-Reply-To: <4E15F856-D385-40C4-A5FD-5F298C70F402@kernel.crashing.org>
Best Regards
Jerry Huang
> -----Original Message-----
> From: Kumar Gala [mailto:galak@kernel.crashing.org]
> Sent: Thursday, September 13, 2012 8:48 PM
> To: Huang Changming-R66093
> Cc: Chris Ball; linuxppc-dev@lists.ozlabs.org list; linux-
> mmc@vger.kernel.org; Anton Vorontsov
> Subject: Re: [PATCH 2/3] powerpc/esdhc: add property to disable the CMD23
>=20
>=20
> On Sep 12, 2012, at 9:02 PM, Huang Changming-R66093 wrote:
>=20
> >>>
> >>>> -----Original Message-----
> >>>> From: Chris Ball [mailto:cjb@laptop.org]
> >>>> Sent: Wednesday, September 12, 2012 4:59 AM
> >>>> To: Kumar Gala
> >>>> Cc: Huang Changming-R66093; linuxppc-dev@lists.ozlabs.org list;
> >>>> linux- mmc@vger.kernel.org; Anton Vorontsov
> >>>> Subject: Re: [PATCH 2/3] powerpc/esdhc: add property to disable the
> >>>> CMD23
> >>>>
> >>>> Hi,
> >>>>
> >>>> On Tue, Sep 11 2012, Kumar Gala wrote:
> >>>>> thanks for the info. Do you know what's required on controller
> >>>>> side to handle cards that support CMD23?
> >>>>>
> >>>>> I'm trying to figure out if older controller's on FSL SoCs are
> >>>>> missing some feature to allow CMD23 to work (vs Auto-CMD23).
> >>>>
> >>>> It seems plausible that it's just not implemented on these
> controllers.
> >>>> It's a little strange, since the command's been specified for so
> >>>> long and we haven't seen any other controllers with problems. The
> >>>> patch would be correct if this is true.
> >>>>
> >>>
> >>> I didn't find any description about it, but after testing on FSL
> >> silicones, I got this result:
> >>> Some silicones support this command, and some silicones don't
> >>> support
> >> it, which will cause I/O error.
> >>
> >> Can you list out which SoCs support it and which don't. Having this
> >> list will be useful in understanding which controller versions
> supported it.
> >>
> > P1020, p1021, p1022, p1024, p1015 and p4080 can't support it.
> > Mpc8536, p2020, and the other current DPAA silicon (e.g. p5020, p3041)
> support it.
>=20
> Based on this, why don't we use the HOSTVER register to detect instead of
> device tree:
>=20
>=20
> #define FSL_SDHC_VER_1_0 0x00
> #define FSL_SDHC_VER_1_1 0x01
> #define FSL_SDHC_VER_2_0 0x10
> #define FSL_SDHC_VER_2_1 0x11
> #define FSL_SDHC_VER_2_2 0x12
> #define FSL_SDHC_VER_2_3 0x13
>=20
> unsigned int vendor_version;
>=20
> vendor_version =3D sdhci_readw(host, SDHCI_HOST_VERSION); vendor_version =
=3D
> (vendor_version & SDHCI_VENDOR_VER_MASK) >> SDHCI_VENDOR_VER_SHIFT;
>=20
> if ((vendor_version =3D=3D FSL_SDHC_VER_1_1) || (vendor_version =3D=3D
> FSL_SDHC_VER_2_2))
> host->quirks2 |=3D SDHCI_QUIRK2_HOST_NO_CMD23;
>=20
I once thought about it, but if the future silicon does not support this fe=
ature,
then we continue to modify these codes for new silicon?
^ permalink raw reply
* Re: [PATCH] vfio: enabled and supported on power (v7)
From: Alex Williamson @ 2012-09-14 4:35 UTC (permalink / raw)
To: Alexey Kardashevskiy; +Cc: Paul Mackerras, linuxppc-dev, David Gibson
In-Reply-To: <50527FAC.2080203@ozlabs.ru>
On Fri, 2012-09-14 at 10:51 +1000, Alexey Kardashevskiy wrote:
> On 14/09/12 08:34, Alex Williamson wrote:
> > On Tue, 2012-09-11 at 18:28 +1000, Alexey Kardashevskiy wrote:
> >> On 11/09/12 02:02, Alex Williamson wrote:
> >>> On Tue, 2012-09-04 at 17:33 +1000, Alexey Kardashevskiy wrote:
> >>>> Cc: David Gibson <david@gibson.dropbear.id.au>
> >>>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> >>>> Cc: Paul Mackerras <paulus@samba.org>
> >>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>> ---
> >>>
> >>> Please at least cc kvm@vger as well since we list that as the devel list
> >>> for vfio.
> >>>
> >>>> arch/powerpc/include/asm/iommu.h | 3 +
> >>>
> >>> I'll need an ack from Ben or Paul for this change.
> >>>
> >>>> drivers/iommu/Kconfig | 8 +
> >>>> drivers/vfio/Kconfig | 6 +
> >>>> drivers/vfio/Makefile | 1 +
> >>>> drivers/vfio/vfio_iommu_spapr_tce.c | 440 +++++++++++++++++++++++++++++++++++
> >>>> include/linux/vfio.h | 29 +++
> >>>> 6 files changed, 487 insertions(+)
> >>>> create mode 100644 drivers/vfio/vfio_iommu_spapr_tce.c
> >>>>
> >>>> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> >>>> index 957a83f..c64bce7 100644
> >>>> --- a/arch/powerpc/include/asm/iommu.h
> >>>> +++ b/arch/powerpc/include/asm/iommu.h
> >>>> @@ -66,6 +66,9 @@ struct iommu_table {
> >>>> unsigned long it_halfpoint; /* Breaking point for small/large allocs */
> >>>> spinlock_t it_lock; /* Protects it_map */
> >>>> unsigned long *it_map; /* A simple allocation bitmap for now */
> >>>> +#ifdef CONFIG_IOMMU_API
> >>>> + struct iommu_group *it_group;
> >>>> +#endif
> >>>> };
> >>>
> >>> This seems to only be valid when vfio_iommu_spapr_tce is loaded, which
> >>> is a bit misleading.
> >>>
> >>>>
> >>>> struct scatterlist;
> >>>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> >>>> index 3bd9fff..19cf2d9 100644
> >>>> --- a/drivers/iommu/Kconfig
> >>>> +++ b/drivers/iommu/Kconfig
> >>>> @@ -162,4 +162,12 @@ config TEGRA_IOMMU_SMMU
> >>>> space through the SMMU (System Memory Management Unit)
> >>>> hardware included on Tegra SoCs.
> >>>>
> >>>> +config SPAPR_TCE_IOMMU
> >>>> + bool "sPAPR TCE IOMMU Support"
> >>>> + depends on PPC_PSERIES
> >>>> + select IOMMU_API
> >>>> + help
> >>>> + Enables bits of IOMMU API required by VFIO. The iommu_ops is
> >>>> + still not implemented.
> >>>> +
> >>>> endif # IOMMU_SUPPORT
> >>>> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> >>>> index 7cd5dec..b464687 100644
> >>>> --- a/drivers/vfio/Kconfig
> >>>> +++ b/drivers/vfio/Kconfig
> >>>> @@ -3,10 +3,16 @@ config VFIO_IOMMU_TYPE1
> >>>> depends on VFIO
> >>>> default n
> >>>>
> >>>> +config VFIO_IOMMU_SPAPR_TCE
> >>>> + tristate
> >>>> + depends on VFIO && SPAPR_TCE_IOMMU
> >>>> + default n
> >>>> +
> >>>> menuconfig VFIO
> >>>> tristate "VFIO Non-Privileged userspace driver framework"
> >>>> depends on IOMMU_API
> >>>> select VFIO_IOMMU_TYPE1 if X86
> >>>> + select VFIO_IOMMU_SPAPR_TCE if PPC_POWERNV
> >>>> help
> >>>> VFIO provides a framework for secure userspace device drivers.
> >>>> See Documentation/vfio.txt for more details.
> >>>> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> >>>> index 2398d4a..72bfabc 100644
> >>>> --- a/drivers/vfio/Makefile
> >>>> +++ b/drivers/vfio/Makefile
> >>>> @@ -1,3 +1,4 @@
> >>>> obj-$(CONFIG_VFIO) += vfio.o
> >>>> obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
> >>>> +obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
> >>>> obj-$(CONFIG_VFIO_PCI) += pci/
> >>>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>>> new file mode 100644
> >>>> index 0000000..21f1909
> >>>> --- /dev/null
> >>>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>>> @@ -0,0 +1,440 @@
> >>>> +/*
> >>>> + * VFIO: IOMMU DMA mapping support for TCE on POWER
> >>>> + *
> >>>> + * Copyright (C) 2012 IBM Corp. All rights reserved.
> >>>> + * Author: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>> + *
> >>>> + * This program is free software; you can redistribute it and/or modify
> >>>> + * it under the terms of the GNU General Public License version 2 as
> >>>> + * published by the Free Software Foundation.
> >>>> + *
> >>>> + * Derived from original vfio_iommu_x86.c:
> >>>
> >>> Should this be _type1? Only the mail archives are going to remember
> >>> there was a _x86, so the renamed version is probably a better reference.
> >>>
> >>>> + * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
> >>>> + * Author: Alex Williamson <alex.williamson@redhat.com>
> >>>> + */
> >>>> +
> >>>> +#include <linux/module.h>
> >>>> +#include <linux/pci.h>
> >>>> +#include <linux/slab.h>
> >>>> +#include <linux/uaccess.h>
> >>>> +#include <linux/err.h>
> >>>> +#include <linux/vfio.h>
> >>>> +#include <linux/spinlock.h>
> >>>> +#include <asm/iommu.h>
> >>>> +
> >>>> +#define DRIVER_VERSION "0.1"
> >>>> +#define DRIVER_AUTHOR "aik@ozlabs.ru"
> >>>> +#define DRIVER_DESC "VFIO IOMMU SPAPR TCE"
> >>>> +
> >>>> +
> >>>> +/*
> >>>> + * SPAPR TCE API
> >>>> + */
> >>>> +static void tce_free(struct iommu_table *tbl, unsigned long entry,
> >>>> + unsigned long tce)
> >>>> +{
> >>>> + struct page *page = pfn_to_page(tce >> PAGE_SHIFT);
> >>>> +
> >>>> + WARN_ON(!page);
> >>>> + if (page) {
> >>>> + if (tce & VFIO_SPAPR_TCE_WRITE)
> >>>> + SetPageDirty(page);
> >>>> + put_page(page);
> >>>> + }
> >>>> + ppc_md.tce_free(tbl, entry, 1);
> >>>> +}
> >>>> +
> >>>> +static long tce_put(struct iommu_table *tbl,
> >>>> + unsigned long entry, uint64_t tce, uint32_t flags)
> >>>> +{
> >>>> + int ret;
> >>>> + unsigned long oldtce, kva, offset;
> >>>> + struct page *page = NULL;
> >>>> + enum dma_data_direction direction = DMA_NONE;
> >>>> +
> >>>> + switch (flags & VFIO_SPAPR_TCE_PUT_MASK) {
> >>>> + case VFIO_SPAPR_TCE_READ:
> >>>> + direction = DMA_TO_DEVICE;
> >>>> + break;
> >>>> + case VFIO_SPAPR_TCE_WRITE:
> >>>> + direction = DMA_FROM_DEVICE;
> >>>> + break;
> >>>> + case VFIO_SPAPR_TCE_BIDIRECTIONAL:
> >>>> + direction = DMA_BIDIRECTIONAL;
> >>>> + break;
> >>>> + }
> >>>> +
> >>>> + oldtce = ppc_md.tce_get(tbl, entry);
> >>>> +
> >>>> + /* Free page if still allocated */
> >>>> + if (oldtce & VFIO_SPAPR_TCE_PUT_MASK)
> >>>> + tce_free(tbl, entry, oldtce);
> >>>> +
> >>>> + /* Map new TCE */
> >>>> + if (direction != DMA_NONE) {
> >>>> + offset = (tce & IOMMU_PAGE_MASK) - (tce & PAGE_MASK);
> >>>> + ret = get_user_pages_fast(tce & PAGE_MASK, 1,
> >>>> + direction != DMA_TO_DEVICE, &page);
> >>>> + BUG_ON(ret > 1);
> >>>
> >>> Can this happen?
> >>>
> >>>> + if (ret < 1) {
> >>>> + printk(KERN_ERR "tce_vfio: get_user_pages_fast failed "
> >>>> + "tce=%llx ioba=%lx ret=%d\n",
> >>>> + tce, entry << IOMMU_PAGE_SHIFT, ret);
> >>>> + if (!ret)
> >>>> + ret = -EFAULT;
> >>>> + goto unlock_exit;
> >>>> + }
> >>>> +
> >>>> + kva = (unsigned long) page_address(page);
> >>>> + kva += offset;
> >>>> + BUG_ON(!kva);
> >>>
> >>> Same here, can it happen? If so, should it BUG or catch the below
> >>> EINVAL?
> >>>
> >>>> + if (WARN_ON(kva & ~IOMMU_PAGE_MASK))
> >>>> + return -EINVAL;
> >>>
> >>> Page leak? Don't we want to do a put_page(), which means we probably
> >>> want a goto exit here.
> >>>
> >>>> +
> >>>> + /* Preserve access bits */
> >>>> + kva |= flags & VFIO_SPAPR_TCE_PUT_MASK;
> >>>> +
> >>>> + /* tce_build receives a virtual address */
> >>>> + entry += tbl->it_offset; /* Offset into real TCE table */
> >>>> + ret = ppc_md.tce_build(tbl, entry, 1, kva, direction, NULL);
> >>>> +
> >>>> + /* tce_build() only returns non-zero for transient errors */
> >>>> + if (unlikely(ret)) {
> >>>> + printk(KERN_ERR "tce_vfio: Failed to add TCE\n");
> >>>> + ret = -EIO;
> >>>> + goto unlock_exit;
> >>>> + }
> >>>> + }
> >>>> + /* Flush/invalidate TLB caches if necessary */
> >>>> + if (ppc_md.tce_flush)
> >>>> + ppc_md.tce_flush(tbl);
> >>>> +
> >>>> + /* Make sure updates are seen by hardware */
> >>>> + mb();
> >>>> +
> >>>> +unlock_exit:
> >>>
> >>> unlock seems wrong here, I had to go re-read the code looking for the
> >>> lock.
> >>>
> >>>> + if (ret && page)
> >>>> + put_page(page);
> >>>> +
> >>>> + if (ret)
> >>>> + printk(KERN_ERR "tce_vfio: tce_put failed on tce=%llx "
> >>>> + "ioba=%lx kva=%lx\n", tce,
> >>>> + entry << IOMMU_PAGE_SHIFT, kva);
> >>>> + return ret;
> >>>> +}
> >>>> +
> >>>> +/*
> >>>> + * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
> >>>> + */
> >>>> +
> >>>> +/*
> >>>> + * The container descriptor supports only a single group per container.
> >>>> + * Required by the API as the container is not supplied with the IOMMU group
> >>>> + * at the moment of initialization.
> >>>> + */
> >>>> +struct tce_container {
> >>>> + struct iommu_table *tbl;
> >>>> +};
> >>>> +
> >>>> +static void *tce_iommu_open(unsigned long arg)
> >>>> +{
> >>>> + struct tce_container *container;
> >>>> +
> >>>> + if (arg != VFIO_SPAPR_TCE_IOMMU) {
> >>>> + printk(KERN_ERR "tce_vfio: Wrong IOMMU type\n");
> >>>> + return ERR_PTR(-EINVAL);
> >>>> + }
> >>>> +
> >>>> + container = kzalloc(sizeof(*container), GFP_KERNEL);
> >>>> + if (!container)
> >>>> + return ERR_PTR(-ENOMEM);
> >>>> +
> >>>> + return container;
> >>>> +}
> >>>> +
> >>>> +static void tce_iommu_release(void *iommu_data)
> >>>> +{
> >>>> + struct tce_container *container = iommu_data;
> >>>> + struct iommu_table *tbl = container->tbl;
> >>>> + unsigned long i, tce;
> >>>> +
> >>>
> >>> This will segfault if releasing a container that never had an a device
> >>> attached.
> >>>
> >>>> + /* Unmap leftovers */
> >>>> + spin_lock_irq(&tbl->it_lock);
> >>>> + for (i = tbl->it_offset; i < tbl->it_offset + tbl->it_size; ++i) {
> >>>> + tce = ppc_md.tce_get(tbl, i);
> >>>> + if (tce & VFIO_SPAPR_TCE_PUT_MASK)
> >>>> + tce_free(tbl, i, tce);
> >>>> + }
> >>>> + /* Flush/invalidate TLB caches if necessary */
> >>>> + if (ppc_md.tce_flush)
> >>>> + ppc_md.tce_flush(tbl);
> >>>> +
> >>>> + /* Make sure updates are seen by hardware */
> >>>> + mb();
> >>>> +
> >>>> + spin_unlock_irq(&tbl->it_lock);
> >>>> +
> >>>> + kfree(container);
> >>>> +}
> >>>> +
> >>>> +static long tce_iommu_ioctl(void *iommu_data,
> >>>> + unsigned int cmd, unsigned long arg)
> >>>> +{
> >>>> + struct tce_container *container = iommu_data;
> >>>> + unsigned long minsz;
> >>>> + long ret;
> >>>> +
> >>>> + switch (cmd) {
> >>>> + case VFIO_CHECK_EXTENSION: {
> >>>> + return (arg == VFIO_SPAPR_TCE_IOMMU) ? 1 : 0;
> >>>> + }
> >>>> + case VFIO_IOMMU_SPAPR_TCE_GET_INFO: {
> >>>> + struct vfio_iommu_spapr_tce_info info;
> >>>> + struct iommu_table *tbl = container->tbl;
> >>>> +
> >>>> + minsz = offsetofend(struct vfio_iommu_spapr_tce_info,
> >>>> + dma64_window_size);
> >>>> +
> >>>> + if (copy_from_user(&info, (void __user *)arg, minsz))
> >>>> + return -EFAULT;
> >>>> +
> >>>> + if (info.argsz < minsz)
> >>>> + return -EINVAL;
> >>>> +
> >>>> + if (!tbl)
> >>>> + return -ENXIO;
> >>>
> >>> nit: why not check this earlier?
> >>>
> >>>> +
> >>>> + info.dma32_window_start = tbl->it_offset << IOMMU_PAGE_SHIFT;
> >>>> + info.dma32_window_size = tbl->it_size << IOMMU_PAGE_SHIFT;
> >>>> + info.dma64_window_start = 0;
> >>>> + info.dma64_window_size = 0;
> >>>> + info.flags = 0;
> >>>> +
> >>>> + return copy_to_user((void __user *)arg, &info, minsz);
> >>>> + }
> >>>> + case VFIO_IOMMU_SPAPR_TCE_PUT: {
> >>>> + struct vfio_iommu_spapr_tce_put par;
> >>>> + struct iommu_table *tbl = container->tbl;
> >>>> +
> >>>> + minsz = offsetofend(struct vfio_iommu_spapr_tce_put, tce);
> >>>> +
> >>>> + if (copy_from_user(&par, (void __user *)arg, minsz))
> >>>> + return -EFAULT;
> >>>> +
> >>>> + if (par.argsz < minsz)
> >>>> + return -EINVAL;
> >>>> +
> >>>> + if (!tbl) {
> >>>> + return -ENXIO;
> >>>> + }
> >>>
> >>> Same, plus drop the braces.
> >>>
> >>>> +
> >>>> + spin_lock_irq(&tbl->it_lock);
> >>>> + ret = tce_put(tbl, par.ioba >> IOMMU_PAGE_SHIFT,
> >>>> + par.tce, par.flags);
> >>>> + spin_unlock_irq(&tbl->it_lock);
> >>>> +
> >>>> + return ret;
> >>>> + }
> >>>
> >>> Is "PUT" really the name we want for this?
> >>
> >>
> >> Yes, it is a single H_PUT_TCE hypercall from POWER architecture spec.
> >
> > Ok, if it makes sense on your arch, I won't complain (too much) about
> > it.
> >
> >>>> + default:
> >>>> + printk(KERN_WARNING "tce_vfio: unexpected cmd %x\n", cmd);
> >>>> + }
> >>>> +
> >>>> + return -ENOTTY;
> >>>> +}
> >>>> +
> >>>> +static int tce_iommu_attach_group(void *iommu_data,
> >>>> + struct iommu_group *iommu_group)
> >>>> +{
> >>>> + struct tce_container *container = iommu_data;
> >>>> + struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
> >>>> +
> >>>> + printk(KERN_DEBUG "tce_vfio: Attaching group #%u to iommu %p\n",
> >>>> + iommu_group_id(iommu_group), iommu_group);
> >>>
> >>> Let's use pr_debug() and friends throughout.
> >>>
> >>>> + if (container->tbl) {
> >>>> + printk(KERN_WARNING "tce_vfio: Only one group per IOMMU "
> >>>> + "container is allowed, "
> >>>> + "existing id=%d, attaching id=%d\n",
> >>>> + iommu_group_id(container->tbl->it_group),
> >>>> + iommu_group_id(iommu_group));
> >>>> + return -EBUSY;
> >>>> + }
> >>>> +
> >>>
> >>> _type1 has a lock to avoid races here, I think you might need one too.
> >>>
> >>>> + container->tbl = tbl;
> >>>> +
> >>>> + return 0;
> >>>> +}
> >>>> +
> >>>> +static void tce_iommu_detach_group(void *iommu_data,
> >>>> + struct iommu_group *iommu_group)
> >>>> +{
> >>>> + struct tce_container *container = iommu_data;
> >>>> + struct iommu_table *tbl = iommu_group_get_iommudata(iommu_group);
> >>>> +
> >>>> + BUG_ON(!tbl);
> >>>
> >>> Needed? If so, why is there no check on attach?
> >>
> >> Added to attach() :)
> >>
> >>
> >>>
> >>>> + if (tbl != container->tbl) {
> >>>> + printk(KERN_WARNING "tce_vfio: detaching group #%u, expected "
> >>>> + "group is #%u\n", iommu_group_id(iommu_group),
> >>>> + iommu_group_id(tbl->it_group));
> >>>> + return;
> >>>> + }
> >>>> + printk(KERN_DEBUG "tce_vfio: detaching group #%u from iommu %p\n",
> >>>> + iommu_group_id(iommu_group), iommu_group);
> >>>
> >>> container->tbl = NULL?
> >>
> >>
> >> Then I won't be able to release pages in tce_iommu_release().
> >> Releasing pages in tce_iommu_detach_group() caused some other problems,
> >> cannot recall now which ones.
> >
> > What happens if you hot unplug a group from one VM and add it to
> > another? ie. we've detached it from one container and add it to another
> > in a different instance. Does it cause problems here?
>
>
> Then the container will be released as just one group per container is
> supported at the moment, no? Cannot check though as we do not support
> hotplug yet.
But you still have a race where the group is detached, but the container
is not yet released and can be attached to another container in a
different instance.
> >>>> +}
> >>>> +
> >>>> +const struct vfio_iommu_driver_ops tce_iommu_driver_ops = {
> >>>> + .name = "iommu-vfio-powerpc",
> >>>> + .owner = THIS_MODULE,
> >>>> + .open = tce_iommu_open,
> >>>> + .release = tce_iommu_release,
> >>>> + .ioctl = tce_iommu_ioctl,
> >>>> + .attach_group = tce_iommu_attach_group,
> >>>> + .detach_group = tce_iommu_detach_group,
> >>>> +};
> >>>> +
> >>>> +/*
> >>>> + * Add/delete devices support (hotplug, module_init, module_exit)
> >>>> + */
> >>>> +static int add_device(struct device *dev)
> >>>> +{
> >>>> + struct iommu_table *tbl;
> >>>> + int ret = 0;
> >>>> +
> >>>> + if (dev->iommu_group) {
> >>>> + printk(KERN_WARNING "tce_vfio: device %s is already in iommu "
> >>>> + "group %d, skipping\n", dev->kobj.name,
> >>>
> >>> Watch line wrapping on strings.
> >>
> >> Pardon?
> >
> > Just suggesting that you try to wrap lines so that strings are
> > searchable. For instance, can I search cscope for "is already in iommu
> > group". It's generally accepted that printks can break 80 cols for
> > this.
>
> Aaaa. Did not know that this is accepted but was always annoyed to wrap
> this way, thanks :)
>
>
> >>>> + iommu_group_id(dev->iommu_group));
> >>>> + return -EBUSY;
> >>>> + }
> >>>> +
> >>>> + tbl = get_iommu_table_base(dev);
> >>>> + if (!tbl) {
> >>>> + printk(KERN_DEBUG "tce_vfio: skipping device %s with no tbl\n",
> >>>> + dev->kobj.name);
> >>>> + return 0;
> >>>> + }
> >>>> +
> >>>> + printk(KERN_DEBUG "tce_vfio: adding %s to iommu group %d\n",
> >>>> + dev->kobj.name, iommu_group_id(tbl->it_group));
> >>>> +
> >>>> + ret = iommu_group_add_device(tbl->it_group, dev);
> >>>> + if (ret < 0)
> >>>> + printk(KERN_ERR "tce_vfio: %s has not been added, ret=%d\n",
> >>>> + dev->kobj.name, ret);
> >>>> +
> >>>> + return ret;
> >>>> +}
> >>>> +
> >>>> +static void del_device(struct device *dev)
> >>>> +{
> >>>> + iommu_group_remove_device(dev);
> >>>> +}
> >>>> +
> >>>> +static int iommu_bus_notifier(struct notifier_block *nb,
> >>>> + unsigned long action, void *data)
> >>>> +{
> >>>> + struct device *dev = data;
> >>>> +
> >>>> + switch (action) {
> >>>> + case BUS_NOTIFY_ADD_DEVICE:
> >>>> + return add_device(dev);
> >>>> + case BUS_NOTIFY_DEL_DEVICE:
> >>>> + del_device(dev);
> >>>> + return 0;
> >>>> + default:
> >>>> + return 0;
> >>>> + }
> >>>> +}
> >>>> +
> >>>> +static struct notifier_block tce_iommu_bus_nb = {
> >>>> + .notifier_call = iommu_bus_notifier,
> >>>> +};
> >>>> +
> >>>> +void group_release(void *iommu_data)
> >>>> +{
> >>>> + struct iommu_table *tbl = iommu_data;
> >>>> + tbl->it_group = NULL;
> >>>> +}
> >>>> +
> >>>> +static int __init tce_iommu_init(void)
> >>>> +{
> >>>> + struct pci_dev *pdev = NULL;
> >>>> + struct iommu_table *tbl;
> >>>> + struct iommu_group *grp;
> >>>> +
> >>>> + /* If the current platform does not support tce_get
> >>>> + we are unable to clean TCE table properly and
> >>>> + therefore it is better not to touch it at all */
> >>>> + if (!ppc_md.tce_get) {
> >>>> + printk(KERN_ERR "tce_vfio: ppc_md.tce_get isn't implemented\n");
> >>>> + return -EOPNOTSUPP;
> >>>> + }
> >>>> +
> >>>> + bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb);
> >>>> +
> >>>> + /* Allocate and initialize VFIO groups */
> >>>
> >>> s/VFIO groups/IOMMU groups/
> >>>
> >>>> + for_each_pci_dev(pdev) {
> >>>> + tbl = get_iommu_table_base(&pdev->dev);
> >>>> + if (!tbl)
> >>>> + continue;
> >>>> +
> >>>> + /* Skip already initialized */
> >>>> + if (tbl->it_group)
> >>>> + continue;
> >>>> +
> >>>> + grp = iommu_group_alloc();
> >>>> + if (IS_ERR(grp)) {
> >>>> + printk(KERN_INFO "tce_vfio: cannot create "
> >>>> + "new IOMMU group, ret=%ld\n",
> >>>> + PTR_ERR(grp));
> >>>> + return -EFAULT;
> >>>> + }
> >>>> + tbl->it_group = grp;
> >>>> + iommu_group_set_iommudata(grp, tbl, group_release);
> >>>> + }
> >>>> +
> >>>> + /* Add PCI devices to VFIO groups */
> >>>> + for_each_pci_dev(pdev)
> >>>> + add_device(&pdev->dev);
> >>>> +
> >>>> + return vfio_register_iommu_driver(&tce_iommu_driver_ops);
> >>>> +}
> >>>> +
> >>>> +static void __exit tce_iommu_cleanup(void)
> >>>> +{
> >>>> + struct pci_dev *pdev = NULL;
> >>>> + struct iommu_table *tbl;
> >>>> + struct iommu_group *grp = NULL;
> >>>> +
> >>>> + bus_unregister_notifier(&pci_bus_type, &tce_iommu_bus_nb);
> >>>> +
> >>>> + /* Delete PCI devices from VFIO groups */
> >>>> + for_each_pci_dev(pdev)
> >>>> + del_device(&pdev->dev);
> >>>> +
> >>>> + /* Release VFIO groups */
> >>>> + for_each_pci_dev(pdev) {
> >>>> + tbl = get_iommu_table_base(&pdev->dev);
> >>>> + if (!tbl)
> >>>> + continue;
> >>>> + grp = tbl->it_group;
> >>>> +
> >>>> + /* Skip (already) uninitialized */
> >>>> + if (!grp)
> >>>> + continue;
> >>>> +
> >>>> + /* Do actual release, group_release() is expected to work */
> >>>> + iommu_group_put(grp);
> >>>> + BUG_ON(tbl->it_group);
> >>>> + }
> >>>> +
> >>>
> >>>
> >>> It troubles me a bit that you're using the vfio driver to initialize and
> >>> tear down IOMMU groups on your platform.
> >>
> >>
> >> I am not following you here. Could you please explain a bit?
> >
> > IOMMU groups are theoretically not just for VFIO. They expose DMA
> > dependencies between devices for anyone who cares to know about it.
> > VFIO happens to care very much about that, but is hopefully not the only
> > consumer. So it's a little bit like writing a driver for a device on a
> > new bus and incorporating the bus topology handling code into the device
> > driver. IOMMU groups should be created and managed independent of VFIO.
>
> Do you mean that we create groups only for PCI devices? Well, moving groups
> creation where the actual powerpc groups are allocated (pci scan) is
> problematic right now as iommu_init() is called too late.
I mean IOMMU group creation should be independent of VFIO. I'm not sure
how to make that ordering work on POWER, but integrating them into your
VFIO driver is contrary to many of the arguments that were made for
making IOMMU groups part of the base device model.
> >>> VFIO makes use of IOMMU groups
> >>> and is the only user so far, but they're hopefully useful beyond this.
> >>> In fact, VFIO used to manage assembling all groups from data provided by
> >>> the IOMMU but David wanted to see IOMMU groups be a more universally
> >>> available feature, so it's odd to see POWER implementing it this way.
> >>
> >>
> >> David, help! :)
> >>
> >>
> >>>> + vfio_unregister_iommu_driver(&tce_iommu_driver_ops);
> >>>> +}
> >>>> +
> >>>> +module_init(tce_iommu_init);
> >>>> +module_exit(tce_iommu_cleanup);
> >>>> +
> >>>> +MODULE_VERSION(DRIVER_VERSION);
> >>>> +MODULE_LICENSE("GPL v2");
> >>>> +MODULE_AUTHOR(DRIVER_AUTHOR);
> >>>> +MODULE_DESCRIPTION(DRIVER_DESC);
> >>>> +
> >>>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> >>>> index 0a4f180..2c0a927 100644
> >>>> --- a/include/linux/vfio.h
> >>>> +++ b/include/linux/vfio.h
> >>>> @@ -99,6 +99,7 @@ extern void vfio_unregister_iommu_driver(
> >>>> /* Extensions */
> >>>>
> >>>> #define VFIO_TYPE1_IOMMU 1
> >>>> +#define VFIO_SPAPR_TCE_IOMMU 2
> >>>>
> >>>> /*
> >>>> * The IOCTL interface is designed for extensibility by embedding the
> >>>> @@ -442,4 +443,32 @@ struct vfio_iommu_type1_dma_unmap {
> >>>>
> >>>> #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
> >>>>
> >>>> +/* -------- API for SPAPR TCE (Server POWERPC) IOMMU -------- */
> >>>> +
> >>>> +struct vfio_iommu_spapr_tce_info {
> >>>> + __u32 argsz;
> >>>> + __u32 flags;
> >>>> + __u32 dma32_window_start;
> >>>> + __u32 dma32_window_size;
> >>>> + __u64 dma64_window_start;
> >>>> + __u64 dma64_window_size;
> >>>> +};
> >>>> +
> >>>> +#define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> >>>> +
> >>>> +struct vfio_iommu_spapr_tce_put {
> >>>> + __u32 argsz;
> >>>> + __u32 flags;
> >>>> +#define VFIO_SPAPR_TCE_READ 1
> >>>> +#define VFIO_SPAPR_TCE_WRITE 2
> >>>> +#define VFIO_SPAPR_TCE_BIDIRECTIONAL (VFIO_SPAPR_TCE_READ|VFIO_SPAPR_TCE_WRITE)
> >>>> +#define VFIO_SPAPR_TCE_PUT_MASK VFIO_SPAPR_TCE_BIDIRECTIONAL
> >>>> + __u64 ioba;
> >>>> + __u64 tce;
> >>>> +};
> >>>
> >>> Ok, so if READ & WRITE are both clear and ioba is set, that's an
> >>> "unmap"? This is exactly why _type1 has a MAP and UNMAP, to make it
> >>> clear which fields are necessary for which call. I think we should
> >>> probably do the same here. Besides, _put makes me think there should be
> >>> a _get; do these have some unique meaning in POWER?
> >>
> >>
> >> It is a single H_PUT_TCE for putting a record into TCE table. The guest
> >> calls H_PUT_TCE, QEMU replaces the address and simply forwards the call to
> >> the host. Calling them map/unmap makes it less clear for powerpc people :)
> >
> > In the unmap case we take an ioba and lookup a tce to clear, in the map
> > case we take an ioba and tce and insert them into the table. It's valid
> > to document this and use a single ioctl, but I've opted on x86 to have
> > separate ioctls because the documentation falls out cleaner when there
> > aren't fields that are only used in certain conditions. Do you really
> > want any userspace driver making use of this to know about powerpc
> > H_PUT_TCE or would it make more sense to have a MAP and UNMAP call? I
> > think it would be better for the VFIO API if we had some consistency in
> > the mapping ioctls where possible.
>
>
> I would think that passing through "as is" as much as possible is the best
> thing here as the aim is KVM. May be one day we will implement H_PUT_TCE in
> the kernel, so splitting H_PUT_TCE to map+unmap and then combining it back
> in the kernel (because we will have H_PUT_TCE handler) is a bit ugly.
No, KVM is a use case for VFIO, we shouldn't be assume it's _the_ use
case. Exposing it "as is" means anyone trying to write a VFIO userspace
driver needs to know about the implementation of H_PUT_TCE to make the
driver work on POWER. The fact that the same hypercall is made for a
map or unmap is really irrelevant to the VFIO API.
> >>>> +#define VFIO_IOMMU_SPAPR_TCE_PUT _IO(VFIO_TYPE, VFIO_BASE + 13)
> >>>> +
> >>>
> >>> Please document what all of the above means. Thanks,
> >>
> >>
> >> Something like this?
> >> /*
> >> * The VFIO_IOMMU_SPAPR_TCE_PUT is implemented as the H_PUT_TCE hypercall.
> >> * ioba - I/O Bus Address for indexing into TCE table
> >> * tce - logical address of storage
> >> *
> >> * The non-zero flags means adding new page into the table.
> >> * The zero flags means releasing the existing page and clearing the
> >> * TCE table entry.
> >> */
> >
> > Do you only want VFIO drivers to work on POWER if they're written by
> > POWER people? Ideally there are a few simple concepts: a) devices have
> > an I/O virtual address space. On x86 we call this the iova and it's
> > effectively a zero-based, 64bit (not really, but close enough) address
> > space. You seem to have two smaller windows, one in 32bit space,
> > another in 64bit space (maybe we could name these more consistently).
> > b) Userspace has a buffer that they want to map and unmap to an iova,
> > potentially with some access flags. That's all you need to know to use
> > the x86 _type1 VFIO IOMMU API.
>
>
> Do not you have to map entire RAM to PCI bus? You use listener which
> purpose is not very clear. This is an extra knowledge beyond qemu-to-host
> interface which the user space program should know.
In the x86 case, the buffer we want to map is all of guest RAM. Some of
that changes dynamically, so we have a listener setup to make updates.
The only thing magic about doing that is that the device is then able to
DMA to any part of guest RAM and therefore the guest doesn't need to
know the IOMMU exists. Device assignment is therefore transparent on
x86.
> > Why do I need to know about H_PUT_TCE to
> > use this interface? Let's assume there might be some VFIO drivers some
> > day that aren't written by POWER people. Thanks,
>
> Example of such a driver? My imagination is weak :)
See Tom Lyon's original user level drivers:
https://github.com/pugs/vfio-user-level-drivers
These are against the original version of VFIO so no longer work, but
he's got drivers for common devices like Intel 82576 & 82599 SR-IOV
NICs. There are special use cases and special devices where it makes
sense to have a driver in userspace. Ideally a VFIO driver for a NIC
would work with fairly minimal IOMMU abstractions between x86 and POWER,
but if you design the SPAPR VFIO IOMMU API so that users need to
understand how H_PUT_TCE works to port their driver to POWER, you might
find it more difficult to leverage such drivers. Thanks,
Alex
^ permalink raw reply
* Re: [PATCH] powerpc: Add an xmon command to dump one or all pacas
From: Michael Ellerman @ 2012-09-14 4:44 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1347439541.2603.100.camel@pasglop>
On Wed, 2012-09-12 at 18:45 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-09-12 at 17:52 +1000, Michael Ellerman wrote:
> > This was originally motivated by a desire to see the mapping between
> > logical and hardware cpu numbers.
> >
> > But it seemed that it made more sense to just add a command to dump
> > (most of) the paca.
> >
> > With no arguments "dp" will dump the paca for all possible cpus. If
> > there are no possible cpus, like early in boot, it will tell you that.
>
> I'd rather "dp" dump the paca for the current active CPU in xmon.
> Shouldn't be hard to make a "dpa" that dumps them all too.
OK.
I also want to be able to dump the paca of a cpu not in xmon, so I'll
keep the "dp #" variant as well.
cheers
^ permalink raw reply
* Re: [PATCH] powerpc: Add an xmon command to dump one or all pacas
From: Michael Ellerman @ 2012-09-14 4:45 UTC (permalink / raw)
To: Stephen Rothwell; +Cc: linuxppc-dev
In-Reply-To: <20120912210704.54cc517b6f39c8e2f73523af@canb.auug.org.au>
On Wed, 2012-09-12 at 21:07 +1000, Stephen Rothwell wrote:
> Hi Michael,
>
> On Wed, 12 Sep 2012 17:52:40 +1000 Michael Ellerman <michael@ellerman.id.au> wrote:
> >
> > +#define DUMP(name, format) \
> > + printf(" %-*s = %#-*"format"\t(0x%lx)\n", 16, #name, 18, p->name, \
> > + (u64)((void *)&(p->name) - (void *)p));
>
> I must say that I hate macros that reference (assumed) globals
Well sure, but it's xmon :)
I can change it to take p as a parameter.
> shouldn't you use offsetof(struct paca_struct, name) (from
> linux/stddef.h)?
Yes!
cheers
^ permalink raw reply
* Re: [PATCH] powerpc: Add an xmon command to dump one or all pacas
From: Benjamin Herrenschmidt @ 2012-09-14 5:45 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <1347597870.6883.0.camel@concordia>
On Fri, 2012-09-14 at 14:44 +1000, Michael Ellerman wrote:
> On Wed, 2012-09-12 at 18:45 +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2012-09-12 at 17:52 +1000, Michael Ellerman wrote:
> > > This was originally motivated by a desire to see the mapping between
> > > logical and hardware cpu numbers.
> > >
> > > But it seemed that it made more sense to just add a command to dump
> > > (most of) the paca.
> > >
> > > With no arguments "dp" will dump the paca for all possible cpus. If
> > > there are no possible cpus, like early in boot, it will tell you that.
> >
> > I'd rather "dp" dump the paca for the current active CPU in xmon.
> > Shouldn't be hard to make a "dpa" that dumps them all too.
>
> OK.
>
> I also want to be able to dump the paca of a cpu not in xmon, so I'll
> keep the "dp #" variant as well.
Yes, absolutely.
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH v4 0/8] Avoid cache trashing on clearing huge/gigantic page
From: Ingo Molnar @ 2012-09-14 5:52 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mips, linux-sh, Jan Beulich, linux-mm, H. Peter Anvin,
sparclinux, Andrea Arcangeli, Andi Kleen, Robert Richter, x86,
Hugh Dickins, Ingo Molnar, Mel Gorman, Alex Shi, Thomas Gleixner,
KAMEZAWA Hiroyuki, Tim Chen, linux-kernel, Andy Lutomirski,
Johannes Weiner, linuxppc-dev, Kirill A. Shutemov
In-Reply-To: <20120913160506.d394392a.akpm@linux-foundation.org>
* Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 20 Aug 2012 16:52:29 +0300
> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
>
> > Clearing a 2MB huge page will typically blow away several levels of CPU
> > caches. To avoid this only cache clear the 4K area around the fault
> > address and use a cache avoiding clears for the rest of the 2MB area.
> >
> > This patchset implements cache avoiding version of clear_page only for
> > x86. If an architecture wants to provide cache avoiding version of
> > clear_page it should to define ARCH_HAS_USER_NOCACHE to 1 and implement
> > clear_page_nocache() and clear_user_highpage_nocache().
>
> Patchset looks nice to me, but the changelogs are terribly
> short of performance measurements. For this sort of change I
> do think it is important that pretty exhaustive testing be
> performed, and that the results (or a readable summary of
> them) be shown. And that testing should be designed to probe
> for slowdowns, not just the speedups!
That is my general impression as well.
Firstly, doing before/after "perf stat --repeat 3 ..." runs
showing a statistically significant effect on a workload that is
expected to win from this, and on a workload expected to be
hurting from this would go a long way towards convincing me.
Secondly, if you can find some user-space simulation of the
intended positive (and negative) effects then a 'perf bench'
testcase designed to show weakness of any such approach, running
the very kernel assembly code in user-space would also be rather
useful.
See:
comet:~/tip> git grep x86 tools/perf/bench/ | grep inclu
tools/perf/bench/mem-memcpy-arch.h:#include "mem-memcpy-x86-64-asm-def.h"
tools/perf/bench/mem-memcpy-x86-64-asm.S:#include "../../../arch/x86/lib/memcpy_64.S"
tools/perf/bench/mem-memcpy.c:#include "mem-memcpy-x86-64-asm-def.h"
tools/perf/bench/mem-memset-arch.h:#include "mem-memset-x86-64-asm-def.h"
tools/perf/bench/mem-memset-x86-64-asm.S:#include "../../../arch/x86/lib/memset_64.S"
tools/perf/bench/mem-memset.c:#include "mem-memset-x86-64-asm-def.h"
that code uses the kernel-side assembly code and runs it in
user-space.
Although obviously clearing pages on page faults needs some care
to properly simulate in user-space.
Without repeatable hard numbers such code just gets into the
kernel and bitrots there as new CPU generations come in - a few
years down the line the original decisions often degrade to pure
noise. We've been there, we've done that, we don't want to
repeat it.
Thanks,
Ingo
^ permalink raw reply
* [PATCH] powerpc: Add an xmon command to dump one or all pacas
From: Michael Ellerman @ 2012-09-14 6:03 UTC (permalink / raw)
To: linuxppc-dev
This was originally motivated by a desire to see the mapping between
logical and hardware cpu numbers.
But it seemed that it made more sense to just add a command to dump
(most of) the paca.
With no arguments "dp" will dump the paca for the current cpu.
It also takes an argument, eg. "dp 3" which is the logical cpu number
in hex. This form does not check if the cpu is possible, but displays
the paca regardless, as well as the cpu's state in the possible, present
and online masks.
Thirdly, "dpa" will display the paca for all possible cpus. If there are
no possible cpus, like early in boot, it will tell you that.
Sample output, number in brackets is the offset into the struct:
2:mon> dp 3
paca for cpu 0x3 @ c00000000ff20a80:
possible = yes
present = yes
online = yes
lock_token = 0x8000 (0x8)
paca_index = 0x3 (0xa)
kernel_toc = 0xc00000000144f990 (0x10)
kernelbase = 0xc000000000000000 (0x18)
kernel_msr = 0xb000000000001032 (0x20)
stab_real = 0x0 (0x28)
stab_addr = 0x0 (0x30)
emergency_sp = 0xc00000003ffe4000 (0x38)
data_offset = 0xa40000 (0x40)
hw_cpu_id = 0x9 (0x50)
cpu_start = 0x1 (0x52)
kexec_state = 0x0 (0x53)
__current = 0xc00000007e568680 (0x218)
kstack = 0xc00000007e5a3e30 (0x220)
stab_rr = 0x1a (0x228)
saved_r1 = 0xc00000007e7cb450 (0x230)
trap_save = 0x0 (0x240)
soft_enabled = 0x0 (0x242)
irq_happened = 0x0 (0x243)
io_sync = 0x0 (0x244)
irq_work_pending = 0x0 (0x245)
nap_state_lost = 0x0 (0x246)
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
v2: Fix macro per sfr's comments. Change dp to show current cpu's paca,
and add dpa to show all.
---
arch/powerpc/xmon/xmon.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 95 insertions(+)
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 9b49c65..2a2339f 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -203,6 +203,8 @@ Commands:\n\
df dump float values\n\
dd dump double values\n\
dl dump the kernel log buffer\n\
+ dp[#] dump paca for current cpu, or cpu #\n\
+ dpa dump paca for all possible cpus\n\
dr dump stream of raw bytes\n\
e print exception information\n\
f flush cache\n\
@@ -2009,6 +2011,93 @@ static void xmon_rawdump (unsigned long adrs, long ndump)
printf("\n");
}
+static void dump_one_paca(int cpu)
+{
+ struct paca_struct *p;
+
+ if (setjmp(bus_error_jmp) != 0) {
+ printf("*** Error dumping paca for cpu 0x%x!\n", cpu);
+ return;
+ }
+
+ catch_memory_errors = 1;
+ sync();
+
+ p = &paca[cpu];
+
+ printf("paca for cpu 0x%x @ %p:\n", cpu, p);
+
+ printf(" %-*s = %s\n", 16, "possible", cpu_possible(cpu) ? "yes" : "no");
+ printf(" %-*s = %s\n", 16, "present", cpu_present(cpu) ? "yes" : "no");
+ printf(" %-*s = %s\n", 16, "online", cpu_online(cpu) ? "yes" : "no");
+
+#define DUMP(paca, name, format) \
+ printf(" %-*s = %#-*"format"\t(0x%lx)\n", 16, #name, 18, paca->name, \
+ offsetof(struct paca_struct, name));
+
+ DUMP(p, lock_token, "x");
+ DUMP(p, paca_index, "x");
+ DUMP(p, kernel_toc, "lx");
+ DUMP(p, kernelbase, "lx");
+ DUMP(p, kernel_msr, "lx");
+#ifdef CONFIG_PPC_STD_MMU_64
+ DUMP(p, stab_real, "lx");
+ DUMP(p, stab_addr, "lx");
+#endif
+ DUMP(p, emergency_sp, "p");
+ DUMP(p, data_offset, "lx");
+ DUMP(p, hw_cpu_id, "x");
+ DUMP(p, cpu_start, "x");
+ DUMP(p, kexec_state, "x");
+ DUMP(p, __current, "p");
+ DUMP(p, kstack, "lx");
+ DUMP(p, stab_rr, "lx");
+ DUMP(p, saved_r1, "lx");
+ DUMP(p, trap_save, "x");
+ DUMP(p, soft_enabled, "x");
+ DUMP(p, irq_happened, "x");
+ DUMP(p, io_sync, "x");
+ DUMP(p, irq_work_pending, "x");
+ DUMP(p, nap_state_lost, "x");
+
+#undef DUMP
+
+ catch_memory_errors = 0;
+ sync();
+}
+
+static void dump_all_pacas(void)
+{
+ int cpu;
+
+ if (num_possible_cpus() == 0) {
+ printf("No possible cpus, use 'dp #' to dump individual cpus\n");
+ return;
+ }
+
+ for_each_possible_cpu(cpu)
+ dump_one_paca(cpu);
+}
+
+static void dump_pacas(void)
+{
+ unsigned long num;
+ int c;
+
+ c = inchar();
+ if (c == 'a') {
+ dump_all_pacas();
+ return;
+ }
+
+ termch = c; /* Put c back, it wasn't 'a' */
+
+ if (scanhex(&num))
+ dump_one_paca(num);
+ else
+ dump_one_paca(xmon_owner);
+}
+
#define isxdigit(c) (('0' <= (c) && (c) <= '9') \
|| ('a' <= (c) && (c) <= 'f') \
|| ('A' <= (c) && (c) <= 'F'))
@@ -2018,6 +2107,12 @@ dump(void)
int c;
c = inchar();
+
+ if (c == 'p') {
+ dump_pacas();
+ return;
+ }
+
if ((isxdigit(c) && c != 'f' && c != 'd') || c == '\n')
termch = c;
scanhex((void *)&adrs);
--
1.7.9.5
^ permalink raw reply related
* [PATCH] p1023rds_defconfig: Add USB support
From: Chunhe Lan @ 2012-09-14 19:57 UTC (permalink / raw)
To: linuxppc-dev; +Cc: kumar.gala, Chunhe Lan
Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
---
arch/powerpc/configs/85xx/p1023rds_defconfig | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/configs/85xx/p1023rds_defconfig b/arch/powerpc/configs/85xx/p1023rds_defconfig
index 26e541c..b80bcc6 100644
--- a/arch/powerpc/configs/85xx/p1023rds_defconfig
+++ b/arch/powerpc/configs/85xx/p1023rds_defconfig
@@ -112,6 +112,12 @@ CONFIG_SND=y
CONFIG_SND_MIXER_OSS=y
CONFIG_SND_PCM_OSS=y
# CONFIG_SND_SUPPORT_OLD_API is not set
+CONFIG_USB=y
+CONFIG_USB_DEVICEFS=y
+CONFIG_USB_MON=y
+CONFIG_USB_EHCI_HCD=y
+CONFIG_USB_EHCI_FSL=y
+CONFIG_USB_STORAGE=y
CONFIG_EDAC=y
CONFIG_EDAC_MM_EDAC=y
CONFIG_RTC_CLASS=y
--
1.7.6.5
^ permalink raw reply related
* [PATCH v2 2/7] dt/powerpc: Use of_get_child_by_name to get a named child.
From: Srinivas KANDAGATLA @ 2012-09-14 8:18 UTC (permalink / raw)
To: benh, bergner
Cc: robherring2, devicetree-discuss, linuxppc-dev,
srinivas.kandagatla
From: Srinivas Kandagatla <srinivas.kandagatla@st.com>
As follow-up to "dt: introduce of_get_child_by_name to get child node by
name." patch, This patch removes some of the code duplication in the
driver by replacing it with of_get_child_by_name instead.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
---
arch/powerpc/kernel/prom.c | 5 +----
1 files changed, 1 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index f191bf0..80e5b6c 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -735,7 +735,6 @@ void __init early_init_devtree(void *params)
*/
struct device_node *of_find_next_cache_node(struct device_node *np)
{
- struct device_node *child;
const phandle *handle;
handle = of_get_property(np, "l2-cache", NULL);
@@ -749,9 +748,7 @@ struct device_node *of_find_next_cache_node(struct device_node *np)
* beneath CPU nodes.
*/
if (!strcmp(np->type, "cpu"))
- for_each_child_of_node(np, child)
- if (!strcmp(child->type, "cache"))
- return child;
+ return of_get_child_by_name(np, "cache");
return NULL;
}
--
1.7.0.4
^ permalink raw reply related
* [PATCH v2 1/7] dt: introduce of_get_child_by_name to get child node by name.
From: Srinivas KANDAGATLA @ 2012-09-14 8:17 UTC (permalink / raw)
To: davem, robherring2, bergner, devicetree-discuss
Cc: jassi.brar, srinivas.kandagatla, afleming, netdev, linuxppc-dev
From: Srinivas Kandagatla <srinivas.kandagatla@st.com>
This patch introduces of_get_child_by_name function to get a child node
by its name in a given parent node.
Without this patch each driver code has to iterate the parent and do
a string compare, However having of_get_child_by_name libary function would
avoid code duplication, errors and is more convenient.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
---
drivers/of/base.c | 25 +++++++++++++++++++++++++
include/linux/of.h | 2 ++
2 files changed, 27 insertions(+), 0 deletions(-)
diff --git a/drivers/of/base.c b/drivers/of/base.c
index d4a1c9a..7391f8a 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -391,6 +391,31 @@ struct device_node *of_get_next_available_child(const struct device_node *node,
EXPORT_SYMBOL(of_get_next_available_child);
/**
+ * of_get_child_by_name - Find the child node by name for a given parent
+ * @node: parent node
+ * @name: child name to look for.
+ *
+ * This function looks for child node for given matching name
+ *
+ * Returns a node pointer if found, with refcount incremented, use
+ * of_node_put() on it when done.
+ * Returns NULL if node is not found.
+ */
+
+struct device_node *of_get_child_by_name(const struct device_node *node,
+ const char *name)
+{
+ struct device_node *child;
+
+ for_each_child_of_node(node, child)
+ if (child->name && (of_node_cmp(child->name, name) == 0)
+ && of_node_get(child))
+ break;
+ return child;
+}
+EXPORT_SYMBOL(of_get_child_by_name);
+
+/**
* of_find_node_by_path - Find a node matching a full OF path
* @path: The full path to match
*
diff --git a/include/linux/of.h b/include/linux/of.h
index 1b11632..7b8e3cd 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -192,6 +192,8 @@ extern struct device_node *of_get_next_child(const struct device_node *node,
struct device_node *prev);
extern struct device_node *of_get_next_available_child(
const struct device_node *node, struct device_node *prev);
+extern struct device_node *of_get_child_by_name(const struct device_node *node,
+ const char *name);
#define for_each_child_of_node(parent, child) \
for (child = of_get_next_child(parent, NULL); child != NULL; \
--
1.7.0.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox