LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 1/3] i2c: move of_i2c_register_devices call into core
From: Rob Herring @ 2011-08-05 23:22 UTC (permalink / raw)
  To: Grant Likely
  Cc: Stephen Warren, Sebastian Andrzej Siewior, linux-kernel,
	Dirk Brandewie, linux-i2c, Ben Dooks, Jean Delvare, linuxppc-dev
In-Reply-To: <20110805225435.GA6404@ponder.secretlab.ca>

Grant,

On 08/05/2011 05:54 PM, Grant Likely wrote:
> On Fri, Aug 05, 2011 at 04:24:26PM -0500, Rob Herring wrote:
>> From: Rob Herring <rob.herring@calxeda.com>
>>
>> All callers of of_i2c_register_devices are immediately preceded by a call
>> to i2c_add_adapter or i2c_add_numbered_adapter. Add call to
>> of_i2c_register_devices and remove all other callers.
>>
>> This causes a module dependency loop that is resolved by the next commit.
> 
> Wrong way around.  Don't break bisectability.  I'll leave it up to Ben
> and Jean on weather or not they want to move this code.  I intend to
> do the same thing for spi/gpio, but I maintain those subsystems so I
> get to choose.  i2c is not up to me.
> 

Well, I know, but it's only broken for bisect in the case of both being
modules. So it's only run-time brokenness. That's better than the 3
months it was broken before. I couldn't come up with a better way. The
choices I thought of were:

-temporarily exporting and adding of_i2c_register_devices to i2c.h and
then removing it. I'm not a fan of adding that churn.
-just combining it all into 1 patch.
-reverse the order and leave i2c device registration broken for 1
commit. Thinking some more about it, perhaps that is a bit better than
broken module loading. Guess it depends if you are doing modules or
built-in.

Rob

> g.
> 
>>
>> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
>> Cc: Grant Likely <grant.likely@secretlab.ca>
>> Cc: Jochen Friedrich <jochen@scram.de>
>> Cc: Jean Delvare <khali@linux-fr.org>
>> Cc: Ben Dooks <ben-linux@fluff.org>
>> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> Cc: Dirk Brandewie <dirk.brandewie@gmail.com>
>> Cc: Stephen Warren <swarren@nvidia.com>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: linux-i2c@vger.kernel.org
>> ---
>>  drivers/i2c/busses/i2c-cpm.c     |    6 ------
>>  drivers/i2c/busses/i2c-ibm_iic.c |    4 ----
>>  drivers/i2c/busses/i2c-mpc.c     |    2 --
>>  drivers/i2c/busses/i2c-pxa.c     |    2 --
>>  drivers/i2c/busses/i2c-tegra.c   |    3 ---
>>  drivers/i2c/i2c-core.c           |    4 ++++
>>  6 files changed, 4 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c
>> index b1d9cd2..7cbc82a 100644
>> --- a/drivers/i2c/busses/i2c-cpm.c
>> +++ b/drivers/i2c/busses/i2c-cpm.c
>> @@ -42,7 +42,6 @@
>>  #include <linux/dma-mapping.h>
>>  #include <linux/of_device.h>
>>  #include <linux/of_platform.h>
>> -#include <linux/of_i2c.h>
>>  #include <sysdev/fsl_soc.h>
>>  #include <asm/cpm.h>
>>  
>> @@ -673,11 +672,6 @@ static int __devinit cpm_i2c_probe(struct platform_device *ofdev)
>>  	dev_dbg(&ofdev->dev, "hw routines for %s registered.\n",
>>  		cpm->adap.name);
>>  
>> -	/*
>> -	 * register OF I2C devices
>> -	 */
>> -	of_i2c_register_devices(&cpm->adap);
>> -
>>  	return 0;
>>  out_shut:
>>  	cpm_i2c_shutdown(cpm);
>> diff --git a/drivers/i2c/busses/i2c-ibm_iic.c b/drivers/i2c/busses/i2c-ibm_iic.c
>> index 3c110fb..5ba15ba 100644
>> --- a/drivers/i2c/busses/i2c-ibm_iic.c
>> +++ b/drivers/i2c/busses/i2c-ibm_iic.c
>> @@ -42,7 +42,6 @@
>>  #include <linux/io.h>
>>  #include <linux/i2c.h>
>>  #include <linux/of_platform.h>
>> -#include <linux/of_i2c.h>
>>  
>>  #include "i2c-ibm_iic.h"
>>  
>> @@ -759,9 +758,6 @@ static int __devinit iic_probe(struct platform_device *ofdev)
>>  	dev_info(&ofdev->dev, "using %s mode\n",
>>  		 dev->fast_mode ? "fast (400 kHz)" : "standard (100 kHz)");
>>  
>> -	/* Now register all the child nodes */
>> -	of_i2c_register_devices(adap);
>> -
>>  	return 0;
>>  
>>  error_cleanup:
>> diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
>> index 107397a..a433f59 100644
>> --- a/drivers/i2c/busses/i2c-mpc.c
>> +++ b/drivers/i2c/busses/i2c-mpc.c
>> @@ -18,7 +18,6 @@
>>  #include <linux/sched.h>
>>  #include <linux/init.h>
>>  #include <linux/of_platform.h>
>> -#include <linux/of_i2c.h>
>>  #include <linux/slab.h>
>>  
>>  #include <linux/io.h>
>> @@ -637,7 +636,6 @@ static int __devinit fsl_i2c_probe(struct platform_device *op)
>>  		dev_err(i2c->dev, "failed to add adapter\n");
>>  		goto fail_add;
>>  	}
>> -	of_i2c_register_devices(&i2c->adap);
>>  
>>  	return result;
>>  
>> diff --git a/drivers/i2c/busses/i2c-pxa.c b/drivers/i2c/busses/i2c-pxa.c
>> index d603646..c1c2885 100644
>> --- a/drivers/i2c/busses/i2c-pxa.c
>> +++ b/drivers/i2c/busses/i2c-pxa.c
>> @@ -29,7 +29,6 @@
>>  #include <linux/errno.h>
>>  #include <linux/interrupt.h>
>>  #include <linux/i2c-pxa.h>
>> -#include <linux/of_i2c.h>
>>  #include <linux/platform_device.h>
>>  #include <linux/err.h>
>>  #include <linux/clk.h>
>> @@ -1147,7 +1146,6 @@ static int i2c_pxa_probe(struct platform_device *dev)
>>  		printk(KERN_INFO "I2C: Failed to add bus\n");
>>  		goto eadapt;
>>  	}
>> -	of_i2c_register_devices(&i2c->adap);
>>  
>>  	platform_set_drvdata(dev, i2c);
>>  
>> diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
>> index 2440b74..9ec0a58 100644
>> --- a/drivers/i2c/busses/i2c-tegra.c
>> +++ b/drivers/i2c/busses/i2c-tegra.c
>> @@ -26,7 +26,6 @@
>>  #include <linux/delay.h>
>>  #include <linux/slab.h>
>>  #include <linux/i2c-tegra.h>
>> -#include <linux/of_i2c.h>
>>  
>>  #include <asm/unaligned.h>
>>  
>> @@ -653,8 +652,6 @@ static int tegra_i2c_probe(struct platform_device *pdev)
>>  		goto err_free_irq;
>>  	}
>>  
>> -	of_i2c_register_devices(&i2c_dev->adapter);
>> -
>>  	return 0;
>>  err_free_irq:
>>  	free_irq(i2c_dev->irq, i2c_dev);
>> diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
>> index 131079a..011e195 100644
>> --- a/drivers/i2c/i2c-core.c
>> +++ b/drivers/i2c/i2c-core.c
>> @@ -32,6 +32,7 @@
>>  #include <linux/init.h>
>>  #include <linux/idr.h>
>>  #include <linux/mutex.h>
>> +#include <linux/of_i2c.h>
>>  #include <linux/of_device.h>
>>  #include <linux/completion.h>
>>  #include <linux/hardirq.h>
>> @@ -863,6 +864,9 @@ static int i2c_register_adapter(struct i2c_adapter *adap)
>>  	if (adap->nr < __i2c_first_dynamic_bus_num)
>>  		i2c_scan_static_board_info(adap);
>>  
>> +	/* Register devices from the device tree */
>> +	of_i2c_register_devices(adap);
>> +
>>  	/* Notify drivers */
>>  	mutex_lock(&core_lock);
>>  	bus_for_each_drv(&i2c_bus_type, NULL, adap, __process_new_adapter);
>> -- 
>> 1.7.4.1
>>

^ permalink raw reply

* Re: [PATCH 1/3] i2c: move of_i2c_register_devices call into core
From: Grant Likely @ 2011-08-05 22:54 UTC (permalink / raw)
  To: Rob Herring
  Cc: Stephen Warren, Sebastian Andrzej Siewior, Rob Herring,
	linux-kernel, Dirk Brandewie, linux-i2c, Ben Dooks, Jean Delvare,
	linuxppc-dev
In-Reply-To: <1312579468-19365-2-git-send-email-robherring2@gmail.com>

On Fri, Aug 05, 2011 at 04:24:26PM -0500, Rob Herring wrote:
> From: Rob Herring <rob.herring@calxeda.com>
> 
> All callers of of_i2c_register_devices are immediately preceded by a call
> to i2c_add_adapter or i2c_add_numbered_adapter. Add call to
> of_i2c_register_devices and remove all other callers.
> 
> This causes a module dependency loop that is resolved by the next commit.

Wrong way around.  Don't break bisectability.  I'll leave it up to Ben
and Jean on weather or not they want to move this code.  I intend to
do the same thing for spi/gpio, but I maintain those subsystems so I
get to choose.  i2c is not up to me.

g.

> 
> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
> Cc: Grant Likely <grant.likely@secretlab.ca>
> Cc: Jochen Friedrich <jochen@scram.de>
> Cc: Jean Delvare <khali@linux-fr.org>
> Cc: Ben Dooks <ben-linux@fluff.org>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Dirk Brandewie <dirk.brandewie@gmail.com>
> Cc: Stephen Warren <swarren@nvidia.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-i2c@vger.kernel.org
> ---
>  drivers/i2c/busses/i2c-cpm.c     |    6 ------
>  drivers/i2c/busses/i2c-ibm_iic.c |    4 ----
>  drivers/i2c/busses/i2c-mpc.c     |    2 --
>  drivers/i2c/busses/i2c-pxa.c     |    2 --
>  drivers/i2c/busses/i2c-tegra.c   |    3 ---
>  drivers/i2c/i2c-core.c           |    4 ++++
>  6 files changed, 4 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c
> index b1d9cd2..7cbc82a 100644
> --- a/drivers/i2c/busses/i2c-cpm.c
> +++ b/drivers/i2c/busses/i2c-cpm.c
> @@ -42,7 +42,6 @@
>  #include <linux/dma-mapping.h>
>  #include <linux/of_device.h>
>  #include <linux/of_platform.h>
> -#include <linux/of_i2c.h>
>  #include <sysdev/fsl_soc.h>
>  #include <asm/cpm.h>
>  
> @@ -673,11 +672,6 @@ static int __devinit cpm_i2c_probe(struct platform_device *ofdev)
>  	dev_dbg(&ofdev->dev, "hw routines for %s registered.\n",
>  		cpm->adap.name);
>  
> -	/*
> -	 * register OF I2C devices
> -	 */
> -	of_i2c_register_devices(&cpm->adap);
> -
>  	return 0;
>  out_shut:
>  	cpm_i2c_shutdown(cpm);
> diff --git a/drivers/i2c/busses/i2c-ibm_iic.c b/drivers/i2c/busses/i2c-ibm_iic.c
> index 3c110fb..5ba15ba 100644
> --- a/drivers/i2c/busses/i2c-ibm_iic.c
> +++ b/drivers/i2c/busses/i2c-ibm_iic.c
> @@ -42,7 +42,6 @@
>  #include <linux/io.h>
>  #include <linux/i2c.h>
>  #include <linux/of_platform.h>
> -#include <linux/of_i2c.h>
>  
>  #include "i2c-ibm_iic.h"
>  
> @@ -759,9 +758,6 @@ static int __devinit iic_probe(struct platform_device *ofdev)
>  	dev_info(&ofdev->dev, "using %s mode\n",
>  		 dev->fast_mode ? "fast (400 kHz)" : "standard (100 kHz)");
>  
> -	/* Now register all the child nodes */
> -	of_i2c_register_devices(adap);
> -
>  	return 0;
>  
>  error_cleanup:
> diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
> index 107397a..a433f59 100644
> --- a/drivers/i2c/busses/i2c-mpc.c
> +++ b/drivers/i2c/busses/i2c-mpc.c
> @@ -18,7 +18,6 @@
>  #include <linux/sched.h>
>  #include <linux/init.h>
>  #include <linux/of_platform.h>
> -#include <linux/of_i2c.h>
>  #include <linux/slab.h>
>  
>  #include <linux/io.h>
> @@ -637,7 +636,6 @@ static int __devinit fsl_i2c_probe(struct platform_device *op)
>  		dev_err(i2c->dev, "failed to add adapter\n");
>  		goto fail_add;
>  	}
> -	of_i2c_register_devices(&i2c->adap);
>  
>  	return result;
>  
> diff --git a/drivers/i2c/busses/i2c-pxa.c b/drivers/i2c/busses/i2c-pxa.c
> index d603646..c1c2885 100644
> --- a/drivers/i2c/busses/i2c-pxa.c
> +++ b/drivers/i2c/busses/i2c-pxa.c
> @@ -29,7 +29,6 @@
>  #include <linux/errno.h>
>  #include <linux/interrupt.h>
>  #include <linux/i2c-pxa.h>
> -#include <linux/of_i2c.h>
>  #include <linux/platform_device.h>
>  #include <linux/err.h>
>  #include <linux/clk.h>
> @@ -1147,7 +1146,6 @@ static int i2c_pxa_probe(struct platform_device *dev)
>  		printk(KERN_INFO "I2C: Failed to add bus\n");
>  		goto eadapt;
>  	}
> -	of_i2c_register_devices(&i2c->adap);
>  
>  	platform_set_drvdata(dev, i2c);
>  
> diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
> index 2440b74..9ec0a58 100644
> --- a/drivers/i2c/busses/i2c-tegra.c
> +++ b/drivers/i2c/busses/i2c-tegra.c
> @@ -26,7 +26,6 @@
>  #include <linux/delay.h>
>  #include <linux/slab.h>
>  #include <linux/i2c-tegra.h>
> -#include <linux/of_i2c.h>
>  
>  #include <asm/unaligned.h>
>  
> @@ -653,8 +652,6 @@ static int tegra_i2c_probe(struct platform_device *pdev)
>  		goto err_free_irq;
>  	}
>  
> -	of_i2c_register_devices(&i2c_dev->adapter);
> -
>  	return 0;
>  err_free_irq:
>  	free_irq(i2c_dev->irq, i2c_dev);
> diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
> index 131079a..011e195 100644
> --- a/drivers/i2c/i2c-core.c
> +++ b/drivers/i2c/i2c-core.c
> @@ -32,6 +32,7 @@
>  #include <linux/init.h>
>  #include <linux/idr.h>
>  #include <linux/mutex.h>
> +#include <linux/of_i2c.h>
>  #include <linux/of_device.h>
>  #include <linux/completion.h>
>  #include <linux/hardirq.h>
> @@ -863,6 +864,9 @@ static int i2c_register_adapter(struct i2c_adapter *adap)
>  	if (adap->nr < __i2c_first_dynamic_bus_num)
>  		i2c_scan_static_board_info(adap);
>  
> +	/* Register devices from the device tree */
> +	of_i2c_register_devices(adap);
> +
>  	/* Notify drivers */
>  	mutex_lock(&core_lock);
>  	bus_for_each_drv(&i2c_bus_type, NULL, adap, __process_new_adapter);
> -- 
> 1.7.4.1
> 

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-05 22:49 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, David Gibson, Alex Williamson,
	Anthony Liguori, linuxppc-dev
In-Reply-To: <20110805134446.GC30353@8bytes.org>

On Fri, 2011-08-05 at 15:44 +0200, Joerg Roedel wrote:
> On Fri, Aug 05, 2011 at 08:42:38PM +1000, Benjamin Herrenschmidt wrote:
> 
> > Right. In fact to try to clarify the problem for everybody, I think we
> > can distinguish two different classes of "constraints" that can
> > influence the grouping of devices:
> > 
> >  1- Hard constraints. These are typically devices using the same RID or
> > where the RID cannot be reliably guaranteed (the later is the case with
> > some PCIe-PCIX bridges which will take ownership of "some" transactions
> > such as split but not all). Devices like that must be in the same
> > domain. This is where PowerPC adds to what x86 does today the concept
> > that the domains are pre-existing, since we use the RID for error
> > isolation & MMIO segmenting as well. so we need to create those domains
> > at boot time.
> 
> Domains (in the iommu-sense) are created at boot time on x86 today.
> Every device needs at least a domain to provide dma-mapping
> functionality to the drivers. So all the grouping is done too at
> boot-time. This is specific to the iommu-drivers today but can be
> generalized I think.

Ok, let's go there then.

> >  2- Softer constraints. Those constraints derive from the fact that not
> > applying them risks enabling the guest to create side effects outside of
> > its "sandbox". To some extent, there can be "degrees" of badness between
> > the various things that can cause such constraints. Examples are shared
> > LSIs (since trusting DisINTx can be chancy, see earlier discussions),
> > potentially any set of functions in the same device can be problematic
> > due to the possibility to get backdoor access to the BARs etc...
> 
> Hmm, there is no sane way to handle such constraints in a safe way,
> right? We can either blacklist devices which are know to have such
> backdoors or we just ignore the problem.

Arguably they probably all do have such backdoors. A debug register,
JTAG register, ... My point is you don't really know unless you get
manufacturer guarantee that there is no undocumented register somewhere
or way to change the microcode so that it does it etc.... The more
complex the devices, the less likely to have a guarantee.

The "safe" way is what pHyp does and basically boils down to only
allowing pass-through of entire 'slots', ie, things that are behind a
P2P bridge (virtual one typically, ie, a PCIe switch) and disallowing
pass-through with shared interrupts.

That way, even if the guest can move the BARs around, it cannot make
them overlap somebody else device because the parent bridge restricts
the portion of MMIO space that is forwarded down to that device anyway.

> > Now, what I derive from the discussion we've had so far, is that we need
> > to find a proper fix for #1, but Alex and Avi seem to prefer that #2
> > remains a matter of libvirt/user doing the right thing (basically
> > keeping a loaded gun aimed at the user's foot with a very very very
> > sweet trigger but heh, let's not start a flamewar here :-)
> > 
> > So let's try to find a proper solution for #1 now, and leave #2 alone
> > for the time being.
> 
> Yes, and the solution for #1 should be entirely in the kernel. The
> question is how to do that. Probably the most sane way is to introduce a
> concept of device ownership. The ownership can either be a kernel driver
> or a userspace process. Giving ownership of a device to userspace is
> only possible if all devices in the same group are unbound from its
> respective drivers. This is a very intrusive concept, no idea if it
> has a chance of acceptance :-)
> But the advantage is clearly that this allows better semantics in the
> IOMMU drivers and a more stable handover of devices from host drivers to
> kvm guests.

I tend to think around those lines too, but the ownership concept
doesn't necessarily have to be core-kernel enforced itself, it can be in
VFIO.

If we have a common API to expose the "domain number", it can perfectly
be a matter of VFIO itself not allowing to do pass-through until it has 
attached its stub driver to all the devices with that domain number, and
it can handle exclusion of iommu domains from there.

> > Maybe the right option is for x86 to move toward pre-existing domains
> > like powerpc does, or maybe we can just expose some kind of ID.
> 
> As I said, the domains are created a iommu driver initialization time
> (usually boot time). But the groups are internal to the iommu drivers
> and not visible somewhere else.

That's what we need to fix :-)

> > Ah you started answering to my above questions :-)
> > 
> > We could do what you propose. It depends what we want to do with
> > domains. Practically speaking, we could make domains pre-existing (with
> > the ability to group several PEs into larger domains) or we could keep
> > the concepts different, possibly with the limitation that on powerpc, a
> > domain == a PE.
> > 
> > I suppose we -could- make arbitrary domains on ppc as well by making the
> > various PE's iommu's in HW point to the same in-memory table, but that's
> > a bit nasty in practice due to the way we manage those, and it would to
> > some extent increase the risk of a failing device/driver stomping on
> > another one and thus taking it down with itself. IE. isolation of errors
> > is an important feature for us.
> 
> These arbitrary domains exist in the iommu-api. It would be good to
> emulate them on Power too. Can't you put a PE into an isolated
> error-domain when something goes wrong with it? This should provide the
> same isolation as before.

Well, my problem is that it's quite hard for me to arbitrarily make PEs
shared the same iommu table. The iommu tables are assigned at boot time
along with the creation of the PEs, and because sadly, I don't (yet)
support tree structures for them, they are large physically contiguous
things, so I need to allocate them early and keep them around.

I -could- make a hack to share tables when creating such arbitrary
domains, but I would definitely have to keep track of the "original"
table of the PE so that can be reverted, I can't afford to free the
memory or I risk not being able to re-allocate it.

We'll have tree iommu's in future HW but not yet.

> What you derive the group number from is your business :-) On x86 it is
> certainly the best to use the RID these devices share together with the
> PCI segment number.

Ok. The question is more in term of API whether this number is to be
unique at the system scope or only at the PCI host bridge scope. 

Cheers,
Ben.

> Regards,
> 
> 	Joerg

^ permalink raw reply

* [PATCH 1/3] i2c: move of_i2c_register_devices call into core
From: Rob Herring @ 2011-08-05 21:24 UTC (permalink / raw)
  To: Grant Likely
  Cc: Stephen Warren, Sebastian Andrzej Siewior, Rob Herring,
	linux-kernel, Dirk Brandewie, linux-i2c, Ben Dooks, Jean Delvare,
	linuxppc-dev
In-Reply-To: <1312579468-19365-1-git-send-email-robherring2@gmail.com>

From: Rob Herring <rob.herring@calxeda.com>

All callers of of_i2c_register_devices are immediately preceded by a call
to i2c_add_adapter or i2c_add_numbered_adapter. Add call to
of_i2c_register_devices and remove all other callers.

This causes a module dependency loop that is resolved by the next commit.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Jochen Friedrich <jochen@scram.de>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Dirk Brandewie <dirk.brandewie@gmail.com>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-i2c@vger.kernel.org
---
 drivers/i2c/busses/i2c-cpm.c     |    6 ------
 drivers/i2c/busses/i2c-ibm_iic.c |    4 ----
 drivers/i2c/busses/i2c-mpc.c     |    2 --
 drivers/i2c/busses/i2c-pxa.c     |    2 --
 drivers/i2c/busses/i2c-tegra.c   |    3 ---
 drivers/i2c/i2c-core.c           |    4 ++++
 6 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/drivers/i2c/busses/i2c-cpm.c b/drivers/i2c/busses/i2c-cpm.c
index b1d9cd2..7cbc82a 100644
--- a/drivers/i2c/busses/i2c-cpm.c
+++ b/drivers/i2c/busses/i2c-cpm.c
@@ -42,7 +42,6 @@
 #include <linux/dma-mapping.h>
 #include <linux/of_device.h>
 #include <linux/of_platform.h>
-#include <linux/of_i2c.h>
 #include <sysdev/fsl_soc.h>
 #include <asm/cpm.h>
 
@@ -673,11 +672,6 @@ static int __devinit cpm_i2c_probe(struct platform_device *ofdev)
 	dev_dbg(&ofdev->dev, "hw routines for %s registered.\n",
 		cpm->adap.name);
 
-	/*
-	 * register OF I2C devices
-	 */
-	of_i2c_register_devices(&cpm->adap);
-
 	return 0;
 out_shut:
 	cpm_i2c_shutdown(cpm);
diff --git a/drivers/i2c/busses/i2c-ibm_iic.c b/drivers/i2c/busses/i2c-ibm_iic.c
index 3c110fb..5ba15ba 100644
--- a/drivers/i2c/busses/i2c-ibm_iic.c
+++ b/drivers/i2c/busses/i2c-ibm_iic.c
@@ -42,7 +42,6 @@
 #include <linux/io.h>
 #include <linux/i2c.h>
 #include <linux/of_platform.h>
-#include <linux/of_i2c.h>
 
 #include "i2c-ibm_iic.h"
 
@@ -759,9 +758,6 @@ static int __devinit iic_probe(struct platform_device *ofdev)
 	dev_info(&ofdev->dev, "using %s mode\n",
 		 dev->fast_mode ? "fast (400 kHz)" : "standard (100 kHz)");
 
-	/* Now register all the child nodes */
-	of_i2c_register_devices(adap);
-
 	return 0;
 
 error_cleanup:
diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index 107397a..a433f59 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -18,7 +18,6 @@
 #include <linux/sched.h>
 #include <linux/init.h>
 #include <linux/of_platform.h>
-#include <linux/of_i2c.h>
 #include <linux/slab.h>
 
 #include <linux/io.h>
@@ -637,7 +636,6 @@ static int __devinit fsl_i2c_probe(struct platform_device *op)
 		dev_err(i2c->dev, "failed to add adapter\n");
 		goto fail_add;
 	}
-	of_i2c_register_devices(&i2c->adap);
 
 	return result;
 
diff --git a/drivers/i2c/busses/i2c-pxa.c b/drivers/i2c/busses/i2c-pxa.c
index d603646..c1c2885 100644
--- a/drivers/i2c/busses/i2c-pxa.c
+++ b/drivers/i2c/busses/i2c-pxa.c
@@ -29,7 +29,6 @@
 #include <linux/errno.h>
 #include <linux/interrupt.h>
 #include <linux/i2c-pxa.h>
-#include <linux/of_i2c.h>
 #include <linux/platform_device.h>
 #include <linux/err.h>
 #include <linux/clk.h>
@@ -1147,7 +1146,6 @@ static int i2c_pxa_probe(struct platform_device *dev)
 		printk(KERN_INFO "I2C: Failed to add bus\n");
 		goto eadapt;
 	}
-	of_i2c_register_devices(&i2c->adap);
 
 	platform_set_drvdata(dev, i2c);
 
diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
index 2440b74..9ec0a58 100644
--- a/drivers/i2c/busses/i2c-tegra.c
+++ b/drivers/i2c/busses/i2c-tegra.c
@@ -26,7 +26,6 @@
 #include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/i2c-tegra.h>
-#include <linux/of_i2c.h>
 
 #include <asm/unaligned.h>
 
@@ -653,8 +652,6 @@ static int tegra_i2c_probe(struct platform_device *pdev)
 		goto err_free_irq;
 	}
 
-	of_i2c_register_devices(&i2c_dev->adapter);
-
 	return 0;
 err_free_irq:
 	free_irq(i2c_dev->irq, i2c_dev);
diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index 131079a..011e195 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -32,6 +32,7 @@
 #include <linux/init.h>
 #include <linux/idr.h>
 #include <linux/mutex.h>
+#include <linux/of_i2c.h>
 #include <linux/of_device.h>
 #include <linux/completion.h>
 #include <linux/hardirq.h>
@@ -863,6 +864,9 @@ static int i2c_register_adapter(struct i2c_adapter *adap)
 	if (adap->nr < __i2c_first_dynamic_bus_num)
 		i2c_scan_static_board_info(adap);
 
+	/* Register devices from the device tree */
+	of_i2c_register_devices(adap);
+
 	/* Notify drivers */
 	mutex_lock(&core_lock);
 	bus_for_each_drv(&i2c_bus_type, NULL, adap, __process_new_adapter);
-- 
1.7.4.1

^ permalink raw reply related

* Re: FW: Ethernet driver WR linux
From: Scott Wood @ 2011-08-05 18:12 UTC (permalink / raw)
  To: smitha.vanga; +Cc: linuxppc-dev
In-Reply-To: <07ACDFB8ECA8EF47863A613BC01BBB220344E2AB@HYD-MKD-MBX02.wipro.com>

On 08/05/2011 12:31 AM, smitha.vanga@wipro.com wrote:
> Hi
>  
> I have bringup WR linux 2.6.38 on a custom mpc8247 board. Not able to
> configure the ipaddress.  When I boot my linux kernel I get the below
> message.
> Also I am using the fec 1.  The fec is not getting initialized. 
> fs_enet.c:v1.0 (Aug 8, 2005)
> BB MII Bus: Cannot register as MDIO bus
> fsl-bb-mdio: probe of fsl-bb-mdio.0 failed with error -1
> BB MII Bus: Cannot register as MDIO bus
> fsl-bb-mdio: probe of fsl-bb-mdio.1 failed with error -1
> I have enabled the following:
>  
> 
> CONFIG_NET_ETHERNET=y
> 
> CONFIG_MII=y
> 
> CONFIG_FS_ENET=y
> 
> # CONFIG_FS_ENET_HAS_SCC is not set
> 
> CONFIG_FS_ENET_HAS_FCC=y

Can you attach your device tree?  What sort of mdio do you have on your
board?

> *Please do not print this email unless it is absolutely necessary. *
> 
> The information contained in this electronic message and any attachments
> to this message are intended for the exclusive use of the addressee(s)
> and may contain proprietary, confidential or privileged information. If
> you are not the intended recipient, you should not disseminate,
> distribute or copy this e-mail. Please notify the sender immediately and
> destroy all copies of this message and any attachments.
> 
> WARNING: Computer viruses can be transmitted via email. The recipient
> should check this email and any attachments for the presence of viruses.
> The company accepts no liability for any damage caused by any virus
> transmitted by this email.
> 
> www.wipro.com

Is there any way you can turn this crud off?

-Scott

^ permalink raw reply

* Re: linux-image-2.6.39-2-powerpc: CHRP Pegasos2 boot failure
From: Jonathan Nieder @ 2011-08-05 17:53 UTC (permalink / raw)
  To: Gabriel Paubert
  Cc: Michael Ellerman, 630845, Ben Hutchings, linuxppc-dev,
	Andrew Buckeridge
In-Reply-To: <20110628125148.GA10884@iram.es>

forwarded 630845 http://thread.gmane.org/gmane.linux.ports.ppc.embedded/43997
quit

Hi,

Gabriel Paubert wrote:
>>>> Andrew Buckeridge wrote:

>>>>> linux-image-2.6.36-trunk-powerpc_2.6.36-1~experimental.1_powerpc.deb
>>>>> linux-image-2.6.37-1-powerpc_2.6.37-1_powerpc.deb
>>>>> linux-image-2.6.37-2-powerpc_2.6.37-2_powerpc.deb
>>>>> These failed to boot. In all cases stuck at the spinner.
[...]
> What do you mean by the spinner? I've had very long boot times with 
> an apparently dead machine depending on graphics options.
>
> For now I'm running 2.6.39 with one patch for keyboard/mouse handling
> which is now upstream.
>
> I can try a more recent kernel on Thursday.

Thanks.  Please forgive my ignorance:

 - Gabriel, what options do you suggest for a reasonably fast boot?
   Does Debian's default configuration need corresponding changes?

 - Andrew, what do you mean by the spinner?  Could you take a photograph?

 - Has anything changed in the month or so since this report was last
   discussed?

^ permalink raw reply

* Re: MESH SCSI driver not working
From: kevin diggs @ 2011-08-05 17:34 UTC (permalink / raw)
  To: Pol Vangheluwe; +Cc: linuxppc-dev
In-Reply-To: <F0034E2F-AAAD-4DAC-A35A-E532B05B3210@skynet.be>

Hi,

I'll add some noise to this. I have a PowerMac8600 (with a 750GX in it
(in case that makes a difference)) and I can't boot 2.6.36 if it is
compiled with a gcc version newer than 4.1.2. Both 4.2.4 and 4.3.5
will hang. 4.1.2 seems ok.

kevin

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Alex Williamson @ 2011-08-05 15:10 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, Joerg Roedel, David Gibson,
	Anthony Liguori, linuxppc-dev
In-Reply-To: <1312540958.8598.46.camel@pasglop>

On Fri, 2011-08-05 at 20:42 +1000, Benjamin Herrenschmidt wrote:
> Right. In fact to try to clarify the problem for everybody, I think we
> can distinguish two different classes of "constraints" that can
> influence the grouping of devices:
> 
>  1- Hard constraints. These are typically devices using the same RID or
> where the RID cannot be reliably guaranteed (the later is the case with
> some PCIe-PCIX bridges which will take ownership of "some" transactions
> such as split but not all). Devices like that must be in the same
> domain. This is where PowerPC adds to what x86 does today the concept
> that the domains are pre-existing, since we use the RID for error
> isolation & MMIO segmenting as well. so we need to create those domains
> at boot time.
> 
>  2- Softer constraints. Those constraints derive from the fact that not
> applying them risks enabling the guest to create side effects outside of
> its "sandbox". To some extent, there can be "degrees" of badness between
> the various things that can cause such constraints. Examples are shared
> LSIs (since trusting DisINTx can be chancy, see earlier discussions),
> potentially any set of functions in the same device can be problematic
> due to the possibility to get backdoor access to the BARs etc...

This is what I've been trying to get to, hardware constraints vs system
policy constraints.

> Now, what I derive from the discussion we've had so far, is that we need
> to find a proper fix for #1, but Alex and Avi seem to prefer that #2
> remains a matter of libvirt/user doing the right thing (basically
> keeping a loaded gun aimed at the user's foot with a very very very
> sweet trigger but heh, let's not start a flamewar here :-)

Doesn't your own uncertainty of whether or not to allow this lead to the
same conclusion, that it belongs in userspace policy?  I don't think we
want to make white lists of which devices we trust to do DisINTx
correctly part of the kernel interface, do we?  Thanks,

Alex

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Joerg Roedel @ 2011-08-05 13:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, David Gibson, Alex Williamson,
	Anthony Liguori, linuxppc-dev
In-Reply-To: <1312540958.8598.46.camel@pasglop>

On Fri, Aug 05, 2011 at 08:42:38PM +1000, Benjamin Herrenschmidt wrote:

> Right. In fact to try to clarify the problem for everybody, I think we
> can distinguish two different classes of "constraints" that can
> influence the grouping of devices:
> 
>  1- Hard constraints. These are typically devices using the same RID or
> where the RID cannot be reliably guaranteed (the later is the case with
> some PCIe-PCIX bridges which will take ownership of "some" transactions
> such as split but not all). Devices like that must be in the same
> domain. This is where PowerPC adds to what x86 does today the concept
> that the domains are pre-existing, since we use the RID for error
> isolation & MMIO segmenting as well. so we need to create those domains
> at boot time.

Domains (in the iommu-sense) are created at boot time on x86 today.
Every device needs at least a domain to provide dma-mapping
functionality to the drivers. So all the grouping is done too at
boot-time. This is specific to the iommu-drivers today but can be
generalized I think.

>  2- Softer constraints. Those constraints derive from the fact that not
> applying them risks enabling the guest to create side effects outside of
> its "sandbox". To some extent, there can be "degrees" of badness between
> the various things that can cause such constraints. Examples are shared
> LSIs (since trusting DisINTx can be chancy, see earlier discussions),
> potentially any set of functions in the same device can be problematic
> due to the possibility to get backdoor access to the BARs etc...

Hmm, there is no sane way to handle such constraints in a safe way,
right? We can either blacklist devices which are know to have such
backdoors or we just ignore the problem.

> Now, what I derive from the discussion we've had so far, is that we need
> to find a proper fix for #1, but Alex and Avi seem to prefer that #2
> remains a matter of libvirt/user doing the right thing (basically
> keeping a loaded gun aimed at the user's foot with a very very very
> sweet trigger but heh, let's not start a flamewar here :-)
> 
> So let's try to find a proper solution for #1 now, and leave #2 alone
> for the time being.

Yes, and the solution for #1 should be entirely in the kernel. The
question is how to do that. Probably the most sane way is to introduce a
concept of device ownership. The ownership can either be a kernel driver
or a userspace process. Giving ownership of a device to userspace is
only possible if all devices in the same group are unbound from its
respective drivers. This is a very intrusive concept, no idea if it
has a chance of acceptance :-)
But the advantage is clearly that this allows better semantics in the
IOMMU drivers and a more stable handover of devices from host drivers to
kvm guests.

> Maybe the right option is for x86 to move toward pre-existing domains
> like powerpc does, or maybe we can just expose some kind of ID.

As I said, the domains are created a iommu driver initialization time
(usually boot time). But the groups are internal to the iommu drivers
and not visible somewhere else.

> Ah you started answering to my above questions :-)
> 
> We could do what you propose. It depends what we want to do with
> domains. Practically speaking, we could make domains pre-existing (with
> the ability to group several PEs into larger domains) or we could keep
> the concepts different, possibly with the limitation that on powerpc, a
> domain == a PE.
> 
> I suppose we -could- make arbitrary domains on ppc as well by making the
> various PE's iommu's in HW point to the same in-memory table, but that's
> a bit nasty in practice due to the way we manage those, and it would to
> some extent increase the risk of a failing device/driver stomping on
> another one and thus taking it down with itself. IE. isolation of errors
> is an important feature for us.

These arbitrary domains exist in the iommu-api. It would be good to
emulate them on Power too. Can't you put a PE into an isolated
error-domain when something goes wrong with it? This should provide the
same isolation as before.
What you derive the group number from is your business :-) On x86 it is
certainly the best to use the RID these devices share together with the
PCI segment number.

Regards,

	Joerg

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Joerg Roedel @ 2011-08-05 12:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, David Gibson, Alex Williamson,
	Avi Kivity, Anthony Liguori, linuxppc-dev
In-Reply-To: <1312539971.8598.29.camel@pasglop>

On Fri, Aug 05, 2011 at 08:26:11PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2011-08-04 at 12:41 +0200, Joerg Roedel wrote:
> > On Mon, Aug 01, 2011 at 02:27:36PM -0600, Alex Williamson wrote:
> > > It's not clear to me how we could skip it.  With VT-d, we'd have to
> > > implement an emulated interrupt remapper and hope that the guest picks
> > > unused indexes in the host interrupt remapping table before it could do
> > > anything useful with direct access to the MSI-X table.  Maybe AMD IOMMU
> > > makes this easier?
> > 
> > AMD IOMMU provides remapping tables per-device, and not a global one.
> > But that does not make direct guest-access to the MSI-X table safe. The
> > table contains the table contains the interrupt-type and the vector
> > which is used as an index into the remapping table by the IOMMU. So when
> > the guest writes into its MSI-X table the remapping-table in the host
> > needs to be updated too.
> 
> Right, you need paravirt to avoid filtering :-)

Or a shadow MSI-X table like done on x86. How to handle this seems to be
platform specific. As you indicate there is a standardized paravirt
interface for that on Power.

> IE the problem is two fold:
> 
>  - Getting the right value in the table / remapper so things work
> (paravirt)
> 
>  - Protecting against the guest somewhat managing to change the value in
> the table (either directly or via a backdoor access to its own config
> space).
> 
> The later for us comes from the HW PE filtering of the MSI transactions.

Right. The second part of the problem can be avoided with
interrupt-remapping/filtering hardware in the IOMMUs.

	Joerg

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-05 10:42 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, David Gibson, Alex Williamson,
	Anthony Liguori, linuxppc-dev
In-Reply-To: <20110804102752.GA22329@8bytes.org>

On Thu, 2011-08-04 at 12:27 +0200, Joerg Roedel wrote:
> Hi Ben,
> 
> thanks for your detailed introduction to the requirements for POWER. Its
> good to know that the granularity problem is not x86-only.

I'm happy to see your reply :-) I had the feeling I was a bit alone
here...

> On Sat, Jul 30, 2011 at 09:58:53AM +1000, Benjamin Herrenschmidt wrote:
> > In IBM POWER land, we call this a "partitionable endpoint" (the term
> > "endpoint" here is historic, such a PE can be made of several PCIe
> > "endpoints"). I think "partitionable" is a pretty good name tho to
> > represent the constraints, so I'll call this a "partitionable group"
> > from now on.
> 
> On x86 this is mostly an issue of the IOMMU and which set of devices use
> the same request-id. I used to call that an alias-group because the
> devices have a request-id alias to the pci-bridge.

Right. In fact to try to clarify the problem for everybody, I think we
can distinguish two different classes of "constraints" that can
influence the grouping of devices:

 1- Hard constraints. These are typically devices using the same RID or
where the RID cannot be reliably guaranteed (the later is the case with
some PCIe-PCIX bridges which will take ownership of "some" transactions
such as split but not all). Devices like that must be in the same
domain. This is where PowerPC adds to what x86 does today the concept
that the domains are pre-existing, since we use the RID for error
isolation & MMIO segmenting as well. so we need to create those domains
at boot time.

 2- Softer constraints. Those constraints derive from the fact that not
applying them risks enabling the guest to create side effects outside of
its "sandbox". To some extent, there can be "degrees" of badness between
the various things that can cause such constraints. Examples are shared
LSIs (since trusting DisINTx can be chancy, see earlier discussions),
potentially any set of functions in the same device can be problematic
due to the possibility to get backdoor access to the BARs etc...

Now, what I derive from the discussion we've had so far, is that we need
to find a proper fix for #1, but Alex and Avi seem to prefer that #2
remains a matter of libvirt/user doing the right thing (basically
keeping a loaded gun aimed at the user's foot with a very very very
sweet trigger but heh, let's not start a flamewar here :-)

So let's try to find a proper solution for #1 now, and leave #2 alone
for the time being.

Maybe the right option is for x86 to move toward pre-existing domains
like powerpc does, or maybe we can just expose some kind of ID.

Because #1 is a mix of generic constraints (nasty bridges) and very
platform specific ones (whatever capacity limits in our MMIO segmenting
forced us to put two devices in the same hard domain on power), I
believe it's really something the kernel must solve, not libvirt nor
qemu user or anything else.

I am open to suggestions here. I can easily expose my PE# (it's just a
number) somewhere in sysfs, in fact I'm considering doing it in the PCI
devices sysfs directory, simply because it can/will be useful for other
things such as error reporting, so we could maybe build on that.

The crux for me is really the need for pre-existence of the iommu
domains as my PE's imply a shared iommu space.

> > - The -minimum- granularity of pass-through is not always a single
> > device and not always under SW control
> 
> Correct.
>  
> > - Having a magic heuristic in libvirt to figure out those constraints is
> > WRONG. This reeks of XFree 4 PCI layer trying to duplicate the kernel
> > knowledge of PCI resource management and getting it wrong in many many
> > cases, something that took years to fix essentially by ripping it all
> > out. This is kernel knowledge and thus we need the kernel to expose in a
> > way or another what those constraints are, what those "partitionable
> > groups" are.
> 
> I agree. Managing the ownership of a group should be done in the kernel.
> Doing this in userspace is just too dangerous.
> 
> The problem to be solved here is how to present these PEs inside the
> kernel and to userspace. I thought a bit about making this visbible
> through the iommu-api for in-kernel users. That is probably the most
> logical place.

Ah you started answering to my above questions :-)

We could do what you propose. It depends what we want to do with
domains. Practically speaking, we could make domains pre-existing (with
the ability to group several PEs into larger domains) or we could keep
the concepts different, possibly with the limitation that on powerpc, a
domain == a PE.

I suppose we -could- make arbitrary domains on ppc as well by making the
various PE's iommu's in HW point to the same in-memory table, but that's
a bit nasty in practice due to the way we manage those, and it would to
some extent increase the risk of a failing device/driver stomping on
another one and thus taking it down with itself. IE. isolation of errors
is an important feature for us.

So I'd rather avoid the whole domain thing for now and keep the
constraint, for powerpc at least, that a domain == a PE, and thus find a
proper way to expose that to qemu/libvirt.

> For userspace I would like to propose a new device attribute in sysfs.
> This attribute contains the group number. All devices with the same
> group number belong to the same PE. Libvirt needs to scan the whole
> device tree to build the groups but that is probalbly not a big deal.

That's trivial for me to map that to my existing PE number. Should we
define the number space to be within a PCI domain (ie a host bridge). Or
should it be a global space ? In the later case I can construct them
using domain << 16 | PE# or something like that.

Cheers,
Ben.

>	Joerg
> 
> > 
> > - That does -not- mean that we cannot specify for each individual device
> > within such a group where we want to put it in qemu (what devfn etc...).
> > As long as there is a clear understanding that the "ownership" of the
> > device goes with the group, this is somewhat orthogonal to how they are
> > represented in qemu. (Not completely... if the iommu is exposed to the
> > guest ,via paravirt for example, some of these constraints must be
> > exposed but I'll talk about that more later).
> > 
> > The interface currently proposed for VFIO (and associated uiommu)
> > doesn't handle that problem at all. Instead, it is entirely centered
> > around a specific "feature" of the VTd iommu's for creating arbitrary
> > domains with arbitrary devices (tho those devices -do- have the same
> > constraints exposed above, don't try to put 2 legacy PCI devices behind
> > the same bridge into 2 different domains !), but the API totally ignores
> > the problem, leaves it to libvirt "magic foo" and focuses on something
> > that is both quite secondary in the grand scheme of things, and quite
> > x86 VTd specific in the implementation and API definition.
> > 
> > Now, I'm not saying these programmable iommu domains aren't a nice
> > feature and that we shouldn't exploit them when available, but as it is,
> > it is too much a central part of the API.
> > 
> > I'll talk a little bit more about recent POWER iommu's here to
> > illustrate where I'm coming from with my idea of groups:
> > 
> > On p7ioc (the IO chip used on recent P7 machines), there -is- a concept
> > of domain and a per-RID filtering. However it differs from VTd in a few
> > ways:
> > 
> > The "domains" (aka PEs) encompass more than just an iommu filtering
> > scheme. The MMIO space and PIO space are also segmented, and those
> > segments assigned to domains. Interrupts (well, MSI ports at least) are
> > assigned to domains. Inbound PCIe error messages are targeted to
> > domains, etc...
> > 
> > Basically, the PEs provide a very strong isolation feature which
> > includes errors, and has the ability to immediately "isolate" a PE on
> > the first occurence of an error. For example, if an inbound PCIe error
> > is signaled by a device on a PE or such a device does a DMA to a
> > non-authorized address, the whole PE gets into error state. All
> > subsequent stores (both DMA and MMIO) are swallowed and reads return all
> > 1's, interrupts are blocked. This is designed to prevent any propagation
> > of bad data, which is a very important feature in large high reliability
> > systems.
> > 
> > Software then has the ability to selectively turn back on MMIO and/or
> > DMA, perform diagnostics, reset devices etc...
> > 
> > Because the domains encompass more than just DMA, but also segment the
> > MMIO space, it is not practical at all to dynamically reconfigure them
> > at runtime to "move" devices into domains. The firmware or early kernel
> > code (it depends) will assign devices BARs using an algorithm that keeps
> > them within PE segment boundaries, etc....
> > 
> > Additionally (and this is indeed a "restriction" compared to VTd, though
> > I expect our future IO chips to lift it to some extent), PE don't get
> > separate DMA address spaces. There is one 64-bit DMA address space per
> > PCI host bridge, and it is 'segmented' with each segment being assigned
> > to a PE. Due to the way PE assignment works in hardware, it is not
> > practical to make several devices share a segment unless they are on the
> > same bus. Also the resulting limit in the amount of 32-bit DMA space a
> > device can access means that it's impractical to put too many devices in
> > a PE anyways. (This is clearly designed for paravirt iommu, I'll talk
> > more about that later).
> > 
> > The above essentially extends the granularity requirement (or rather is
> > another factor defining what the granularity of partitionable entities
> > is). You can think of it as "pre-existing" domains.
> > 
> > I believe the way to solve that is to introduce a kernel interface to
> > expose those "partitionable entities" to userspace. In addition, it
> > occurs to me that the ability to manipulate VTd domains essentially
> > boils down to manipulating those groups (creating larger ones with
> > individual components).
> > 
> > I like the idea of defining / playing with those groups statically
> > (using a command line tool or sysfs, possibly having a config file
> > defining them in a persistent way) rather than having their lifetime
> > tied to a uiommu file descriptor.
> > 
> > It also makes it a LOT easier to have a channel to manipulate
> > platform/arch specific attributes of those domains if any.
> > 
> > So we could define an API or representation in sysfs that exposes what
> > the partitionable entities are, and we may add to it an API to
> > manipulate them. But we don't have to and I'm happy to keep the
> > additional SW grouping you can do on VTd as a sepparate "add-on" API
> > (tho I don't like at all the way it works with uiommu). However, qemu
> > needs to know what the grouping is regardless of the domains, and it's
> > not nice if it has to manipulate two different concepts here so
> > eventually those "partitionable entities" from a qemu standpoint must
> > look like domains.
> > 
> > My main point is that I don't want the "knowledge" here to be in libvirt
> > or qemu. In fact, I want to be able to do something as simple as passing
> > a reference to a PE to qemu (sysfs path ?) and have it just pickup all
> > the devices in there and expose them to the guest.
> > 
> > This can be done in a way that isn't PCI specific as well (the
> > definition of the groups and what is grouped would would obviously be
> > somewhat bus specific and handled by platform code in the kernel).
> > 
> > Maybe something like /sys/devgroups ? This probably warrants involving
> > more kernel people into the discussion.
> > 
> > * IOMMU
> > 
> > Now more on iommu. I've described I think in enough details how ours
> > work, there are others, I don't know what freescale or ARM are doing,
> > sparc doesn't quite work like VTd either, etc...
> > 
> > The main problem isn't that much the mechanics of the iommu but really
> > how it's exposed (or not) to guests.
> > 
> > VFIO here is basically designed for one and only one thing: expose the
> > entire guest physical address space to the device more/less 1:1.
> > 
> > This means:
> > 
> >   - It only works with iommu's that provide complete DMA address spaces
> > to devices. Won't work with a single 'segmented' address space like we
> > have on POWER.
> > 
> >   - It requires the guest to be pinned. Pass-through -> no more swap
> > 
> >   - The guest cannot make use of the iommu to deal with 32-bit DMA
> > devices, thus a guest with more than a few G of RAM (I don't know the
> > exact limit on x86, depends on your IO hole I suppose), and you end up
> > back to swiotlb & bounce buffering.
> > 
> >   - It doesn't work for POWER server anyways because of our need to
> > provide a paravirt iommu interface to the guest since that's how pHyp
> > works today and how existing OSes expect to operate.
> > 
> > Now some of this can be fixed with tweaks, and we've started doing it
> > (we have a working pass-through using VFIO, forgot to mention that, it's
> > just that we don't like what we had to do to get there).
> > 
> > Basically, what we do today is:
> > 
> > - We add an ioctl to VFIO to expose to qemu the segment information. IE.
> > What is the DMA address and size of the DMA "window" usable for a given
> > device. This is a tweak, that should really be handled at the "domain"
> > level.
> > 
> > That current hack won't work well if two devices share an iommu. Note
> > that we have an additional constraint here due to our paravirt
> > interfaces (specificed in PAPR) which is that PE domains must have a
> > common parent. Basically, pHyp makes them look like a PCIe host bridge
> > per domain in the guest. I think that's a pretty good idea and qemu
> > might want to do the same.
> > 
> > - We hack out the currently unconditional mapping of the entire guest
> > space in the iommu. Something will have to be done to "decide" whether
> > to do that or not ... qemu argument -> ioctl ?
> > 
> > - We hook up the paravirt call to insert/remove a translation from the
> > iommu to the VFIO map/unmap ioctl's.
> > 
> > This limps along but it's not great. Some of the problems are:
> > 
> > - I've already mentioned, the domain problem again :-) 
> > 
> > - Performance sucks of course, the vfio map ioctl wasn't mean for that
> > and has quite a bit of overhead. However we'll want to do the paravirt
> > call directly in the kernel eventually ...
> > 
> >   - ... which isn't trivial to get back to our underlying arch specific
> > iommu object from there. We'll probably need a set of arch specific
> > "sideband" ioctl's to "register" our paravirt iommu "bus numbers" and
> > link them to the real thing kernel-side.
> > 
> > - PAPR (the specification of our paravirt interface and the expectation
> > of current OSes) wants iommu pages to be 4k by default, regardless of
> > the kernel host page size, which makes things a bit tricky since our
> > enterprise host kernels have a 64k base page size. Additionally, we have
> > new PAPR interfaces that we want to exploit, to allow the guest to
> > create secondary iommu segments (in 64-bit space), which can be used
> > (under guest control) to do things like map the entire guest (here it
> > is :-) or use larger iommu page sizes (if permitted by the host kernel,
> > in our case we could allow 64k iommu page size with a 64k host kernel).
> > 
> > The above means we need arch specific APIs. So arch specific vfio
> > ioctl's, either that or kvm ones going to vfio or something ... the
> > current structure of vfio/kvm interaction doesn't make it easy.
> > 
> > * IO space
> > 
> > On most (if not all) non-x86 archs, each PCI host bridge provide a
> > completely separate PCI address space. Qemu doesn't deal with that very
> > well. For MMIO it can be handled since those PCI address spaces are
> > "remapped" holes in the main CPU address space so devices can be
> > registered by using BAR + offset of that window in qemu MMIO mapping.
> > 
> > For PIO things get nasty. We have totally separate PIO spaces and qemu
> > doesn't seem to like that. We can try to play the offset trick as well,
> > we haven't tried yet, but basically that's another one to fix. Not a
> > huge deal I suppose but heh ...
> > 
> > Also our next generation chipset may drop support for PIO completely.
> > 
> > On the other hand, because PIO is just a special range of MMIO for us,
> > we can do normal pass-through on it and don't need any of the emulation
> > done qemu.
> > 
> >   * MMIO constraints
> > 
> > The QEMU side VFIO code hard wires various constraints that are entirely
> > based on various requirements you decided you have on x86 but don't
> > necessarily apply to us :-)
> > 
> > Due to our paravirt nature, we don't need to masquerade the MSI-X table
> > for example. At all. If the guest configures crap into it, too bad, it
> > can only shoot itself in the foot since the host bridge enforce
> > validation anyways as I explained earlier. Because it's all paravirt, we
> > don't need to "translate" the interrupt vectors & addresses, the guest
> > will call hyercalls to configure things anyways.
> > 
> > We don't need to prevent MMIO pass-through for small BARs at all. This
> > should be some kind of capability or flag passed by the arch. Our
> > segmentation of the MMIO domain means that we can give entire segments
> > to the guest and let it access anything in there (those segments are a
> > multiple of the page size always). Worst case it will access outside of
> > a device BAR within a segment and will cause the PE to go into error
> > state, shooting itself in the foot, there is no risk of side effect
> > outside of the guest boundaries.
> > 
> > In fact, we don't even need to emulate BAR sizing etc... in theory. Our
> > paravirt guests expect the BARs to have been already allocated for them
> > by the firmware and will pick up the addresses from the device-tree :-)
> > 
> > Today we use a "hack", putting all 0's in there and triggering the linux
> > code path to reassign unassigned resources (which will use BAR
> > emulation) but that's not what we are -supposed- to do. Not a big deal
> > and having the emulation there won't -hurt- us, it's just that we don't
> > really need any of it.
> > 
> > We have a small issue with ROMs. Our current KVM only works with huge
> > pages for guest memory but that is being fixed. So the way qemu maps the
> > ROM copy into the guest address space doesn't work. It might be handy
> > anyways to have a way for qemu to use MMIO emulation for ROM access as a
> > fallback. I'll look into it.
> > 
> >   * EEH
> > 
> > This is the name of those fancy error handling & isolation features I
> > mentioned earlier. To some extent it's a superset of AER, but we don't
> > generally expose AER to guests (or even the host), it's swallowed by
> > firmware into something else that provides a superset (well mostly) of
> > the AER information, and allow us to do those additional things like
> > isolating/de-isolating, reset control etc...
> > 
> > Here too, we'll need arch specific APIs through VFIO. Not necessarily a
> > huge deal, I mention it for completeness.
> > 
> >    * Misc
> > 
> > There's lots of small bits and pieces... in no special order:
> > 
> >  - netlink ? WTF ! Seriously, we don't need a hybrid API with a bit of
> > netlink and a bit of ioctl's ... it's not like there's something
> > fundamentally  better for netlink vs. ioctl... it really depends what
> > you are doing, and in this case I fail to see what netlink brings you
> > other than bloat and more stupid userspace library deps.
> > 
> >  - I don't like too much the fact that VFIO provides yet another
> > different API to do what we already have at least 2 kernel APIs for, ie,
> > BAR mapping and config space access. At least it should be better at
> > using the backend infrastructure of the 2 others (sysfs & procfs). I
> > understand it wants to filter in some case (config space) and -maybe-
> > yet another API is the right way to go but allow me to have my doubts.
> > 
> > One thing I thought about but you don't seem to like it ... was to use
> > the need to represent the partitionable entity as groups in sysfs that I
> > talked about earlier. Those could have per-device subdirs with the usual
> > config & resource files, same semantic as the ones in the real device,
> > but when accessed via the group they get filtering. I might or might not
> > be practical in the end, tbd, but it would allow apps using a slightly
> > modified libpci for example to exploit some of this.
> > 
> >  - The qemu vfio code hooks directly into ioapic ... of course that
> > won't fly with anything !x86
> > 
> >  - The various "objects" dealt with here, -especially- interrupts and
> > iommu, need a better in-kernel API so that fast in-kernel emulation can
> > take over from qemu based emulation. The way we need to do some of this
> > on POWER differs from x86. We can elaborate later, it's not necessarily
> > a killer either but essentially we'll take the bulk of interrupt
> > handling away from VFIO to the point where it won't see any of it at
> > all.
> > 
> >   - Non-PCI devices. That's a hot topic for embedded. I think the vast
> > majority here is platform devices. There's quite a bit of vfio that
> > isn't intrinsically PCI specific. We could have an in-kernel platform
> > driver like we have an in-kernel PCI driver to attach to. The mapping of
> > resources to userspace is rather generic, as goes for interrupts. I
> > don't know whether that idea can be pushed much further, I don't have
> > the bandwidth to look into it much at this point, but maybe it would be
> > possible to refactor vfio a bit to better separate what is PCI specific
> > to what is not. The idea would be to move the PCI specific bits to
> > inside the "placeholder" PCI driver, and same goes for platform bits.
> > "generic" ioctl's go to VFIO core, anything that doesn't handle, it
> > passes them to the driver which allows the PCI one to handle things
> > differently than the platform one, maybe an amba one while at it,
> > etc.... just a thought, I haven't gone into the details at all.
> > 
> > I think that's all I had on my plate today, it's a long enough email
> > anyway :-) Anthony suggested we put that on a wiki, I'm a bit
> > wiki-disabled myself so he proposed to pickup my email and do that. We
> > should probably discuss the various items in here separately as
> > different threads to avoid too much confusion.
> > 
> > One other thing we should do on our side is publish somewhere our
> > current hacks to get you an idea of where we are going and what we had
> > to do (code speaks more than words). We'll try to do that asap, possibly
> > next week.
> > 
> > Note that I'll be on/off the next few weeks, travelling and doing
> > bringup. So expect latency in my replies.
> > 
> > Cheers,
> > Ben.
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-05 10:26 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, David Gibson, Alex Williamson,
	Avi Kivity, Anthony Liguori, linuxppc-dev
In-Reply-To: <20110804104105.GC22329@8bytes.org>

On Thu, 2011-08-04 at 12:41 +0200, Joerg Roedel wrote:
> On Mon, Aug 01, 2011 at 02:27:36PM -0600, Alex Williamson wrote:
> > It's not clear to me how we could skip it.  With VT-d, we'd have to
> > implement an emulated interrupt remapper and hope that the guest picks
> > unused indexes in the host interrupt remapping table before it could do
> > anything useful with direct access to the MSI-X table.  Maybe AMD IOMMU
> > makes this easier?
> 
> AMD IOMMU provides remapping tables per-device, and not a global one.
> But that does not make direct guest-access to the MSI-X table safe. The
> table contains the table contains the interrupt-type and the vector
> which is used as an index into the remapping table by the IOMMU. So when
> the guest writes into its MSI-X table the remapping-table in the host
> needs to be updated too.

Right, you need paravirt to avoid filtering :-)

IE the problem is two fold:

 - Getting the right value in the table / remapper so things work
(paravirt)

 - Protecting against the guest somewhat managing to change the value in
the table (either directly or via a backdoor access to its own config
space).

The later for us comes from the HW PE filtering of the MSI transactions.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 2/2] powerpc: Fix build without CONFIG_PCI
From: Benjamin Herrenschmidt @ 2011-08-05 10:16 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linuxppc-dev
In-Reply-To: <20110805173700.19310798b5a30bfb666d7031@canb.auug.org.au>

On Fri, 2011-08-05 at 17:37 +1000, Stephen Rothwell wrote:
> Hi Ben,
> 
> On Fri, 05 Aug 2011 16:04:07 +1000 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> >
> > Commit fea80311a939a746533a6d7e7c3183729d6a3faf
> > "iomap: make IOPORT/PCI mapping functions conditional"
> > 
> > Broke powerpc build without CONFIG_PCI as we would still define
> > pci_iomap(), which overlaps with the new empty inline in the headers.
> 
> If we are using a static inline implementation for !CONFIG_PCI, then we
> should probably not EXPORT it in that case either ...

Good point, interesting that my build test didn't catch it ... I'll
add another patch on top before I ask Linus to pull.

Cheers,
Ben.

> > @@ -143,6 +144,7 @@ void pci_iounmap(struct pci_dev *dev, void __iomem *addr)
> >  		return;
> >  	iounmap(addr);
> >  }
> > +#endif /* CONFIG_PCI */
> >  
> >  EXPORT_SYMBOL(pci_iomap);
> >  EXPORT_SYMBOL(pci_iounmap);

^ permalink raw reply

* Re: [PATCH 2/2] powerpc: Fix build without CONFIG_PCI
From: Stephen Rothwell @ 2011-08-05  7:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1312524247.8598.25.camel@pasglop>

[-- Attachment #1: Type: text/plain, Size: 758 bytes --]

Hi Ben,

On Fri, 05 Aug 2011 16:04:07 +1000 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> Commit fea80311a939a746533a6d7e7c3183729d6a3faf
> "iomap: make IOPORT/PCI mapping functions conditional"
> 
> Broke powerpc build without CONFIG_PCI as we would still define
> pci_iomap(), which overlaps with the new empty inline in the headers.

If we are using a static inline implementation for !CONFIG_PCI, then we
should probably not EXPORT it in that case either ...

> @@ -143,6 +144,7 @@ void pci_iounmap(struct pci_dev *dev, void __iomem *addr)
>  		return;
>  	iounmap(addr);
>  }
> +#endif /* CONFIG_PCI */
>  
>  EXPORT_SYMBOL(pci_iomap);
>  EXPORT_SYMBOL(pci_iounmap);


-- 
Stephen Rothwell <sfr@canb.auug.org.au>

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* [PATCH 2/2] powerpc: Fix build without CONFIG_PCI
From: Benjamin Herrenschmidt @ 2011-08-05  6:04 UTC (permalink / raw)
  To: linuxppc-dev

Commit fea80311a939a746533a6d7e7c3183729d6a3faf
"iomap: make IOPORT/PCI mapping functions conditional"

Broke powerpc build without CONFIG_PCI as we would still define
pci_iomap(), which overlaps with the new empty inline in the headers.

Make our implementation conditional on CONFIG_PCI

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/kernel/iomap.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/iomap.c b/arch/powerpc/kernel/iomap.c
index 1577434..faca64a 100644
--- a/arch/powerpc/kernel/iomap.c
+++ b/arch/powerpc/kernel/iomap.c
@@ -117,6 +117,7 @@ void ioport_unmap(void __iomem *addr)
 EXPORT_SYMBOL(ioport_map);
 EXPORT_SYMBOL(ioport_unmap);
 
+#ifdef CONFIG_PCI
 void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max)
 {
 	resource_size_t start = pci_resource_start(dev, bar);
@@ -143,6 +144,7 @@ void pci_iounmap(struct pci_dev *dev, void __iomem *addr)
 		return;
 	iounmap(addr);
 }
+#endif /* CONFIG_PCI */
 
 EXPORT_SYMBOL(pci_iomap);
 EXPORT_SYMBOL(pci_iounmap);

^ permalink raw reply related

* [PATCH] powerpc/4xx: Fix build of PCI code on 405
From: Benjamin Herrenschmidt @ 2011-08-05  6:03 UTC (permalink / raw)
  To: linuxppc-dev

Commit 112d1fe9f7715db423ffeec5ac1beccff6093dc4
"powerpc/4xx: Add check_link to struct ppc4xx_pciex_hwops" inadvertently
broke 405 builds due to some functions being over protected by an
ifdef CONFIG_44x.

Move them back out.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/sysdev/ppc4xx_pci.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/sysdev/ppc4xx_pci.c b/arch/powerpc/sysdev/ppc4xx_pci.c
index a59ba96..dbfe96b 100644
--- a/arch/powerpc/sysdev/ppc4xx_pci.c
+++ b/arch/powerpc/sysdev/ppc4xx_pci.c
@@ -655,8 +655,6 @@ struct ppc4xx_pciex_hwops
 
 static struct ppc4xx_pciex_hwops *ppc4xx_pciex_hwops;
 
-#ifdef CONFIG_44x
-
 static int __init ppc4xx_pciex_wait_on_sdr(struct ppc4xx_pciex_port *port,
 					   unsigned int sdr_offset,
 					   unsigned int mask,
@@ -688,6 +686,7 @@ static int __init ppc4xx_pciex_port_reset_sdr(struct ppc4xx_pciex_port *port)
 	return 0;
 }
 
+
 static void __init ppc4xx_pciex_check_link_sdr(struct ppc4xx_pciex_port *port)
 {
 	printk(KERN_INFO "PCIE%d: Checking link...\n", port->index);
@@ -718,6 +717,8 @@ static void __init ppc4xx_pciex_check_link_sdr(struct ppc4xx_pciex_port *port)
 		printk(KERN_INFO "PCIE%d: No device detected.\n", port->index);
 }
 
+#ifdef CONFIG_44x
+
 /* Check various reset bits of the 440SPe PCIe core */
 static int __init ppc440spe_pciex_check_reset(struct device_node *np)
 {

^ permalink raw reply related

* FW: Ethernet driver WR linux
From: smitha.vanga @ 2011-08-05  5:31 UTC (permalink / raw)
  To: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1376 bytes --]

Hi

I have bringup WR linux 2.6.38 on a custom mpc8247 board. Not able to
configure the ipaddress.  When I boot my linux kernel I get the below
message.
Also I am using the fec 1.  The fec is not getting initialized.
fs_enet.c:v1.0 (Aug 8, 2005)
BB MII Bus: Cannot register as MDIO bus
fsl-bb-mdio: probe of fsl-bb-mdio.0 failed with error -1
BB MII Bus: Cannot register as MDIO bus
fsl-bb-mdio: probe of fsl-bb-mdio.1 failed with error -1

I have enabled the following:

CONFIG_NET_ETHERNET=y

CONFIG_MII=y

CONFIG_FS_ENET=y

# CONFIG_FS_ENET_HAS_SCC is not set

CONFIG_FS_ENET_HAS_FCC=y


Regards,
Smitha

Please do not print this email unless it is absolutely necessary. 

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. 

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. 

www.wipro.com

[-- Attachment #2: Type: text/html, Size: 2806 bytes --]

^ permalink raw reply

* [PATCH] powerpc: Make KVM_GUEST default to n
From: Anton Blanchard @ 2011-08-05  3:23 UTC (permalink / raw)
  To: benh, paulus, agraf; +Cc: linuxppc-dev


KVM_GUEST adds a 1 MB array to the kernel (kvm_tmp) which grew
my kernel enough to cause it to fail to boot.

Dynamically allocating or reducing the size of this array is a
good idea, but in the meantime I think it makes sense to make
KVM_GUEST default to n in order to minimise surprises.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: linux-powerpc/arch/powerpc/platforms/Kconfig
===================================================================
--- linux-powerpc.orig/arch/powerpc/platforms/Kconfig	2011-08-01 17:33:46.120121554 +1000
+++ linux-powerpc/arch/powerpc/platforms/Kconfig	2011-08-01 17:35:06.921772044 +1000
@@ -24,7 +24,7 @@ source "arch/powerpc/platforms/wsp/Kconf
 
 config KVM_GUEST
 	bool "KVM Guest support"
-	default y
+	default n
 	---help---
 	  This option enables various optimizations for running under the KVM
 	  hypervisor. Overhead for the kernel when not running inside KVM should

^ permalink raw reply

* MESH SCSI driver not working
From: Pol Vangheluwe @ 2011-08-04 19:40 UTC (permalink / raw)
  To: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 2926 bytes --]

On Fri, 2010-03-19 at 00:21 +0100, Stef Simoens wrote:
> Hello,
> 
> Some time ago (July 24th 2009 my mailbox says) I emailed you and the
> linuxppc-dev list about my problems booting from the mesh SCSI
> controller.
> 
> I just compiled 2.6.31 (actually, gentoo-sources-2.6.31-r10); but the
> problem remains
> I know that 2.6.33 is out, but as I didn't see any changes to the
> mesh-driver I guess that the problem is still there ...

Sadly I haven't had a chance to look. The MESH driver is a pretty
complicated thing that I didn't write and am not familiar with, so it's
going to take some work to sort out what's going on, and so far I
haven't had time to dedicate to this.

Cheers,
Ben.
I found this topic in the linuxppc-dev archives while googling for a mesh problem I currently have on my PowerPC 7500:
the boot process fails when populating /dev.  I think I have the same problem as reported by Stef Simoens: after a lot of complaints
like 'mesh_abort' and 'mesh_host_reset', it ends with Kernel panic about 'double DMA start'.

A rebuilt my kernel several times with different configs, but the only way to make it bootable is to disable the CDROM driver
(the 'sr' block of the SCSI upper layer).  The problem even occurs when 'sr' is built as a module.

My platform: pol [ ~ ]$ cat /proc/cpuinfo
processor	: 0
cpu		: 604e
clock		: 200.000000MHz
revision	: 2.4 (pvr 0009 0204)
bogomips	: 25.00
timebase	: 12500550
platform	: PowerMac
model		: Power Macintosh
machine		: Power Macintosh
motherboard	: AAPL,7500 MacRISC
detected as	: 16 (PowerMac 7500)
pmac flags	: 00000000
L2 cache	: 256K unified
pmac-generation	: OldWorld
Memory		: 208 MB

My kernel: pol [ ~ ]$ uname -a
Linux ppc7600 2.6.37.6 #4 Thu Aug 4 18:36:07 CEST 2011 ppc GNU/Linux

My compiler: pol [ ~ ]$ gcc -v       
Ingebouwde specs worden gebruikt.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/powerpc-unknown-linux-gnu/4.5.2/lto-wrapper
Target: powerpc-unknown-linux-gnu
Configured with: ../gcc-4.5.2/configure --prefix=/usr --libexecdir=/usr/lib --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-languages=c,c++ --disable-multilib --disable-bootstrap --with-system-zlib
Thread model: posix
gcc versie 4.5.2 (GCC) 

My system: LFS-2.8 (Linux From Scratch)

I have on the same machine also Ubuntu 8.04.1 (hardy), running the 2.6.24-19-powerpc kernel 
and gcc 4.1.3.  There is no mesh problem with this distribution.

Stef reported kernel 2.6.29 OK but 2.6.30-rcl and 2.6.31-rc3 NOK.

I recompiled mesh.c with DEBUG on and I see a lot of messages on the screen but I cannot capture them in a file.  We are too early
in the boot process for a working serial port and there is no disk mounted yet to log anything.

BenH was suspecting a DMA buffer misalignment but so far, I could not find a real solution on the Internet.
Anybody?

pvg


[-- Attachment #2: Type: text/html, Size: 4837 bytes --]

^ permalink raw reply

* Re: [Cbe-oss-dev] [PATCH 07/15] ps3flash: Refuse to work in lpars other than OtherOS
From: Geert Uytterhoeven @ 2011-08-04 19:27 UTC (permalink / raw)
  To: Andre Heider; +Cc: Geoff Levand, cbe-oss-dev, Hector Martin, linuxppc-dev
In-Reply-To: <CAHsu+b8iEA53gq45Dyv-n+9X8ZG97xTTp50JQxGxVLA_UUppBw@mail.gmail.com>

On Thu, Aug 4, 2011 at 18:40, Andre Heider <a.heider@gmail.com> wrote:
> On Thu, Aug 4, 2011 at 12:34 AM, Geoff Levand <geoff@infradead.org> wrote=
:
>> On 08/01/2011 01:02 PM, Andre Heider wrote:
>>> --- a/drivers/char/ps3flash.c
>>> +++ b/drivers/char/ps3flash.c
>>> @@ -25,6 +25,7 @@
>>>
>>> =C2=A0#include <asm/lv1call.h>
>>> =C2=A0#include <asm/ps3stor.h>
>>> +#include <asm/firmware.h>
>>>
>>>
>>> =C2=A0#define DEVICE_NAME =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0"ps3flash"
>>> @@ -455,6 +456,12 @@ static struct ps3_system_bus_driver ps3flash =3D {
>>>
>>> =C2=A0static int __init ps3flash_init(void)
>>> =C2=A0{
>>> + =C2=A0 =C2=A0 if (!firmware_has_feature(FW_FEATURE_PS3_LV1))
>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return -ENODEV;
>>
>> Is this needed? =C2=A0Won't this driver only be loaded on PS3 hardware?
>>
>
> The same code is in drivers/block/ps3disk.c, I wasn't sure if it is
> missing here or redundant there.
> Should I remove it here?
>
>>> +
>>> + =C2=A0 =C2=A0 if (ps3_get_ss_laid() !=3D PS3_SS_LAID_OTHEROS)
>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return -ENODEV;
>>> +
>>> =C2=A0 =C2=A0 =C2=A0 return ps3_system_bus_driver_register(&ps3flash);
>>> =C2=A0}

ps3flash_init() is called straight from module_init(), so it could be
called on non-PS3.
ps3_system_bus_driver_register() has the firmware_has_feature_check(),
so it will
reject non-PS3.

But if your *_init() does any processing before calling
ps3_system_bus_driver_register()
(like ps3disk_init() does, and ps3flash_init() now does due to your
patch), you have to
do the check yourself, to make sure it returns early on non-PS3.

Gr{oetje,eeting}s,

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k=
.org

In personal conversations with technical people, I call myself a hacker. Bu=
t
when I'm talking to journalists I just say "programmer" or something like t=
hat.
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 =C2=A0=C2=A0 -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH 02/15] [PS3] Get lv1 high memory region from devtree
From: Geoff Levand @ 2011-08-04 19:24 UTC (permalink / raw)
  To: Hector Martin; +Cc: cbe-oss-dev, Andre Heider, linuxppc-dev
In-Reply-To: <4E39F386.8020403@marcansoft.com>

On 08/03/2011 06:19 PM, Hector Martin wrote:
> On 08/04/2011 12:30 AM, Geoff Levand wrote:
>> How would a kexec based bootloader work?  If it's kernel were to allocate
>> high mem and the bootloader program uses the high mem, how could it tell
>> that kernel not to destroy the region on shutdown?
> 
> The current code contemplates the case where a non-kexec based
> bootloader is the first stage and allocates highmem (and knows how to
> tell the kernel about it), possibly followed by kexec stages that just
> keep that allocation. To support a kexec bootloader as the first
> bootloader using this mechanism would indeed require extra support to
> tell that kernel to retain its allocation, preferably something that can
> be decided from userland. Of course the current kexec bootloader
> behavior where highmem isn't handed over to the child kernel will still
> work.
> 
>> If arch/powerpc/boot/ps3.c allocated the mem and added a DT entry
>> then other OSes that don't know about the Linux device tree won't
>> be able to use that allocated memory.  Other OSes could do a
>> test to see if the allocation was already done.  Another option
>> that might work is to write info into the LV1 repository then
>> have boot code look there for allocated hig mem.
> 
> If you're booting another OS that isn't Linux then it also has no use
> for a Linux-specific ramdisk (linux,initrd-start) and thus no use for
> preallocated highmem and should be booted as such (maybe make the
> userland tools tell the kernel to release highmem if there's no initrd
> defined).

This sounds complicated, user programs managing memory regions.

Also, it needs to be considered that a lot of kernels are out
there will be confused if started with high mem already allocated.

> Using the lv1 repo is an option, but does it make sense? It's even less
> standard than a FDT and we'd have to put both the region1 location and
> the initrd location in there (there's no point to maintaining highmem if
> you aren't going to use it).
> 
> FWIW, the lv1 repo writing hypercalls are unused and undocumented.

The hcalls to create, write, and delete nodes are known, but I don't
recall if I verified they work:

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=arch/powerpc/include/asm/lv1call.h;hb=HEAD#l265

#92 should be named lv1_write_repository_node.

You can only modify the repo for your lpar, so:

  lv1_{create,write}_repository_node(n1, n2, n3, n4, v1, v2);
  lv1_delete_repository_node(n1, n2, n3, n4);

-Geoff

^ permalink raw reply

* Re: [PATCH 07/15] ps3flash: Refuse to work in lpars other than OtherOS
From: Andre Heider @ 2011-08-04 16:40 UTC (permalink / raw)
  To: Geoff Levand; +Cc: cbe-oss-dev, Hector Martin, linuxppc-dev
In-Reply-To: <4E39CCF3.7000406@infradead.org>

On Thu, Aug 4, 2011 at 12:34 AM, Geoff Levand <geoff@infradead.org> wrote:
> On 08/01/2011 01:02 PM, Andre Heider wrote:
>> The driver implements a character and misc device, meant for the
>> axed OtherOS to exchange various settings with GameOS.
>> Since Firmware 3.21 there is no GameOS support anymore to write these
>> settings, so limit the driver to the OtherOS environment.
>
> This is really a test if running on the PS3 OtherOS, so this
> comment should state that.

Ok.

>
>>
>> Signed-off-by: Andre Heider <a.heider@gmail.com>
>> ---
>> =A0arch/powerpc/platforms/ps3/Kconfig | =A0 =A01 +
>> =A0drivers/char/ps3flash.c =A0 =A0 =A0 =A0 =A0 =A0| =A0 =A07 +++++++
>> =A02 files changed, 8 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/ps3/Kconfig b/arch/powerpc/platforms=
/ps3/Kconfig
>> index 84df5c8..5eb956a 100644
>> --- a/arch/powerpc/platforms/ps3/Kconfig
>> +++ b/arch/powerpc/platforms/ps3/Kconfig
>> @@ -121,6 +121,7 @@ config PS3_FLASH
>>
>> =A0 =A0 =A0 =A0 This support is required to access the PS3 FLASH ROM, wh=
ich
>> =A0 =A0 =A0 =A0 contains the boot loader and some boot options.
>> + =A0 =A0 =A0 This driver only supports the deprecated OtherOS LPAR.
>
> This will be confusing for OtherOS users, so should be removed.

Ok.

>
>> =A0 =A0 =A0 =A0 In general, all users will say Y or M.
>
>
> This could be changed to: 'In general, all PS3 OtherOS users will say Y o=
r M.'

Yeah, you're right, that's much better

>
>>
>> =A0 =A0 =A0 =A0 As this driver needs a fixed buffer of 256 KiB of memory=
, it can
>> diff --git a/drivers/char/ps3flash.c b/drivers/char/ps3flash.c
>> index 69c734a..b1e8659 100644
>> --- a/drivers/char/ps3flash.c
>> +++ b/drivers/char/ps3flash.c
>> @@ -25,6 +25,7 @@
>>
>> =A0#include <asm/lv1call.h>
>> =A0#include <asm/ps3stor.h>
>> +#include <asm/firmware.h>
>>
>>
>> =A0#define DEVICE_NAME =A0 =A0 =A0 =A0 =A0"ps3flash"
>> @@ -455,6 +456,12 @@ static struct ps3_system_bus_driver ps3flash =3D {
>>
>> =A0static int __init ps3flash_init(void)
>> =A0{
>> + =A0 =A0 if (!firmware_has_feature(FW_FEATURE_PS3_LV1))
>> + =A0 =A0 =A0 =A0 =A0 =A0 return -ENODEV;
>
> Is this needed? =A0Won't this driver only be loaded on PS3 hardware?
>

The same code is in drivers/block/ps3disk.c, I wasn't sure if it is
missing here or redundant there.
Should I remove it here?

>> +
>> + =A0 =A0 if (ps3_get_ss_laid() !=3D PS3_SS_LAID_OTHEROS)
>> + =A0 =A0 =A0 =A0 =A0 =A0 return -ENODEV;
>> +
>> =A0 =A0 =A0 return ps3_system_bus_driver_register(&ps3flash);
>> =A0}
>>
>
> -Geoff
>
>

^ permalink raw reply

* Re: [PATCH 01/15] [PS3] Add udbg driver using the PS3 gelic Ethernet device
From: Andre Heider @ 2011-08-04 16:35 UTC (permalink / raw)
  To: Geoff Levand; +Cc: cbe-oss-dev, Hector Martin, linuxppc-dev
In-Reply-To: <4E39CC67.2060606@infradead.org>

On Thu, Aug 4, 2011 at 12:32 AM, Geoff Levand <geoff@infradead.org> wrote:
> On 08/01/2011 01:02 PM, Andre Heider wrote:
>> --- /dev/null
>> +++ b/arch/powerpc/platforms/ps3/gelic_udbg.c
>> @@ -0,0 +1,272 @@
>> +/*
>> + * arch/powerpc/platforms/ps3/gelic_udbg.c
>
> Don't put file names in files. =A0When the file gets moved, then this wil=
l
> no longer be correct.
>
>> + *
>> + * udbg debug output routine via GELIC UDP broadcasts
>> + * Copyright (C) 2010 Hector Martin <hector@marcansoft.com>
>> + * Copyright (C) 2011 Andre Heider <a.heider@gmail.com>
>
> Some of this seems to be taken from the gelic driver, so shouldn't
> the copyright info from there be included here?
>
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version 2
>> + * of the License, or (at your option) any later version.
>> + *
>> + */
>> +
>> +#include <asm/io.h>
>> +#include <asm/udbg.h>
>> +#include <asm/lv1call.h>
>> +
>> +#define GELIC_BUS_ID 1
>> +#define GELIC_DEVICE_ID 0
>> +#define GELIC_DEBUG_PORT 18194
>> +#define GELIC_MAX_MESSAGE_SIZE 1000
>> +
>> +#define GELIC_LV1_GET_MAC_ADDRESS 1
>> +#define GELIC_LV1_GET_VLAN_ID 4
>> +#define GELIC_LV1_VLAN_TX_ETHERNET_0 2
>> +
>> +#define GELIC_DESCR_DMA_STAT_MASK 0xf0000000
>> +#define GELIC_DESCR_DMA_CARDOWNED 0xa0000000
>> +
>> +#define GELIC_DESCR_TX_DMA_IKE 0x00080000
>> +#define GELIC_DESCR_TX_DMA_NO_CHKSUM 0x00000000
>> +#define GELIC_DESCR_TX_DMA_FRAME_TAIL 0x00040000
>> +
>> +#define GELIC_DESCR_DMA_CMD_NO_CHKSUM (GELIC_DESCR_DMA_CARDOWNED | \
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0GELIC_DESCR_TX_DMA_IKE | \
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0GELIC_DESCR_TX_DMA_NO_CHKSUM)
>> +
>> +static u64 bus_addr;
>> +
>> +struct gelic_descr {
>> + =A0 =A0 /* as defined by the hardware */
>
> These are BE from the hardware, so should be __beXX types.
>
>> + =A0 =A0 u32 buf_addr;
>> + =A0 =A0 u32 buf_size;
>> + =A0 =A0 u32 next_descr_addr;
>> + =A0 =A0 u32 dmac_cmd_status;
>> + =A0 =A0 u32 result_size;
>> + =A0 =A0 u32 valid_size; /* all zeroes for tx */
>> + =A0 =A0 u32 data_status;
>> + =A0 =A0 u32 data_error; /* all zeroes for tx */
>> +} __attribute__((aligned(32)));
>
> ...
>
>> +static void gelic_debug_init(void)
>> +{
>
> ...
>
>> + =A0 =A0 result =3D lv1_net_control(GELIC_BUS_ID, GELIC_DEVICE_ID,
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0GELIC_LV1_G=
ET_VLAN_ID,
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0GELIC_LV1_V=
LAN_TX_ETHERNET_0, 0, 0,
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0&vlan_id, &=
v2);
>> + =A0 =A0 if (result =3D=3D 0) {
>
> This should be 'if (!result)'

Ack on all points.

^ permalink raw reply

* Re: [PATCH 05/15] ps3: Detect the current lpar environment
From: Andre Heider @ 2011-08-04 16:34 UTC (permalink / raw)
  To: Geoff Levand; +Cc: cbe-oss-dev, Hector Martin, linuxppc-dev
In-Reply-To: <4E39CC54.7030705@infradead.org>

On Thu, Aug 4, 2011 at 12:31 AM, Geoff Levand <geoff@infradead.org> wrote:
> On 08/01/2011 01:02 PM, Andre Heider wrote:
>> ---
>> =A0arch/powerpc/include/asm/ps3.h =A0 =A0 =A0 =A0 =A0| =A0 =A07 +++++++
>> =A0arch/powerpc/platforms/ps3/platform.h =A0 | =A0 =A04 ++++
>> =A0arch/powerpc/platforms/ps3/repository.c | =A0 19 +++++++++++++++++++
>> =A0arch/powerpc/platforms/ps3/setup.c =A0 =A0 =A0| =A0 22 ++++++++++++++=
++++++++
>> =A04 files changed, 52 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/ps3.h b/arch/powerpc/include/asm/p=
s3.h
>> index 7f065e1..136354a 100644
>> --- a/arch/powerpc/include/asm/ps3.h
>> +++ b/arch/powerpc/include/asm/ps3.h
>> @@ -39,6 +39,13 @@ union ps3_firmware_version {
>> =A0void ps3_get_firmware_version(union ps3_firmware_version *v);
>> =A0int ps3_compare_firmware_version(u16 major, u16 minor, u16 rev);
>>
>> +enum ps3_ss_laid {
>> + =A0 =A0 PS3_SS_LAID_GAMEOS =3D 0x1070000002000001UL,
>> + =A0 =A0 PS3_SS_LAID_OTHEROS =3D 0x1080000004000001UL,
>
> Only PS3_SS_LAID_OTHEROS is used for anything outside ps3_setup_arch(),
> so I think it makes sense to split this into two patches with one adding
> just PS3_SS_LAID_OTHEROS and ps3_get_ss_laid() with a comment that
> it adds the ps3_get_ss_laid routine.

Sounds reasonable, I will split that into two patches then.

^ permalink raw reply

* Re: [PATCH 00/15] ps3: Support more than the OtherOS lpar
From: Andre Heider @ 2011-08-04 16:31 UTC (permalink / raw)
  To: Geoff Levand; +Cc: cbe-oss-dev, Hector Martin, linuxppc-dev
In-Reply-To: <4E39CA6C.3000202@infradead.org>

Hi Geoff,

On Thu, Aug 4, 2011 at 12:23 AM, Geoff Levand <geoff@infradead.org> wrote:
> Hi Andre,
>
> On 08/01/2011 01:02 PM, Andre Heider wrote:
>> This series addresses various issues and extends support when running
>> in lpars like GameOS. Included are some patches from Hector Martin, whic=
h
>> I found useful.
>
> Much of this is just general fixups and improvements to the existing PS3
> support. =A0I think you should separate those changes out and work to get
> them included, then consider others. =A0If I give some comment, then
> I consider that part worth pursuing at the present time.

Sounds like a good approach to me.

> I have limited time to review the patches, so it will take me a while to
> get through them.

No problem at all, it will probably take some time to resolve all the
details anyway.

>> Patches are based on 2.6.39 since master doesn't boot with smp on my
>> console. =A0I wasn't able to pinpoint the cause so far (not that I tried
>> too hard).
>
> I'm looking into this problem, but it will take some time.

Just for the record: I didn't mean to push ;)

Thanks,
Andre

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox