LinuxPPC-Dev Archive on lore.kernel.org

* Re: linux-next: build failure after merge of the final tree (tty tree related)
From: Stephen Rothwell @ 2011-08-26  0:39 UTC
  To: Arnaud Lacombe; +Cc: Greg KH, linux-next, ppc-dev, Timur Tabi, linux-kernel
In-Reply-To: <CACqU3MUs_97=MuuKAjr-LeUT7kSRxtzSntwUeKeeH4n8s7SSGg@mail.gmail.com>

Hi Arnaud,

On Thu, 25 Aug 2011 12:09:20 -0400 Arnaud Lacombe <lacombar@gmail.com> wrote:
>
> If you could provide an exhaustive list of them, I'd be interested. Do
> you account/reference them in the report you make on each new -next
> tree ?

I don't refer to them at all :-(

If you are not just referring to powerpc ones, then an x86_64
allmodconfig is a good place to start, there are several in there.

Otherwise, I will send you the results of some of my builds this evening.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/


* Re: [PATCH] xics/icp_natives: add __init to marker icp_native_init()
From: Arnaud Lacombe @ 2011-08-25 20:00 UTC
  To: Timur Tabi; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <4E56A15D.8070406@freescale.com>

Hi,

On Thu, Aug 25, 2011 at 3:24 PM, Timur Tabi <timur@freescale.com> wrote:
> Arnaud Lacombe wrote:
>> This should fix the following warning:
>>
>>  LD      arch/powerpc/sysdev/xics/built-in.o
>> WARNING: arch/powerpc/sysdev/xics/built-in.o(.text+0x1310): Section mismatch in
>> reference from the function .icp_native_init() to the function
>> .init.text:.icp_native_init_one_node()
>> The function .icp_native_init() references
>> the function __init .icp_native_init_one_node().
>> This is often because .icp_native_init lacks a __init
>> annotation or the annotation of .icp_native_init_one_node is wrong.
>>
>> icp_native_init() is only referenced in `arch/powerpc/sysdev/xics/xics-common.c'
>> by xics_init() which is itself marked with __init.
>>
>> = not built-tested =
>>
>> Reported-by: Timur Tabi <timur@freescale.com>
>> Signed-off-by: Arnaud Lacombe <lacombar@gmail.com>
>
> Acked-by: Timur Tabi <timur@freescale.com>
>
> This warning still appears, though:
>
> WARNING: arch/powerpc/sysdev/built-in.o(.text+0xf6b8): Section mismatch in
> reference from the function .ics_rtas_init() to the function
> .init.text:.xics_register_ics()
> The function .ics_rtas_init() references
> the function __init .xics_register_ics().
> This is often because .ics_rtas_init lacks a __init
> annotation or the annotation of .xics_register_ics is wrong.
>
he, chain-reaction :)

> To fix this warning, you'll also need:
>
> diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-
> index c782f85..a125721 100644
> --- a/arch/powerpc/sysdev/xics/ics-rtas.c
> +++ b/arch/powerpc/sysdev/xics/ics-rtas.c
> @@ -213,7 +213,7 @@ static int ics_rtas_host_match(struct ics *ics, struct devic
>         return !of_device_is_compatible(node, "chrp,iic");
>  }
>
> -int ics_rtas_init(void)
> +int __init ics_rtas_init(void)
>  {
>         ibm_get_xive = rtas_token("ibm,get-xive");
>         ibm_set_xive = rtas_token("ibm,set-xive");
>
>
> However, now we get another similar warning:
>
> WARNING: drivers/built-in.o(.text+0x259c484): Section mismatch in reference from
> the function .tc3589x_keypad_open() to the function
> .devinit.text:.tc3589x_keypad_init_key_hardware()
> The function .tc3589x_keypad_open() references
> the function __devinit .tc3589x_keypad_init_key_hardware().
> This is often because .tc3589x_keypad_open lacks a __devinit
> annotation or the annotation of .tc3589x_keypad_init_key_hardware is wrong.
>
> I'm not sure what to do at this point, because I have a suspicion that adding
> __devinit to tc3589x_keypad_open() is wrong.
>
tc3589x_keypad_init_key_hardware() annotation looks plain wrong.
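
To illustrate the reasoning, here is a minimal user-space sketch, not kernel code. It assumes GCC or Clang targeting ELF, and every name in it (DEMO_INIT, the demo_* functions, the .demo.init.text section) is made up for the example. In the kernel, __init code goes into .init.text and is freed after boot, and as far as I understand __devinit text is discarded in the same way when hotplug support is disabled. A function that only init code calls can carry the marker; a helper reached from a callback that may run at any later time should stay in ordinary .text.

```c
#include <stdio.h>

/*
 * User-space stand-in for the kernel's __init marker (hypothetical name).
 * In the kernel the marked code is freed once boot completes; here the
 * attribute only changes which ELF section the function lands in.
 */
#define DEMO_INIT __attribute__((section(".demo.init.text"), noinline))

/* One-time setup helper: safe to discard after initialization. */
static DEMO_INIT int demo_init_one_node(int node)
{
	return node * 2;
}

/*
 * Called only from init code, so it may carry the marker as well.
 * Without the marker, a modpost-style check would flag the reference
 * from ordinary .text into the discardable section.
 */
DEMO_INIT int demo_native_init(void)
{
	return demo_init_one_node(1) + demo_init_one_node(2);
}

/*
 * Helper used by a callback that can run long after boot (like an input
 * device's open handler). It stays in ordinary .text, unmarked.
 */
static int demo_hw_setup(int key)
{
	return key + 1;
}

/* Runtime callback: must not reference anything in the init section. */
int demo_open(int key)
{
	return demo_hw_setup(key);
}
```

Under these assumptions, the two ways to silence a mismatch warning are to mark the caller (when it really is init-only) or to drop the marker from the callee (when the caller can run later).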

 - Arnaud

> --
> Timur Tabi
> Linux kernel developer at Freescale
>
>

* Re: Kernel boot up
From: Scott Wood @ 2011-08-25 19:31 UTC
  To: smitha.vanga; +Cc: linuxppc-dev
In-Reply-To: <07ACDFB8ECA8EF47863A613BC01BBB22035E3A70@HYD-MKD-MBX02.wipro.com>

On 08/25/2011 02:57 AM, smitha.vanga@wipro.com wrote:
> Hi Scott,
> 
> I am currently trying to bring up 2.6.39 kernel on a target based on MPC8247
> Processor, using the attached .dts file. I get the below logs while the
> kernel is booting.
> I see that the unflattening of the device tree and the initial loading
> of the kernel and ramdisk file system is happening correctly. Can you
> point me where exactly I can look for this issue. I am attaching the
> .config and .dts file I am using.

Which error are you referring to?

> of-flash ff800000.flash: do_map_probe() failed

What kind of flash chip do you have?  Does the node in the device tree
accurately describe it (four interleaved 8-bit chips that only do JEDEC
and not CFI)?

> PPP generic driver version 2.4.2
> PPP Deflate Compression module registered
> tun: Universal TUN/TAP device driver, 1.6
> tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
> eth0: fs_enet: 00:00:00:00:00:00
> eth1: fs_enet: 00:00:00:00:00:00

These MAC addresses should have been set in the device tree.  If you're
using U-Boot, it should be doing the fixup.

> Populating /dev using udev: /sbin/udevd: '/lib/libc.so.6' library contains unsupported TLS
> /sbin/udevd: '/lib/libc.so.6' library contains unsupported TLS
> /sbin/udevd: can't load library 'libc.so.6'
> FAIL
> /sbin/udevstart: '/lib/libc.so.6' library contains unsupported TLS
> /sbin/udevstart: '/lib/libc.so.6' library contains unsupported TLS
> /sbin/udevstart: can't load library 'libc.so.6'
> FAIL

This is a problem with the root filesystem, not the kernel.

-Scott

* Re: [PATCH] xics/icp_natives: add __init to marker icp_native_init()
From: Timur Tabi @ 2011-08-25 19:24 UTC
  To: Arnaud Lacombe; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1314288433-26322-1-git-send-email-lacombar@gmail.com>

Arnaud Lacombe wrote:
> This should fix the following warning:
> 
>  LD      arch/powerpc/sysdev/xics/built-in.o
> WARNING: arch/powerpc/sysdev/xics/built-in.o(.text+0x1310): Section mismatch in
> reference from the function .icp_native_init() to the function
> .init.text:.icp_native_init_one_node()
> The function .icp_native_init() references
> the function __init .icp_native_init_one_node().
> This is often because .icp_native_init lacks a __init
> annotation or the annotation of .icp_native_init_one_node is wrong.
> 
> icp_native_init() is only referenced in `arch/powerpc/sysdev/xics/xics-common.c'
> by xics_init() which is itself marked with __init.
> 
> = not built-tested =
> 
> Reported-by: Timur Tabi <timur@freescale.com>
> Signed-off-by: Arnaud Lacombe <lacombar@gmail.com>

Acked-by: Timur Tabi <timur@freescale.com>

This warning still appears, though:

WARNING: arch/powerpc/sysdev/built-in.o(.text+0xf6b8): Section mismatch in
reference from the function .ics_rtas_init() to the function
.init.text:.xics_register_ics()
The function .ics_rtas_init() references
the function __init .xics_register_ics().
This is often because .ics_rtas_init lacks a __init
annotation or the annotation of .xics_register_ics is wrong.

To fix this warning, you'll also need:

diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-
index c782f85..a125721 100644
--- a/arch/powerpc/sysdev/xics/ics-rtas.c
+++ b/arch/powerpc/sysdev/xics/ics-rtas.c
@@ -213,7 +213,7 @@ static int ics_rtas_host_match(struct ics *ics, struct devic
        return !of_device_is_compatible(node, "chrp,iic");
 }

-int ics_rtas_init(void)
+int __init ics_rtas_init(void)
 {
        ibm_get_xive = rtas_token("ibm,get-xive");
        ibm_set_xive = rtas_token("ibm,set-xive");

However, now we get another similar warning:

WARNING: drivers/built-in.o(.text+0x259c484): Section mismatch in reference from
the function .tc3589x_keypad_open() to the function
.devinit.text:.tc3589x_keypad_init_key_hardware()
The function .tc3589x_keypad_open() references
the function __devinit .tc3589x_keypad_init_key_hardware().
This is often because .tc3589x_keypad_open lacks a __devinit
annotation or the annotation of .tc3589x_keypad_init_key_hardware is wrong.

I'm not sure what to do at this point, because I have a suspicion that adding
__devinit to tc3589x_keypad_open() is wrong.

-- 
Timur Tabi
Linux kernel developer at Freescale

* Re: [PATCH] tty/powerpc: fix build break with ehv_bytechan.c on allyesconfig
From: Greg KH @ 2011-08-25 19:03 UTC
  To: Timur Tabi; +Cc: sfr, linux-next, linux-kernel, linuxppc-dev
In-Reply-To: <4E5699A8.1070708@freescale.com>

On Thu, Aug 25, 2011 at 01:51:20PM -0500, Timur Tabi wrote:
> Greg KH wrote:
> > But don't you really want this type of check at runtime?  What happens
> > if you load this driver on a machine that is not a guest?  Will things
> > break?  Shouldn't you still refuse to load somehow?
> 
> This is in the udbg code, which falls under the category of, "turn this on only
> if you know what you're doing."
> 
> The udbg code runs very early, before the device tree is available.  There's no
> way of knowing at this point whether or not we're running under a hypervisor.
> If you turn on udbg support, then it means that you're trying to do some very
> specific debugging on a specific platform.
> 
> So I'm not removing this code just to fix the build break.  It really should
> never have been there in the first place.

Ok, thanks for the details, I'll queue up the patch in a bit.

greg k-h

* Re: [PATCH] tty/powerpc: fix build break with ehv_bytechan.c on allyesconfig
From: Timur Tabi @ 2011-08-25 18:51 UTC
  To: Greg KH; +Cc: sfr, linux-next, linux-kernel, linuxppc-dev
In-Reply-To: <20110825184655.GB1891@kroah.com>

Greg KH wrote:
> But don't you really want this type of check at runtime?  What happens
> if you load this driver on a machine that is not a guest?  Will things
> break?  Shouldn't you still refuse to load somehow?

This is in the udbg code, which falls under the category of, "turn this on only
if you know what you're doing."

The udbg code runs very early, before the device tree is available.  There's no
way of knowing at this point whether or not we're running under a hypervisor.
If you turn on udbg support, then it means that you're trying to do some very
specific debugging on a specific platform.

So I'm not removing this code just to fix the build break.  It really should
never have been there in the first place.

-- 
Timur Tabi
Linux kernel developer at Freescale

* Re: [PATCH] tty/powerpc: fix build break with ehv_bytechan.c on allyesconfig
From: Greg KH @ 2011-08-25 18:46 UTC
  To: Timur Tabi; +Cc: sfr, linux-next, linux-kernel, linuxppc-dev
In-Reply-To: <4E568E19.405@freescale.com>

On Thu, Aug 25, 2011 at 01:02:01PM -0500, Timur Tabi wrote:
> Greg KH wrote:
> > tested doesn't mean that it shouldn't still build properly for other
> > platforms, right?
> 
> The problem is the dependency on MSR_GS, which is defined only for Book-E
> PowerPC chips, not all PowerPC.
> 
> So I gave it some more thought, and technically ePAPR extends beyond Book-E, so
> it's wrong for the driver to depend on anything specific to Book-E.  I've
> removed the code that breaks:
> 
> 	/* Check if we're running as a guest of a hypervisor */
> 	if (!(mfmsr() & MSR_GS))
> 		return;

But don't you really want this type of check at runtime?  What happens
if you load this driver on a machine that is not a guest?  Will things
break?  Shouldn't you still refuse to load somehow?

thanks,

greg k-h

* [PATCH] [v2] tty/powerpc: fix build break with ehv_bytechan.c on allyesconfig
From: Timur Tabi @ 2011-08-25 18:06 UTC
  To: greg, sfr, linux-next, linux-kernel, linuxppc-dev

The ePAPR hypervisor byte channel driver is supposed to work on all
ePAPR-compliant embedded PowerPC systems, but it had a reference to the MSR_GS
bit, which is available only on Book-E systems.

Also fix a couple integer-to-pointer typecast problems.

Signed-off-by: Timur Tabi <timur@freescale.com>
---
 drivers/tty/ehv_bytechan.c |    8 ++------
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/ehv_bytechan.c b/drivers/tty/ehv_bytechan.c
index e67f70b..f733718 100644
--- a/drivers/tty/ehv_bytechan.c
+++ b/drivers/tty/ehv_bytechan.c
@@ -226,10 +226,6 @@ void __init udbg_init_ehv_bc(void)
 	unsigned int rx_count, tx_count;
 	unsigned int ret;
 
-	/* Check if we're running as a guest of a hypervisor */
-	if (!(mfmsr() & MSR_GS))
-		return;
-
 	/* Verify the byte channel handle */
 	ret = ev_byte_channel_poll(CONFIG_PPC_EARLY_DEBUG_EHV_BC_HANDLE,
 				   &rx_count, &tx_count);
@@ -286,7 +282,7 @@ static int ehv_bc_console_byte_channel_send(unsigned int handle, const char *s,
 static void ehv_bc_console_write(struct console *co, const char *s,
 				 unsigned int count)
 {
-	unsigned int handle = (unsigned int)co->data;
+	unsigned int handle = (uintptr_t)co->data;
 	char s2[EV_BYTE_CHANNEL_MAX_BYTES];
 	unsigned int i, j = 0;
 	char c;
@@ -352,7 +348,7 @@ static int __init ehv_bc_console_init(void)
 			   CONFIG_PPC_EARLY_DEBUG_EHV_BC_HANDLE);
 #endif
 
-	ehv_bc_console.data = (void *)stdout_bc;
+	ehv_bc_console.data = (void *)(uintptr_t)stdout_bc;
 
 	/* add_preferred_console() must be called before register_console(),
 	   otherwise it won't work.  However, we don't want to enumerate all the
-- 
1.7.3.4
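
The integer-to-pointer casts in the patch above can be sketched in isolation as follows. This is a stand-alone illustration, not the driver code: struct demo_console and the demo_* helpers are hypothetical names. The point is that on a 64-bit build a direct cast between a 32-bit integer and a pointer changes size and draws a compiler warning, while going through uintptr_t makes the widening and narrowing explicit.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct console: only the opaque data slot. */
struct demo_console {
	void *data;
};

/* Store a 32-bit handle in a pointer-sized slot: widen to uintptr_t first. */
void demo_set_handle(struct demo_console *co, unsigned int handle)
{
	co->data = (void *)(uintptr_t)handle;
}

/* Recover the handle: pointer to uintptr_t, then an intentional narrowing. */
unsigned int demo_get_handle(const struct demo_console *co)
{
	return (unsigned int)(uintptr_t)co->data;
}
```

The round trip is lossless as long as the stored value originally fit in an unsigned int, which is the case for a byte channel handle stored this way.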

* Re: kvm PCI assignment & VFIO ramblings
From: Joerg Roedel @ 2011-08-25 18:05 UTC
  To: Alex Williamson
  Cc: chrisw, Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	Roedel, Joerg, linux-pci@vger.kernel.org, qemu-devel,
	Aaron Fabbri, iommu, Avi Kivity, Anthony Liguori, linuxppc-dev,
	benve@cisco.com
In-Reply-To: <1314292832.2492.31.camel@x201.home>

On Thu, Aug 25, 2011 at 11:20:30AM -0600, Alex Williamson wrote:
> On Thu, 2011-08-25 at 12:54 +0200, Roedel, Joerg wrote:

> > We need to solve this differently. ARM is starting to use the iommu-api
> > too and this definitely does not work there. One possible solution might
> > be to make the iommu-ops per-bus.
> 
> That sounds good.  Is anyone working on it?  It seems like it doesn't
> hurt to use this in the interim, we may just be watching the wrong bus
> and never add any sysfs group info.

I'll cook something up for RFC over the weekend.

> > Also the return type should not be long but something that fits into
> > 32bit on all platforms. Since you use -ENODEV, probably s32 is a good
> > choice.
> 
> The convenience of using seg|bus|dev|fn was too much to resist, too bad
> it requires a full 32bits.  Maybe I'll change it to:
>         int iommu_device_group(struct device *dev, unsigned int *group)

If we really expect segment numbers that need the full 16 bit then this
would be the way to go. Otherwise I would prefer returning the group-id
directly and partition the group-id space for the error values (s32 with
negative numbers being errors).
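
The two interface shapes under discussion can be compared in a small user-space sketch. Everything here is hypothetical (struct demo_device and both demo_* functions are invented for the comparison, and the -ERANGE policy is my own assumption); it only shows how much of the 32-bit id space each shape leaves usable.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical device record; has_group == 0 models "no IOMMU behind it". */
struct demo_device {
	int has_group;
	uint32_t group;
};

/*
 * Shape 1: status in the return value, group id through an out-parameter.
 * All 32 bits of the id remain usable.
 */
int demo_device_group(const struct demo_device *dev, unsigned int *group)
{
	if (!dev->has_group)
		return -ENODEV;
	*group = dev->group;
	return 0;
}

/*
 * Shape 2: a single signed 32-bit return; negative values are errors,
 * so only ids up to INT32_MAX can be represented.
 */
int32_t demo_device_group_s32(const struct demo_device *dev)
{
	if (!dev->has_group)
		return -ENODEV;
	if (dev->group > INT32_MAX)
		return -ERANGE;	/* id would collide with the error space */
	return (int32_t)dev->group;
}
```

With a group id built from a 16-bit segment number in the top half, shape 2 only works while the segment stays below 0x8000; shape 1 has no such limit.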

> > > @@ -438,6 +439,10 @@ static int __init intel_iommu_setup(char *str)
> > >  			printk(KERN_INFO
> > >  				"Intel-IOMMU: disable supported super page\n");
> > >  			intel_iommu_superpage = 0;
> > > +		} else if (!strncmp(str, "no_mf_groups", 12)) {
> > > +			printk(KERN_INFO
> > > +				"Intel-IOMMU: disable separate groups for multifunction devices\n");
> > > +			intel_iommu_no_mf_groups = 1;
> > 
> > This should really be a global iommu option and not be VT-d specific.
> 
> You think?  It's meaningless on benh's power systems.

But it is not meaningless on AMD-Vi systems :) There should be one
option for both.
On the other hand this requires an iommu= parameter on ia64, but that's
probably not that bad.

> > This looks like code duplication in the VT-d driver. It doesn't need to
> > be generalized now, but we should keep in mind to do a more general
> > solution later.
> > Maybe it is beneficial if the IOMMU drivers only setup the number in
> > dev->arch.iommu.groupid and the iommu-api fetches it from there then.
> > But as I said, this is some more work and does not need to be done for
> > this patch(-set).
> 
> The iommu-api reaches into dev->arch.iommu.groupid?  I figured we should
> at least start out with a lightweight, optional interface without the
> overhead of predefining groupids setup by bus notification callbacks in
> each iommu driver.  Thanks,

As I said, this is just an idea for a later optimization. It is fine
for now as it is in this patch.

	Joerg

* Re: [PATCH] tty/powerpc: fix build break with ehv_bytechan.c on allyesconfig
From: Timur Tabi @ 2011-08-25 18:02 UTC
  To: Greg KH; +Cc: sfr, linux-next, linux-kernel, linuxppc-dev
In-Reply-To: <20110825163234.GA31629@kroah.com>

Greg KH wrote:
> tested doesn't mean that it shouldn't still build properly for other
> platforms, right?

The problem is the dependency on MSR_GS, which is defined only for Book-E
PowerPC chips, not all PowerPC.

So I gave it some more thought, and technically ePAPR extends beyond Book-E, so
it's wrong for the driver to depend on anything specific to Book-E.  I've
removed the code that breaks:

	/* Check if we're running as a guest of a hypervisor */
	if (!(mfmsr() & MSR_GS))
		return;

> What is keeping the driver from building on all PPC, or even all arches
> today?

I've made a few changes, and it builds on all PPC now.  I'll post a new patch.

-- 
Timur Tabi
Linux kernel developer at Freescale

* Re: kvm PCI assignment & VFIO ramblings
From: Alex Williamson @ 2011-08-25 17:20 UTC
  To: Roedel, Joerg
  Cc: chrisw, Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	qemu-devel, Aaron Fabbri, iommu, Avi Kivity, Anthony Liguori,
	linux-pci@vger.kernel.org, linuxppc-dev, benve@cisco.com
In-Reply-To: <20110825105402.GB1923@amd.com>

On Thu, 2011-08-25 at 12:54 +0200, Roedel, Joerg wrote:
> Hi Alex,
> 
> On Wed, Aug 24, 2011 at 05:13:49PM -0400, Alex Williamson wrote:
> > Is this roughly what you're thinking of for the iommu_group component?
> > Adding a dev_to_group iommu ops callback lets us consolidate the sysfs
> > support in the iommu base.  Would AMD-Vi do something similar (or
> > exactly the same) for group #s?  Thanks,
> 
> The concept looks good, I have some comments, though. On AMD-Vi the
> implementation would look a bit different because there is a
> data-structure where the information can be gathered from, so no need for
> PCI bus scanning there.
> 
> > diff --git a/drivers/base/iommu.c b/drivers/base/iommu.c
> > index 6e6b6a1..6b54c1a 100644
> > --- a/drivers/base/iommu.c
> > +++ b/drivers/base/iommu.c
> > @@ -17,20 +17,56 @@
> >   */
> >  
> >  #include <linux/bug.h>
> > +#include <linux/device.h>
> >  #include <linux/types.h>
> >  #include <linux/module.h>
> >  #include <linux/slab.h>
> >  #include <linux/errno.h>
> >  #include <linux/iommu.h>
> > +#include <linux/pci.h>
> >  
> >  static struct iommu_ops *iommu_ops;
> >  
> > +static ssize_t show_iommu_group(struct device *dev,
> > +				struct device_attribute *attr, char *buf)
> > +{
> > +	return sprintf(buf, "%lx", iommu_dev_to_group(dev));
> 
> Probably add a 0x prefix so userspace knows the format?

I think I'll probably change it to %u.  Seems common to have decimal in
sysfs and doesn't get confusing if we cat it with a string.  As a bonus,
it abstracts that vt-d is just stuffing a PCI device address in there,
which nobody should ever rely on.
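
The formatting question is easy to see in a stand-alone sketch using snprintf (the demo_* helper names are made up for illustration; the kernel code uses sprintf into the sysfs buffer). A bare "%lx" prints group 16 as "10", which a reader cannot tell apart from decimal ten, while "%u" is unambiguous.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a group id as in the original patch: bare hex, no prefix. */
int demo_show_group_hex(char *buf, size_t len, unsigned long group)
{
	return snprintf(buf, len, "%lx", group);
}

/* Format the same id as plain unsigned decimal. */
int demo_show_group_dec(char *buf, size_t len, unsigned int group)
{
	return snprintf(buf, len, "%u", group);
}
```

Both helpers return the number of characters written, following the snprintf convention, which is also what a sysfs show routine is expected to return.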

> > +}
> > +static DEVICE_ATTR(iommu_group, S_IRUGO, show_iommu_group, NULL);
> > +
> > +static int add_iommu_group(struct device *dev, void *unused)
> > +{
> > +	if (iommu_dev_to_group(dev) >= 0)
> > +		return device_create_file(dev, &dev_attr_iommu_group);
> > +
> > +	return 0;
> > +}
> > +
> > +static int device_notifier(struct notifier_block *nb,
> > +			   unsigned long action, void *data)
> > +{
> > +	struct device *dev = data;
> > +
> > +	if (action == BUS_NOTIFY_ADD_DEVICE)
> > +		return add_iommu_group(dev, NULL);
> > +
> > +	return 0;
> > +}
> > +
> > +static struct notifier_block device_nb = {
> > +	.notifier_call = device_notifier,
> > +};
> > +
> >  void register_iommu(struct iommu_ops *ops)
> >  {
> >  	if (iommu_ops)
> >  		BUG();
> >  
> >  	iommu_ops = ops;
> > +
> > +	/* FIXME - non-PCI, really want for_each_bus() */
> > +	bus_register_notifier(&pci_bus_type, &device_nb);
> > +	bus_for_each_dev(&pci_bus_type, NULL, NULL, add_iommu_group);
> >  }
> 
> We need to solve this differently. ARM is starting to use the iommu-api
> too and this definitely does not work there. One possible solution might
> be to make the iommu-ops per-bus.

That sounds good.  Is anyone working on it?  It seems like it doesn't
hurt to use this in the interim, we may just be watching the wrong bus
and never add any sysfs group info.

> >  bool iommu_found(void)
> > @@ -94,6 +130,14 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_domain_has_cap);
> >  
> > +long iommu_dev_to_group(struct device *dev)
> > +{
> > +	if (iommu_ops->dev_to_group)
> > +		return iommu_ops->dev_to_group(dev);
> > +	return -ENODEV;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_dev_to_group);
> 
> Please rename this to iommu_device_group(). The dev_to_group name
> suggests a conversion but it is actually just a property of the device.

Ok.

> Also the return type should not be long but something that fits into
> 32bit on all platforms. Since you use -ENODEV, probably s32 is a good
> choice.

The convenience of using seg|bus|dev|fn was too much to resist, too bad
it requires a full 32bits.  Maybe I'll change it to:
        int iommu_device_group(struct device *dev, unsigned int *group)
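
Why the packed id needs all 32 bits can be shown with a short user-space sketch. The function names are hypothetical, and the explicit shifts are my substitution for the union in the patch (on a little-endian machine the union should produce the same layout, but the shifts do not depend on byte order).

```c
#include <stdint.h>

/* Pack segment (16 bits), bus (8 bits) and devfn (8 bits) into one id. */
uint32_t demo_pack_group(uint16_t segment, uint8_t bus, uint8_t devfn)
{
	return ((uint32_t)segment << 16) | ((uint32_t)bus << 8) | devfn;
}

/*
 * Nonzero when the id, reinterpreted as a signed 32-bit value, would be
 * negative and therefore indistinguishable from an error code.
 */
int demo_group_looks_like_error(uint32_t group)
{
	return (group & 0x80000000u) != 0;
}
```

Any segment number of 0x8000 or above sets the top bit, which is the case the FIXME in the patch refers to when long is 32 bits wide.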

> > +
> >  int iommu_map(struct iommu_domain *domain, unsigned long iova,
> >  	      phys_addr_t paddr, int gfp_order, int prot)
> >  {
> > diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
> > index f02c34d..477259c 100644
> > --- a/drivers/pci/intel-iommu.c
> > +++ b/drivers/pci/intel-iommu.c
> > @@ -404,6 +404,7 @@ static int dmar_map_gfx = 1;
> >  static int dmar_forcedac;
> >  static int intel_iommu_strict;
> >  static int intel_iommu_superpage = 1;
> > +static int intel_iommu_no_mf_groups;
> >  
> >  #define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1))
> >  static DEFINE_SPINLOCK(device_domain_lock);
> > @@ -438,6 +439,10 @@ static int __init intel_iommu_setup(char *str)
> >  			printk(KERN_INFO
> >  				"Intel-IOMMU: disable supported super page\n");
> >  			intel_iommu_superpage = 0;
> > +		} else if (!strncmp(str, "no_mf_groups", 12)) {
> > +			printk(KERN_INFO
> > +				"Intel-IOMMU: disable separate groups for multifunction devices\n");
> > +			intel_iommu_no_mf_groups = 1;
> 
> This should really be a global iommu option and not be VT-d specific.

You think?  It's meaningless on benh's power systems.

> >  
> >  		str += strcspn(str, ",");
> > @@ -3902,6 +3907,52 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
> >  	return 0;
> >  }
> >  
> > +/* Group numbers are arbitrary.  Device with the same group number
> > + * indicate the iommu cannot differentiate between them.  To avoid
> > + * tracking used groups we just use the seg|bus|devfn of the lowest
> > + * level we're able to differentiate devices */
> > +static long intel_iommu_dev_to_group(struct device *dev)
> > +{
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +	struct pci_dev *bridge;
> > +	union {
> > +		struct {
> > +			u8 devfn;
> > +			u8 bus;
> > +			u16 segment;
> > +		} pci;
> > +		u32 group;
> > +	} id;
> > +
> > +	if (iommu_no_mapping(dev))
> > +		return -ENODEV;
> > +
> > +	id.pci.segment = pci_domain_nr(pdev->bus);
> > +	id.pci.bus = pdev->bus->number;
> > +	id.pci.devfn = pdev->devfn;
> > +
> > +	if (!device_to_iommu(id.pci.segment, id.pci.bus, id.pci.devfn))
> > +		return -ENODEV;
> > +
> > +	bridge = pci_find_upstream_pcie_bridge(pdev);
> > +	if (bridge) {
> > +		if (pci_is_pcie(bridge)) {
> > +			id.pci.bus = bridge->subordinate->number;
> > +			id.pci.devfn = 0;
> > +		} else {
> > +			id.pci.bus = bridge->bus->number;
> > +			id.pci.devfn = bridge->devfn;
> > +		}
> > +	}
> > +
> > +	/* Virtual functions always get their own group */
> > +	if (!pdev->is_virtfn && intel_iommu_no_mf_groups)
> > +		id.pci.devfn = PCI_DEVFN(PCI_SLOT(id.pci.devfn), 0);
> > +
> > +	/* FIXME - seg # >= 0x8000 on 32b */
> > +	return id.group;
> > +}
> 
> This looks like code duplication in the VT-d driver. It doesn't need to
> be generalized now, but we should keep in mind to do a more general
> solution later.
> Maybe it is beneficial if the IOMMU drivers only setup the number in
> dev->arch.iommu.groupid and the iommu-api fetches it from there then.
> But as I said, this is some more work and does not need to be done for
> this patch(-set).

The iommu-api reaches into dev->arch.iommu.groupid?  I figured we should
at least start out with a lightweight, optional interface without the
overhead of predefining groupids setup by bus notification callbacks in
each iommu driver.  Thanks,

Alex

> 
> > +
> >  static struct iommu_ops intel_iommu_ops = {
> >  	.domain_init	= intel_iommu_domain_init,
> >  	.domain_destroy = intel_iommu_domain_destroy,
> > @@ -3911,6 +3962,7 @@ static struct iommu_ops intel_iommu_ops = {
> >  	.unmap		= intel_iommu_unmap,
> >  	.iova_to_phys	= intel_iommu_iova_to_phys,
> >  	.domain_has_cap = intel_iommu_domain_has_cap,
> > +	.dev_to_group	= intel_iommu_dev_to_group,
> >  };
> >  
> >  static void __devinit quirk_iommu_rwbf(struct pci_dev *dev)
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 0a2ba40..90c1a86 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -45,6 +45,7 @@ struct iommu_ops {
> >  				    unsigned long iova);
> >  	int (*domain_has_cap)(struct iommu_domain *domain,
> >  			      unsigned long cap);
> > +	long (*dev_to_group)(struct device *dev);
> >  };
> >  
> >  #ifdef CONFIG_IOMMU_API
> > @@ -65,6 +66,7 @@ extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
> >  				      unsigned long iova);
> >  extern int iommu_domain_has_cap(struct iommu_domain *domain,
> >  				unsigned long cap);
> > +extern long iommu_dev_to_group(struct device *dev);
> >  
> >  #else /* CONFIG_IOMMU_API */
> >  
> > @@ -121,6 +123,10 @@ static inline int domain_has_cap(struct iommu_domain *domain,
> >  	return 0;
> >  }
> >  
> > +static inline long iommu_dev_to_group(struct device *dev)
> > +{
> > +	return -ENODEV;
> > +}
> >  #endif /* CONFIG_IOMMU_API */
> >  
> >  #endif /* __LINUX_IOMMU_H */
> > 
> > 
> > 
> 

* Re: kvm PCI assignment & VFIO ramblings
From: Roedel, Joerg @ 2011-08-25 16:46 UTC
  To: Don Dutile
  Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	qemu-devel, iommu, chrisw, Alex Williamson, Avi Kivity,
	Anthony Liguori, linux-pci@vger.kernel.org, linuxppc-dev,
	benve@cisco.com
In-Reply-To: <4E566C61.9060105@redhat.com>

On Thu, Aug 25, 2011 at 11:38:09AM -0400, Don Dutile wrote:

> On 08/25/2011 06:54 AM, Roedel, Joerg wrote:
> > We need to solve this differently. ARM is starting to use the iommu-api
> > too and this definitely does not work there. One possible solution might
> > be to make the iommu-ops per-bus.
> >
> When you think of a system where there isn't just one bus-type
> with iommu support, it makes more sense.
> Additionally, it also allows the long-term architecture to use different types
> of IOMMUs on each bus segment -- think per-PCIe-switch/bridge IOMMUs --
> esp. 'tuned' IOMMUs -- ones better geared for networks, ones better geared
> for direct-attach disk hba's.

Not sure how likely it is to have different types of IOMMUs within a
given bus-type. But if they become reality we can multiplex in the
iommu-api without much hassle :)
For now, something like bus_set_iommu() or bus_register_iommu() would
provide a nice way to do bus-specific setups for a given iommu
implementation.
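
A rough user-space sketch of what per-bus registration could look like, under heavy assumptions: the demo_* types are reduced stand-ins for the kernel structures, and the signature and the -EBUSY policy are guesses for illustration, not a proposed interface.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel types. */
struct demo_iommu_ops {
	const char *name;
};

struct demo_bus_type {
	const char *name;
	const struct demo_iommu_ops *iommu_ops;	/* NULL until a driver registers */
};

/* Attach ops to one bus; refuse to replace an existing registration. */
int demo_bus_set_iommu(struct demo_bus_type *bus,
		       const struct demo_iommu_ops *ops)
{
	if (bus == NULL || ops == NULL)
		return -EINVAL;
	if (bus->iommu_ops != NULL)
		return -EBUSY;
	bus->iommu_ops = ops;
	return 0;
}

/* Look up the ops for a bus; NULL means no IOMMU driver claimed it. */
const struct demo_iommu_ops *demo_bus_get_iommu(const struct demo_bus_type *bus)
{
	return bus ? bus->iommu_ops : NULL;
}
```

With the ops pointer stored per bus instead of in one global, a PCI IOMMU driver and a platform-bus IOMMU driver can register independently, which is what the ARM case needs.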

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

* Re: [PATCH] tty/powerpc: fix build break with ehv_bytechan.c on allyesconfig
From: Greg KH @ 2011-08-25 16:32 UTC
  To: Timur Tabi; +Cc: sfr, linux-next, linux-kernel, linuxppc-dev
In-Reply-To: <1314289245-14946-1-git-send-email-timur@freescale.com>

On Thu, Aug 25, 2011 at 11:20:45AM -0500, Timur Tabi wrote:
> The Kconfig for the ePAPR hypervisor byte channel driver has a "depends on PPC",
> which means it would compile on all PowerPC platforms, even though it's
> only been tested on Freescale platforms.  Change the Kconfig to depend on
> FSL_SOC instead.

tested doesn't mean that it shouldn't still build properly for other
platforms, right?

What is keeping the driver from building on all PPC, or even all arches
today?

greg k-h

* [PATCH] tty/powerpc: fix build break with ehv_bytechan.c on allyesconfig
From: Timur Tabi @ 2011-08-25 16:20 UTC
  To: greg, sfr, linux-next, linux-kernel, linuxppc-dev

The Kconfig for the ePAPR hypervisor byte channel driver has a "depends on PPC",
which means it would compile on all PowerPC platforms, even though it's
only been tested on Freescale platforms.  Change the Kconfig to depend on
FSL_SOC instead.

Signed-off-by: Timur Tabi <timur@freescale.com>
---
 drivers/tty/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index f1ea59b..535af0a 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -353,7 +353,7 @@ config TRACE_SINK
 
 config PPC_EPAPR_HV_BYTECHAN
 	tristate "ePAPR hypervisor byte channel driver"
-	depends on PPC
+	depends on FSL_SOC
 	help
 	  This driver creates /dev entries for each ePAPR hypervisor byte
 	  channel, thereby allowing applications to communicate with byte
-- 
1.7.3.4

* Re: linux-next: build failure after merge of the final tree (tty tree related)
From: Arnaud Lacombe @ 2011-08-25 16:09 UTC
  To: Stephen Rothwell; +Cc: Greg KH, linux-next, ppc-dev, Timur Tabi, linux-kernel
In-Reply-To: <20110826015111.49af16f792d5554fd931d230@canb.auug.org.au>

Hi,

On Thu, Aug 25, 2011 at 11:51 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> Hi Timur,
>
> On Thu, 25 Aug 2011 10:22:05 -0500 Timur Tabi <timur@freescale.com> wrote:
>>
>> Is there some trick to building allyesconfig on PowerPC?  When I do try that, I
>> get all sorts of weird build errors, and it dies long before it gets to my
>> driver.  I get stuff like:
>>
>>   LD      arch/powerpc/sysdev/xics/built-in.o
>> WARNING: arch/powerpc/sysdev/xics/built-in.o(.text+0x1310): Section mismatch in
>> reference from the function .icp_native_init() to the function
>> .init.text:.icp_native_init_one_node()
>> The function .icp_native_init() references
>> the function __init .icp_native_init_one_node().
>> This is often because .icp_native_init lacks a __init
>> annotation or the annotation of .icp_native_init_one_node is wrong.
>
> We get lots of those in many builds. :-(  Just a warning.
>
If you could provide an exhaustive list of them, I'd be interested. Do
you account/reference them in the report you make on each new -next
tree ?

 - Arnaud

>> and
>>
>>   AS      arch/powerpc/kernel/head_64.o
>> arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
>> arch/powerpc/kernel/exceptions-64s.S:1151: Error: attempt to move .org backwards
>> arch/powerpc/kernel/exceptions-64s.S:1160: Error: attempt to move .org backwards
>
> There is a patch for that pending with either the kvm guys or the powerpc guys.
>
>> I guess I don't have the right compiler.
>
> Yours seems to be OK.  If you pass -k to make it will get further.  Or
> you could configure it and then just try building your driver rather than
> the whole tree.
>
>> Anyway, I think I know how to fix the break that Stephen is seeing.  I will post
>> a v4 patch in a few minutes.
>
> Thanks.
> --
> Cheers,
> Stephen Rothwell                    sfr@canb.auug.org.au
> http://www.canb.auug.org.au/~sfr/
>

^ permalink raw reply

* [PATCH] xics/icp_natives: add __init to marker icp_native_init()
From: Arnaud Lacombe @ 2011-08-25 16:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: timur, Arnaud Lacombe, linux-kernel

This should fix the following warning:

 LD      arch/powerpc/sysdev/xics/built-in.o
WARNING: arch/powerpc/sysdev/xics/built-in.o(.text+0x1310): Section mismatch in
reference from the function .icp_native_init() to the function
.init.text:.icp_native_init_one_node()
The function .icp_native_init() references
the function __init .icp_native_init_one_node().
This is often because .icp_native_init lacks a __init
annotation or the annotation of .icp_native_init_one_node is wrong.

icp_native_init() is only referenced in `arch/powerpc/sysdev/xics/xics-common.c'
by xics_init() which is itself marked with __init.

= not built-tested =

Reported-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Arnaud Lacombe <lacombar@gmail.com>
---
 arch/powerpc/sysdev/xics/icp-native.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/sysdev/xics/icp-native.c b/arch/powerpc/sysdev/xics/icp-native.c
index 50e32af..4c79b6f 100644
--- a/arch/powerpc/sysdev/xics/icp-native.c
+++ b/arch/powerpc/sysdev/xics/icp-native.c
@@ -276,7 +276,7 @@ static const struct icp_ops icp_native_ops = {
 #endif
 };

-int icp_native_init(void)
+int __init icp_native_init(void)
 {
 	struct device_node *np;
 	u32 indx = 0;
-- 
1.7.6.153.g78432
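
For readers unfamiliar with the section-mismatch warning quoted in this
patch, the idea can be sketched in userspace C. This is only an
illustration, assuming GCC or Clang on an ELF target: the macro demo_init,
the section name .demo.init.text, and both function names are hypothetical
stand-ins, not kernel code. In the kernel, __init (from <linux/init.h>)
places a function in .init.text, which is discarded after boot, so a caller
left in ordinary .text that references it draws the warning.

```c
/* Hypothetical stand-in for the kernel's __init marker: place the
 * function in a named section separate from ordinary .text.
 * (The real macro targets .init.text and lives in <linux/init.h>.) */
#define demo_init __attribute__((section(".demo.init.text")))

/* Init-only helper, analogous to icp_native_init_one_node(). */
static int demo_init init_one_node_demo(int node)
{
	return node + 1;
}

/* The caller carries the same marker, analogous to the patched
 * icp_native_init(), so the reference stays within one section. */
int demo_init native_init_demo(void)
{
	return init_one_node_demo(0);
}
```

Dropping demo_init from native_init_demo() would reproduce the shape of the
original problem: a function in regular text calling into a section that is
meant to go away.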

^ permalink raw reply related

* Re: [PATCH] [v4] tty/powerpc: introduce the ePAPR embedded hypervisor byte channel driver
From: Timur Tabi @ 2011-08-25 16:03 UTC (permalink / raw)
  To: Greg KH; +Cc: sfr, linux-next, linux-kernel, linuxppc-dev
In-Reply-To: <20110825155050.GA10084@kroah.com>

Greg KH wrote:
> No, this doesn't work, I need just a fix, as I took your previous patch
> already.

Sorry, coming right up.

-- 
Timur Tabi
Linux kernel developer at Freescale

^ permalink raw reply

* Re: [PATCH] [v4] tty/powerpc: introduce the ePAPR embedded hypervisor byte channel driver
From: Greg KH @ 2011-08-25 15:50 UTC (permalink / raw)
  To: Timur Tabi; +Cc: sfr, linux-next, linux-kernel, linuxppc-dev
In-Reply-To: <1314286345-27056-1-git-send-email-timur@freescale.com>

On Thu, Aug 25, 2011 at 10:32:25AM -0500, Timur Tabi wrote:
> The ePAPR embedded hypervisor specification provides an API for "byte
> channels", which are serial-like virtual devices for sending and receiving
> streams of bytes.  This driver provides Linux kernel support for byte
> channels via three distinct interfaces:
> 
> 1) An early-console (udbg) driver.  This provides early console output
> through a byte channel.  The byte channel handle must be specified in a
> Kconfig option.
> 
> 2) A normal console driver.  Output is sent to the byte channel designated
> for stdout in the device tree.  The console driver is for handling kernel
> printk calls.
> 
> 3) A tty driver, which is used to handle user-space input and output.  The
> byte channel used for the console is designated as the default tty.
> 
> Signed-off-by: Timur Tabi <timur@freescale.com>

No, this doesn't work, I need just a fix, as I took your previous patch
already.

greg k-h

^ permalink raw reply

* Re: linux-next: build failure after merge of the final tree (tty tree related)
From: Stephen Rothwell @ 2011-08-25 15:51 UTC (permalink / raw)
  To: Timur Tabi; +Cc: Greg KH, linux-next, ppc-dev, linux-kernel
In-Reply-To: <4E56689D.3080202@freescale.com>

[-- Attachment #1: Type: text/plain, Size: 1630 bytes --]

Hi Timur,

On Thu, 25 Aug 2011 10:22:05 -0500 Timur Tabi <timur@freescale.com> wrote:
>
> Is there some trick to building allyesconfig on PowerPC?  When I do try that, I
> get all sorts of weird build errors, and it dies long before it gets to my
> driver.  I get stuff like:
> 
>   LD      arch/powerpc/sysdev/xics/built-in.o
> WARNING: arch/powerpc/sysdev/xics/built-in.o(.text+0x1310): Section mismatch in
> reference from the function .icp_native_init() to the function
> .init.text:.icp_native_init_one_node()
> The function .icp_native_init() references
> the function __init .icp_native_init_one_node().
> This is often because .icp_native_init lacks a __init
> annotation or the annotation of .icp_native_init_one_node is wrong.

We get lots of those in many builds. :-(  Just a warning.

> and
> 
>   AS      arch/powerpc/kernel/head_64.o
> arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
> arch/powerpc/kernel/exceptions-64s.S:1151: Error: attempt to move .org backwards
> arch/powerpc/kernel/exceptions-64s.S:1160: Error: attempt to move .org backwards

There is a patch for that pending with either the kvm guys or the powerpc guys.

> I guess I don't have the right compiler.

Yours seems to be OK.  If you pass -k to make it will get further.  Or
you could configure it and then just try building your driver rather than
the whole tree.

> Anyway, I think I know how to fix the break that Stephen is seeing.  I will post
> a v4 patch in a few minutes.

Thanks.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Don Dutile @ 2011-08-25 15:38 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	qemu-devel, iommu, chrisw, Alex Williamson, Avi Kivity,
	Anthony Liguori, linux-pci@vger.kernel.org, linuxppc-dev,
	benve@cisco.com
In-Reply-To: <20110825105402.GB1923@amd.com>

On 08/25/2011 06:54 AM, Roedel, Joerg wrote:
> Hi Alex,
>
> On Wed, Aug 24, 2011 at 05:13:49PM -0400, Alex Williamson wrote:
>> Is this roughly what you're thinking of for the iommu_group component?
>> Adding a dev_to_group iommu ops callback lets us consolidate the sysfs
>> support in the iommu base.  Would AMD-Vi do something similar (or
>> exactly the same) for group #s?  Thanks,
>
> The concept looks good, I have some comments, though. On AMD-Vi the
> implementation would look a bit different because there is a
> data-structure where the information can be gathered from, so no need for
> PCI bus scanning there.
>
>> diff --git a/drivers/base/iommu.c b/drivers/base/iommu.c
>> index 6e6b6a1..6b54c1a 100644
>> --- a/drivers/base/iommu.c
>> +++ b/drivers/base/iommu.c
>> @@ -17,20 +17,56 @@
>>    */
>>
>>   #include<linux/bug.h>
>> +#include<linux/device.h>
>>   #include<linux/types.h>
>>   #include<linux/module.h>
>>   #include<linux/slab.h>
>>   #include<linux/errno.h>
>>   #include<linux/iommu.h>
>> +#include<linux/pci.h>
>>
>>   static struct iommu_ops *iommu_ops;
>>
>> +static ssize_t show_iommu_group(struct device *dev,
>> +				struct device_attribute *attr, char *buf)
>> +{
>> +	return sprintf(buf, "%lx", iommu_dev_to_group(dev));
>
> Probably add a 0x prefix so userspace knows the format?
>
>> +}
>> +static DEVICE_ATTR(iommu_group, S_IRUGO, show_iommu_group, NULL);
>> +
>> +static int add_iommu_group(struct device *dev, void *unused)
>> +{
>> +	if (iommu_dev_to_group(dev)>= 0)
>> +		return device_create_file(dev,&dev_attr_iommu_group);
>> +
>> +	return 0;
>> +}
>> +
>> +static int device_notifier(struct notifier_block *nb,
>> +			   unsigned long action, void *data)
>> +{
>> +	struct device *dev = data;
>> +
>> +	if (action == BUS_NOTIFY_ADD_DEVICE)
>> +		return add_iommu_group(dev, NULL);
>> +
>> +	return 0;
>> +}
>> +
>> +static struct notifier_block device_nb = {
>> +	.notifier_call = device_notifier,
>> +};
>> +
>>   void register_iommu(struct iommu_ops *ops)
>>   {
>>   	if (iommu_ops)
>>   		BUG();
>>
>>   	iommu_ops = ops;
>> +
>> +	/* FIXME - non-PCI, really want for_each_bus() */
>> +	bus_register_notifier(&pci_bus_type,&device_nb);
>> +	bus_for_each_dev(&pci_bus_type, NULL, NULL, add_iommu_group);
>>   }
>
> We need to solve this differently. ARM is starting to use the iommu-api
> too and this definitely does not work there. One possible solution might
> be to make the iommu-ops per-bus.
>
When you think of a system where there isn't just one bus-type
with iommu support, it makes more sense.
Additionally, it also allows the long-term architecture to use different types
of IOMMUs on each bus segment -- think per-PCIe-switch/bridge IOMMUs --
esp. 'tuned' IOMMUs -- ones better geared for networks, ones better geared
for direct-attach disk hba's.
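
The per-bus idea can be pictured with a small userspace sketch. It is an
illustration only: every type and function name below is hypothetical and
none of it is the kernel iommu-api. Each bus carries its own ops pointer,
and the lookup goes through the device's bus instead of one global
iommu_ops, so a bus without an IOMMU simply reports -ENODEV.

```c
#include <errno.h>
#include <stddef.h>

struct device_sketch;

/* Hypothetical per-bus IOMMU operations. */
struct iommu_ops_sketch {
	long (*device_group)(struct device_sketch *dev);
};

/* Hypothetical bus type: each bus points at its own ops (or NULL). */
struct bus_sketch {
	const char *name;
	const struct iommu_ops_sketch *iommu_ops;
};

struct device_sketch {
	struct bus_sketch *bus;
	int id;
};

/* Example callback: use the device id as the group number. */
long pci_group_sketch(struct device_sketch *dev)
{
	return dev->id;
}

/* Look the ops up through the device's bus rather than a global. */
long iommu_device_group_sketch(struct device_sketch *dev)
{
	const struct iommu_ops_sketch *ops = dev->bus ? dev->bus->iommu_ops : NULL;

	if (ops && ops->device_group)
		return ops->device_group(dev);
	return -ENODEV;
}
```

With this shape, two bus types (or two bus segments) can register different
ops without knowing about each other.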


>>   bool iommu_found(void)
>> @@ -94,6 +130,14 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_domain_has_cap);
>>
>> +long iommu_dev_to_group(struct device *dev)
>> +{
>> +	if (iommu_ops->dev_to_group)
>> +		return iommu_ops->dev_to_group(dev);
>> +	return -ENODEV;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_dev_to_group);
>
> Please rename this to iommu_device_group(). The dev_to_group name
> suggests a conversion but it is actually just a property of the device.
> Also the return type should not be long but something that fits into
> 32bit on all platforms. Since you use -ENODEV, probably s32 is a good
> choice.
>
>> +
>>   int iommu_map(struct iommu_domain *domain, unsigned long iova,
>>   	      phys_addr_t paddr, int gfp_order, int prot)
>>   {
>> diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
>> index f02c34d..477259c 100644
>> --- a/drivers/pci/intel-iommu.c
>> +++ b/drivers/pci/intel-iommu.c
>> @@ -404,6 +404,7 @@ static int dmar_map_gfx = 1;
>>   static int dmar_forcedac;
>>   static int intel_iommu_strict;
>>   static int intel_iommu_superpage = 1;
>> +static int intel_iommu_no_mf_groups;
>>
>>   #define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1))
>>   static DEFINE_SPINLOCK(device_domain_lock);
>> @@ -438,6 +439,10 @@ static int __init intel_iommu_setup(char *str)
>>   			printk(KERN_INFO
>>   				"Intel-IOMMU: disable supported super page\n");
>>   			intel_iommu_superpage = 0;
>> +		} else if (!strncmp(str, "no_mf_groups", 12)) {
>> +			printk(KERN_INFO
>> +				"Intel-IOMMU: disable separate groups for multifunction devices\n");
>> +			intel_iommu_no_mf_groups = 1;
>
> This should really be a global iommu option and not be VT-d specific.
>
>>
>>   		str += strcspn(str, ",");
>> @@ -3902,6 +3907,52 @@ static int intel_iommu_domain_has_cap(struct iommu_domain *domain,
>>   	return 0;
>>   }
>>
>> +/* Group numbers are arbitrary.  Devices with the same group number
>> + * indicate the iommu cannot differentiate between them.  To avoid
>> + * tracking used groups we just use the seg|bus|devfn of the lowest
>> + * level we're able to differentiate devices */
>> +static long intel_iommu_dev_to_group(struct device *dev)
>> +{
>> +	struct pci_dev *pdev = to_pci_dev(dev);
>> +	struct pci_dev *bridge;
>> +	union {
>> +		struct {
>> +			u8 devfn;
>> +			u8 bus;
>> +			u16 segment;
>> +		} pci;
>> +		u32 group;
>> +	} id;
>> +
>> +	if (iommu_no_mapping(dev))
>> +		return -ENODEV;
>> +
>> +	id.pci.segment = pci_domain_nr(pdev->bus);
>> +	id.pci.bus = pdev->bus->number;
>> +	id.pci.devfn = pdev->devfn;
>> +
>> +	if (!device_to_iommu(id.pci.segment, id.pci.bus, id.pci.devfn))
>> +		return -ENODEV;
>> +
>> +	bridge = pci_find_upstream_pcie_bridge(pdev);
>> +	if (bridge) {
>> +		if (pci_is_pcie(bridge)) {
>> +			id.pci.bus = bridge->subordinate->number;
>> +			id.pci.devfn = 0;
>> +		} else {
>> +			id.pci.bus = bridge->bus->number;
>> +			id.pci.devfn = bridge->devfn;
>> +		}
>> +	}
>> +
>> +	/* Virtual functions always get their own group */
>> +	if (!pdev->is_virtfn&&  intel_iommu_no_mf_groups)
>> +		id.pci.devfn = PCI_DEVFN(PCI_SLOT(id.pci.devfn), 0);
>> +
>> +	/* FIXME - seg #>= 0x8000 on 32b */
>> +	return id.group;
>> +}
>
> This looks like code duplication in the VT-d driver. It doesn't need to
> be generalized now, but we should keep in mind to do a more general
> solution later.
> Maybe it is beneficial if the IOMMU drivers only setup the number in
> dev->arch.iommu.groupid and the iommu-api fetches it from there then.
> But as I said, this is some more work and does not need to be done for
> this patch(-set).
>
>> +
>>   static struct iommu_ops intel_iommu_ops = {
>>   	.domain_init	= intel_iommu_domain_init,
>>   	.domain_destroy = intel_iommu_domain_destroy,
>> @@ -3911,6 +3962,7 @@ static struct iommu_ops intel_iommu_ops = {
>>   	.unmap		= intel_iommu_unmap,
>>   	.iova_to_phys	= intel_iommu_iova_to_phys,
>>   	.domain_has_cap = intel_iommu_domain_has_cap,
>> +	.dev_to_group	= intel_iommu_dev_to_group,
>>   };
>>
>>   static void __devinit quirk_iommu_rwbf(struct pci_dev *dev)
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 0a2ba40..90c1a86 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -45,6 +45,7 @@ struct iommu_ops {
>>   				    unsigned long iova);
>>   	int (*domain_has_cap)(struct iommu_domain *domain,
>>   			      unsigned long cap);
>> +	long (*dev_to_group)(struct device *dev);
>>   };
>>
>>   #ifdef CONFIG_IOMMU_API
>> @@ -65,6 +66,7 @@ extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain,
>>   				      unsigned long iova);
>>   extern int iommu_domain_has_cap(struct iommu_domain *domain,
>>   				unsigned long cap);
>> +extern long iommu_dev_to_group(struct device *dev);
>>
>>   #else /* CONFIG_IOMMU_API */
>>
>> @@ -121,6 +123,10 @@ static inline int domain_has_cap(struct iommu_domain *domain,
>>   	return 0;
>>   }
>>
>> +static inline long iommu_dev_to_group(struct device *dev)
>> +{
>> +	return -ENODEV;
>> +}
>>   #endif /* CONFIG_IOMMU_API */
>>
>>   #endif /* __LINUX_IOMMU_H */
>>
>>
>>
>
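
The "seg #>= 0x8000 on 32b" FIXME and the s32 suggestion quoted above can be
made concrete with a short userspace sketch. It is not part of the posted
patch: pack_group_id() and group_id_looks_negative() are hypothetical
helpers. The shifts reproduce the layout the union yields on a little-endian
host (segment in the top 16 bits), and the second helper shows when the
packed value would read as negative, and so be confused with an error such
as -ENODEV, once stored in a signed 32-bit type.

```c
#include <stdint.h>

/* Hypothetical helper: pack PCI segment/bus/devfn with explicit shifts,
 * so the result does not depend on host byte order. */
uint32_t pack_group_id(uint16_t segment, uint8_t bus, uint8_t devfn)
{
	return ((uint32_t)segment << 16) | ((uint32_t)bus << 8) | devfn;
}

/* Returns 1 when bit 31 is set, i.e. when the id would be negative if
 * interpreted as a signed 32-bit value. */
int group_id_looks_negative(uint32_t id)
{
	return (id & 0x80000000u) != 0;
}
```

Any segment number of 0x8000 or above sets bit 31, which is why a signed
32-bit return type cannot carry both the full seg|bus|devfn value and
negative error codes.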

^ permalink raw reply

* [PATCH] [v4] tty/powerpc: introduce the ePAPR embedded hypervisor byte channel driver
From: Timur Tabi @ 2011-08-25 15:32 UTC (permalink / raw)
  To: greg, sfr, linux-next, linux-kernel, linuxppc-dev

The ePAPR embedded hypervisor specification provides an API for "byte
channels", which are serial-like virtual devices for sending and receiving
streams of bytes.  This driver provides Linux kernel support for byte
channels via three distinct interfaces:

1) An early-console (udbg) driver.  This provides early console output
through a byte channel.  The byte channel handle must be specified in a
Kconfig option.

2) A normal console driver.  Output is sent to the byte channel designated
for stdout in the device tree.  The console driver is for handling kernel
printk calls.

3) A tty driver, which is used to handle user-space input and output.  The
byte channel used for the console is designated as the default tty.

Signed-off-by: Timur Tabi <timur@freescale.com>
---
 arch/powerpc/include/asm/udbg.h |    1 +
 arch/powerpc/kernel/udbg.c      |    2 +
 drivers/tty/Kconfig             |   34 ++
 drivers/tty/Makefile            |    1 +
 drivers/tty/ehv_bytechan.c      |  888 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 926 insertions(+), 0 deletions(-)
 create mode 100644 drivers/tty/ehv_bytechan.c

diff --git a/arch/powerpc/include/asm/udbg.h b/arch/powerpc/include/asm/udbg.h
index 93e05d1..5354ae9 100644
--- a/arch/powerpc/include/asm/udbg.h
+++ b/arch/powerpc/include/asm/udbg.h
@@ -54,6 +54,7 @@ extern void __init udbg_init_40x_realmode(void);
 extern void __init udbg_init_cpm(void);
 extern void __init udbg_init_usbgecko(void);
 extern void __init udbg_init_wsp(void);
+extern void __init udbg_init_ehv_bc(void);
 
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_UDBG_H */
diff --git a/arch/powerpc/kernel/udbg.c b/arch/powerpc/kernel/udbg.c
index faa82c1..b4607a9 100644
--- a/arch/powerpc/kernel/udbg.c
+++ b/arch/powerpc/kernel/udbg.c
@@ -67,6 +67,8 @@ void __init udbg_early_init(void)
 	udbg_init_usbgecko();
 #elif defined(CONFIG_PPC_EARLY_DEBUG_WSP)
 	udbg_init_wsp();
+#elif defined(CONFIG_PPC_EARLY_DEBUG_EHV_BC)
+	udbg_init_ehv_bc();
 #endif
 
 #ifdef CONFIG_PPC_EARLY_DEBUG
diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index bd7cc05..535af0a 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -350,3 +350,37 @@ config TRACE_SINK
 
 	  If you select this option, you need to select
 	  "Trace data router for MIPI P1149.7 cJTAG standard".
+
+config PPC_EPAPR_HV_BYTECHAN
+	tristate "ePAPR hypervisor byte channel driver"
+	depends on FSL_SOC
+	help
+	  This driver creates /dev entries for each ePAPR hypervisor byte
+	  channel, thereby allowing applications to communicate with byte
+	  channels as if they were serial ports.
+
+config PPC_EARLY_DEBUG_EHV_BC
+	bool "Early console (udbg) support for ePAPR hypervisors"
+	depends on PPC_EPAPR_HV_BYTECHAN
+	help
+	  Select this option to enable early console (a.k.a. "udbg") support
+	  via an ePAPR byte channel.  You also need to choose the byte channel
+	  handle below.
+
+config PPC_EARLY_DEBUG_EHV_BC_HANDLE
+	int "Byte channel handle for early console (udbg)"
+	depends on PPC_EARLY_DEBUG_EHV_BC
+	default 0
+	help
+	  If you want early console (udbg) output through a byte channel,
+	  specify the handle of the byte channel to use.
+
+	  For this to work, the byte channel driver must be compiled
+	  in-kernel, not as a module.
+
+	  Note that only one early console driver can be enabled, so don't
+	  enable any others if you enable this one.
+
+	  If the number you specify is not a valid byte channel handle, then
+	  there simply will be no early console output.  This is true also
+	  if you don't boot under a hypervisor at all.
diff --git a/drivers/tty/Makefile b/drivers/tty/Makefile
index ea89b0b..2953059 100644
--- a/drivers/tty/Makefile
+++ b/drivers/tty/Makefile
@@ -26,5 +26,6 @@ obj-$(CONFIG_ROCKETPORT)	+= rocket.o
 obj-$(CONFIG_SYNCLINK_GT)	+= synclink_gt.o
 obj-$(CONFIG_SYNCLINKMP)	+= synclinkmp.o
 obj-$(CONFIG_SYNCLINK)		+= synclink.o
+obj-$(CONFIG_PPC_EPAPR_HV_BYTECHAN) += ehv_bytechan.o
 
 obj-y += ipwireless/
diff --git a/drivers/tty/ehv_bytechan.c b/drivers/tty/ehv_bytechan.c
new file mode 100644
index 0000000..e67f70b
--- /dev/null
+++ b/drivers/tty/ehv_bytechan.c
@@ -0,0 +1,888 @@
+/* ePAPR hypervisor byte channel device driver
+ *
+ * Copyright 2009-2011 Freescale Semiconductor, Inc.
+ *
+ * Author: Timur Tabi <timur@freescale.com>
+ *
+ * This file is licensed under the terms of the GNU General Public License
+ * version 2.  This program is licensed "as is" without any warranty of any
+ * kind, whether express or implied.
+ *
+ * This driver supports three distinct interfaces, all of which are related to
+ * ePAPR hypervisor byte channels.
+ *
+ * 1) An early-console (udbg) driver.  This provides early console output
+ * through a byte channel.  The byte channel handle must be specified in a
+ * Kconfig option.
+ *
+ * 2) A normal console driver.  Output is sent to the byte channel designated
+ * for stdout in the device tree.  The console driver is for handling kernel
+ * printk calls.
+ *
+ * 3) A tty driver, which is used to handle user-space input and output.  The
+ * byte channel used for the console is designated as the default tty.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/fs.h>
+#include <linux/poll.h>
+#include <asm/epapr_hcalls.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/cdev.h>
+#include <linux/console.h>
+#include <linux/tty.h>
+#include <linux/tty_flip.h>
+#include <linux/circ_buf.h>
+#include <asm/udbg.h>
+
+/* The size of the transmit circular buffer.  This must be a power of two. */
+#define BUF_SIZE	2048
+
+/* Per-byte channel private data */
+struct ehv_bc_data {
+	struct device *dev;
+	struct tty_port port;
+	uint32_t handle;
+	unsigned int rx_irq;
+	unsigned int tx_irq;
+
+	spinlock_t lock;	/* lock for transmit buffer */
+	unsigned char buf[BUF_SIZE];	/* transmit circular buffer */
+	unsigned int head;	/* circular buffer head */
+	unsigned int tail;	/* circular buffer tail */
+
+	int tx_irq_enabled;	/* true == TX interrupt is enabled */
+};
+
+/* Array of byte channel objects */
+static struct ehv_bc_data *bcs;
+
+/* Byte channel handle for stdout (and stdin), taken from device tree */
+static unsigned int stdout_bc;
+
+/* Virtual IRQ for the byte channel handle for stdin, taken from device tree */
+static unsigned int stdout_irq;
+
+/**************************** SUPPORT FUNCTIONS ****************************/
+
+/*
+ * Enable the transmit interrupt
+ *
+ * Unlike a serial device, byte channels have no mechanism for disabling their
+ * own receive or transmit interrupts.  To emulate that feature, we toggle
+ * the IRQ in the kernel.
+ *
+ * We cannot just blindly call enable_irq() or disable_irq(), because these
+ * calls are reference counted.  This means that we cannot call enable_irq()
+ * if interrupts are already enabled.  This can happen in two situations:
+ *
+ * 1. The tty layer makes two back-to-back calls to ehv_bc_tty_write()
+ * 2. A transmit interrupt occurs while executing ehv_bc_tx_dequeue()
+ *
+ * To work around this, we keep a flag to tell us if the IRQ is enabled or not.
+ */
+static void enable_tx_interrupt(struct ehv_bc_data *bc)
+{
+	if (!bc->tx_irq_enabled) {
+		enable_irq(bc->tx_irq);
+		bc->tx_irq_enabled = 1;
+	}
+}
+
+static void disable_tx_interrupt(struct ehv_bc_data *bc)
+{
+	if (bc->tx_irq_enabled) {
+		disable_irq_nosync(bc->tx_irq);
+		bc->tx_irq_enabled = 0;
+	}
+}
+
+/*
+ * find the byte channel handle to use for the console
+ *
+ * The byte channel to be used for the console is specified via a
+ * "stdout-path" property in the /chosen node.
+ *
+ * For compatibility with legacy device trees, we also look for a "stdout" alias.
+ */
+static int find_console_handle(void)
+{
+	struct device_node *np, *np2;
+	const char *sprop = NULL;
+	const uint32_t *iprop;
+
+	np = of_find_node_by_path("/chosen");
+	if (np)
+		sprop = of_get_property(np, "stdout-path", NULL);
+
+	if (!np || !sprop) {
+		of_node_put(np);
+		np = of_find_node_by_name(NULL, "aliases");
+		if (np)
+			sprop = of_get_property(np, "stdout", NULL);
+	}
+
+	if (!sprop) {
+		of_node_put(np);
+		return 0;
+	}
+
+	/* We don't care what the aliased node is actually called.  We only
+	 * care if it's compatible with "epapr,hv-byte-channel", because that
+	 * indicates that it's a byte channel node.  We use a temporary
+	 * variable, 'np2', because we can't release 'np' until we're done with
+	 * 'sprop'.
+	 */
+	np2 = of_find_node_by_path(sprop);
+	of_node_put(np);
+	np = np2;
+	if (!np) {
+		pr_warning("ehv-bc: stdout node '%s' does not exist\n", sprop);
+		return 0;
+	}
+
+	/* Is it a byte channel? */
+	if (!of_device_is_compatible(np, "epapr,hv-byte-channel")) {
+		of_node_put(np);
+		return 0;
+	}
+
+	stdout_irq = irq_of_parse_and_map(np, 0);
+	if (stdout_irq == NO_IRQ) {
+		pr_err("ehv-bc: no 'interrupts' property in %s node\n", sprop);
+		of_node_put(np);
+		return 0;
+	}
+
+	/*
+	 * The 'hv-handle' property contains the handle for this byte channel.
+	 */
+	iprop = of_get_property(np, "hv-handle", NULL);
+	if (!iprop) {
+		pr_err("ehv-bc: no 'hv-handle' property in %s node\n",
+		       np->name);
+		of_node_put(np);
+		return 0;
+	}
+	stdout_bc = be32_to_cpu(*iprop);
+
+	of_node_put(np);
+	return 1;
+}
+
+/*************************** EARLY CONSOLE DRIVER ***************************/
+
+#ifdef CONFIG_PPC_EARLY_DEBUG_EHV_BC
+
+/*
+ * send a byte to a byte channel, wait if necessary
+ *
+ * This function sends a byte to a byte channel, and it waits and
+ * retries if the byte channel is full.  It returns if the character
+ * has been sent, or if some error has occurred.
+ *
+ */
+static void byte_channel_spin_send(const char data)
+{
+	int ret, count;
+
+	do {
+		count = 1;
+		ret = ev_byte_channel_send(CONFIG_PPC_EARLY_DEBUG_EHV_BC_HANDLE,
+					   &count, &data);
+	} while (ret == EV_EAGAIN);
+}
+
+/*
+ * The udbg subsystem calls this function to display a single character.
+ * We convert LF to CR/LF.
+ */
+static void ehv_bc_udbg_putc(char c)
+{
+	if (c == '\n')
+		byte_channel_spin_send('\r');
+
+	byte_channel_spin_send(c);
+}
+
+/*
+ * early console initialization
+ *
+ * PowerPC kernels support an early printk console, also known as udbg.
+ * This function must be called via the ppc_md.init_early function pointer.
+ * At this point, the device tree has been unflattened, so we can obtain the
+ * byte channel handle for stdout.
+ *
+ * We only support displaying of characters (putc).  We do not support
+ * keyboard input.
+ */
+void __init udbg_init_ehv_bc(void)
+{
+	unsigned int rx_count, tx_count;
+	unsigned int ret;
+
+	/* Check if we're running as a guest of a hypervisor */
+	if (!(mfmsr() & MSR_GS))
+		return;
+
+	/* Verify the byte channel handle */
+	ret = ev_byte_channel_poll(CONFIG_PPC_EARLY_DEBUG_EHV_BC_HANDLE,
+				   &rx_count, &tx_count);
+	if (ret)
+		return;
+
+	udbg_putc = ehv_bc_udbg_putc;
+	register_early_udbg_console();
+
+	udbg_printf("ehv-bc: early console using byte channel handle %u\n",
+		    CONFIG_PPC_EARLY_DEBUG_EHV_BC_HANDLE);
+}
+
+#endif
+
+/****************************** CONSOLE DRIVER ******************************/
+
+static struct tty_driver *ehv_bc_driver;
+
+/*
+ * Byte channel console sending worker function.
+ *
+ * For consoles, if the output buffer is full, we should just spin until it
+ * clears.
+ */
+static int ehv_bc_console_byte_channel_send(unsigned int handle, const char *s,
+			     unsigned int count)
+{
+	unsigned int len;
+	int ret = 0;
+
+	while (count) {
+		len = min_t(unsigned int, count, EV_BYTE_CHANNEL_MAX_BYTES);
+		do {
+			ret = ev_byte_channel_send(handle, &len, s);
+		} while (ret == EV_EAGAIN);
+		count -= len;
+		s += len;
+	}
+
+	return ret;
+}
+
+/*
+ * write a string to the console
+ *
+ * This function gets called to write a string from the kernel, typically from
+ * a printk().  This function spins until all data is written.
+ *
+ * We copy the data to a temporary buffer because we need to insert a \r in
+ * front of every \n.  It's more efficient to copy the data to the buffer than
+ * it is to make multiple hcalls for each character or each newline.
+ */
+static void ehv_bc_console_write(struct console *co, const char *s,
+				 unsigned int count)
+{
+	unsigned int handle = (unsigned int)co->data;
+	char s2[EV_BYTE_CHANNEL_MAX_BYTES];
+	unsigned int i, j = 0;
+	char c;
+
+	for (i = 0; i < count; i++) {
+		c = *s++;
+
+		if (c == '\n')
+			s2[j++] = '\r';
+
+		s2[j++] = c;
+		if (j >= (EV_BYTE_CHANNEL_MAX_BYTES - 1)) {
+			if (ehv_bc_console_byte_channel_send(handle, s2, j))
+				return;
+			j = 0;
+		}
+	}
+
+	if (j)
+		ehv_bc_console_byte_channel_send(handle, s2, j);
+}
+
+/*
+ * When /dev/console is opened, the kernel iterates the console list looking
+ * for one with ->device and then calls that method. On success, it expects
+ * the passed-in int* to contain the minor number to use.
+ */
+static struct tty_driver *ehv_bc_console_device(struct console *co, int *index)
+{
+	*index = co->index;
+
+	return ehv_bc_driver;
+}
+
+static struct console ehv_bc_console = {
+	.name		= "ttyEHV",
+	.write		= ehv_bc_console_write,
+	.device		= ehv_bc_console_device,
+	.flags		= CON_PRINTBUFFER | CON_ENABLED,
+};
+
+/*
+ * Console initialization
+ *
+ * This is the first function that is called after the device tree is
+ * available, so here is where we determine the byte channel handle and IRQ for
+ * stdout/stdin, even though that information is used by the tty and character
+ * drivers.
+ */
+static int __init ehv_bc_console_init(void)
+{
+	if (!find_console_handle()) {
+		pr_debug("ehv-bc: stdout is not a byte channel\n");
+		return -ENODEV;
+	}
+
+#ifdef CONFIG_PPC_EARLY_DEBUG_EHV_BC
+	/* Print a friendly warning if the user chose the wrong byte channel
+	 * handle for udbg.
+	 */
+	if (stdout_bc != CONFIG_PPC_EARLY_DEBUG_EHV_BC_HANDLE)
+		pr_warning("ehv-bc: udbg handle %u is not the stdout handle\n",
+			   CONFIG_PPC_EARLY_DEBUG_EHV_BC_HANDLE);
+#endif
+
+	ehv_bc_console.data = (void *)stdout_bc;
+
+	/* add_preferred_console() must be called before register_console(),
+	   otherwise it won't work.  However, we don't want to enumerate all the
+	   byte channels here, either, since we only care about one. */
+
+	add_preferred_console(ehv_bc_console.name, ehv_bc_console.index, NULL);
+	register_console(&ehv_bc_console);
+
+	pr_info("ehv-bc: registered console driver for byte channel %u\n",
+		stdout_bc);
+
+	return 0;
+}
+console_initcall(ehv_bc_console_init);
+
+/******************************** TTY DRIVER ********************************/
+
+/*
+ * byte channel receive interrupt handler
+ *
+ * This ISR is called whenever data is available on a byte channel.
+ */
+static irqreturn_t ehv_bc_tty_rx_isr(int irq, void *data)
+{
+	struct ehv_bc_data *bc = data;
+	struct tty_struct *ttys = tty_port_tty_get(&bc->port);
+	unsigned int rx_count, tx_count, len;
+	int count;
+	char buffer[EV_BYTE_CHANNEL_MAX_BYTES];
+	int ret;
+
+	/* ttys could be NULL during a hangup */
+	if (!ttys)
+		return IRQ_HANDLED;
+
+	/* Find out how much data needs to be read, and then ask the TTY layer
+	 * if it can handle that much.  We want to ensure that every byte we
+	 * read from the byte channel will be accepted by the TTY layer.
+	 */
+	ev_byte_channel_poll(bc->handle, &rx_count, &tx_count);
+	count = tty_buffer_request_room(ttys, rx_count);
+
+	/* 'count' is the maximum amount of data the TTY layer can accept at
+	 * this time.  However, during testing, I was never able to get 'count'
+	 * to be less than 'rx_count'.  I'm not sure whether I'm calling it
+	 * correctly.
+	 */
+
+	while (count > 0) {
+		len = min_t(unsigned int, count, sizeof(buffer));
+
+		/* Read some data from the byte channel.  This function will
+		 * never return more than EV_BYTE_CHANNEL_MAX_BYTES bytes.
+		 */
+		ev_byte_channel_receive(bc->handle, &len, buffer);
+
+		/* 'len' is now the amount of data that's been received. 'len'
+		 * can't be zero, and most likely it's equal to one.
+		 */
+
+		/* Pass the received data to the tty layer. */
+		ret = tty_insert_flip_string(ttys, buffer, len);
+
+		/* 'ret' is the number of bytes that the TTY layer accepted.
+		 * If it's not equal to 'len', then it means the buffer is
+		 * full, which should never happen.  If it does happen, we can
+		 * exit gracefully, but we drop the last 'len - ret' characters
+		 * that we read from the byte channel.
+		 */
+		if (ret != len)
+			break;
+
+		count -= len;
+	}
+
+	/* Tell the tty layer that we're done. */
+	tty_flip_buffer_push(ttys);
+
+	tty_kref_put(ttys);
+
+	return IRQ_HANDLED;
+}
+
+/*
+ * dequeue the transmit buffer to the hypervisor
+ *
+ * This function, which can be called in interrupt context, dequeues as much
+ * data as possible from the transmit buffer to the byte channel.
+ */
+static void ehv_bc_tx_dequeue(struct ehv_bc_data *bc)
+{
+	unsigned int count;
+	unsigned int len, ret;
+	unsigned long flags;
+
+	do {
+		spin_lock_irqsave(&bc->lock, flags);
+		len = min_t(unsigned int,
+			    CIRC_CNT_TO_END(bc->head, bc->tail, BUF_SIZE),
+			    EV_BYTE_CHANNEL_MAX_BYTES);
+
+		ret = ev_byte_channel_send(bc->handle, &len, bc->buf + bc->tail);
+
+		/* 'len' is valid only if the return code is 0 or EV_EAGAIN */
+		if (!ret || (ret == EV_EAGAIN))
+			bc->tail = (bc->tail + len) & (BUF_SIZE - 1);
+
+		count = CIRC_CNT(bc->head, bc->tail, BUF_SIZE);
+		spin_unlock_irqrestore(&bc->lock, flags);
+	} while (count && !ret);
+
+	spin_lock_irqsave(&bc->lock, flags);
+	if (CIRC_CNT(bc->head, bc->tail, BUF_SIZE))
+		/*
+		 * If we haven't emptied the buffer, then enable the TX IRQ.
+		 * We'll get an interrupt when there's more room in the
+		 * hypervisor's output buffer.
+		 */
+		enable_tx_interrupt(bc);
+	else
+		disable_tx_interrupt(bc);
+	spin_unlock_irqrestore(&bc->lock, flags);
+}
+
+/*
+ * byte channel transmit interrupt handler
+ *
+ * This ISR is called whenever space becomes available for transmitting
+ * characters on a byte channel.
+ */
+static irqreturn_t ehv_bc_tty_tx_isr(int irq, void *data)
+{
+	struct ehv_bc_data *bc = data;
+	struct tty_struct *ttys = tty_port_tty_get(&bc->port);
+
+	ehv_bc_tx_dequeue(bc);
+	if (ttys) {
+		tty_wakeup(ttys);
+		tty_kref_put(ttys);
+	}
+
+	return IRQ_HANDLED;
+}
+
+/*
+ * This function is called when the tty layer has data for us to send.  We store
+ * the data first in a circular buffer, and then dequeue as much of that data
+ * as possible.
+ *
+ * We don't need to worry about whether there is enough room in the buffer for
+ * all the data.  The purpose of ehv_bc_tty_write_room() is to tell the tty
+ * layer how much data it can safely send to us.  We guarantee that
+ * ehv_bc_tty_write_room() will never lie, so the tty layer will never send us
+ * too much data.
+ */
+static int ehv_bc_tty_write(struct tty_struct *ttys, const unsigned char *s,
+			    int count)
+{
+	struct ehv_bc_data *bc = ttys->driver_data;
+	unsigned long flags;
+	unsigned int len;
+	unsigned int written = 0;
+
+	while (1) {
+		spin_lock_irqsave(&bc->lock, flags);
+		len = CIRC_SPACE_TO_END(bc->head, bc->tail, BUF_SIZE);
+		if (count < len)
+			len = count;
+		if (len) {
+			memcpy(bc->buf + bc->head, s, len);
+			bc->head = (bc->head + len) & (BUF_SIZE - 1);
+		}
+		spin_unlock_irqrestore(&bc->lock, flags);
+		if (!len)
+			break;
+
+		s += len;
+		count -= len;
+		written += len;
+	}
+
+	ehv_bc_tx_dequeue(bc);
+
+	return written;
+}
+
+/*
+ * This function can be called multiple times for a given tty_struct, which is
+ * why we initialize bc->ttys in ehv_bc_tty_port_activate() instead.
+ *
+ * The tty layer will still call this function even if the device was not
+ * registered (i.e. tty_register_device() was not called).  This happens
+ * because tty_register_device() is optional and some legacy drivers don't
+ * use it.  So we need to check for that.
+ */
+static int ehv_bc_tty_open(struct tty_struct *ttys, struct file *filp)
+{
+	struct ehv_bc_data *bc = &bcs[ttys->index];
+
+	if (!bc->dev)
+		return -ENODEV;
+
+	return tty_port_open(&bc->port, ttys, filp);
+}
+
+/*
+ * Amazingly, if ehv_bc_tty_open() returns an error code, the tty layer will
+ * still call this function to close the tty device.  So we can't assume that
+ * the tty port has been initialized.
+ */
+static void ehv_bc_tty_close(struct tty_struct *ttys, struct file *filp)
+{
+	struct ehv_bc_data *bc = &bcs[ttys->index];
+
+	if (bc->dev)
+		tty_port_close(&bc->port, ttys, filp);
+}
+
+/*
+ * Return the amount of space in the output buffer
+ *
+ * This is actually a contract between the driver and the tty layer outlining
+ * how much write room the driver can guarantee will be sent OR BUFFERED.  This
+ * driver MUST honor the return value.
+ */
+static int ehv_bc_tty_write_room(struct tty_struct *ttys)
+{
+	struct ehv_bc_data *bc = ttys->driver_data;
+	unsigned long flags;
+	int count;
+
+	spin_lock_irqsave(&bc->lock, flags);
+	count = CIRC_SPACE(bc->head, bc->tail, BUF_SIZE);
+	spin_unlock_irqrestore(&bc->lock, flags);
+
+	return count;
+}
+
+/*
+ * Stop sending data to the tty layer
+ *
+ * This function is called when the tty layer's input buffers are getting full,
+ * so the driver should stop sending it data.  The easiest way to do this is to
+ * disable the RX IRQ, which will prevent ehv_bc_tty_rx_isr() from being
+ * called.
+ *
+ * The hypervisor will continue to queue up any incoming data.  If there is any
+ * data in the queue when the RX interrupt is enabled, we'll immediately get an
+ * RX interrupt.
+ */
+static void ehv_bc_tty_throttle(struct tty_struct *ttys)
+{
+	struct ehv_bc_data *bc = ttys->driver_data;
+
+	disable_irq(bc->rx_irq);
+}
+
+/*
+ * Resume sending data to the tty layer
+ *
+ * This function is called after previously calling ehv_bc_tty_throttle().  The
+ * tty layer's input buffers now have more room, so the driver can resume
+ * sending it data.
+ */
+static void ehv_bc_tty_unthrottle(struct tty_struct *ttys)
+{
+	struct ehv_bc_data *bc = ttys->driver_data;
+
+	/* If there is any data in the queue when the RX interrupt is enabled,
+	 * we'll immediately get an RX interrupt.
+	 */
+	enable_irq(bc->rx_irq);
+}
+
+static void ehv_bc_tty_hangup(struct tty_struct *ttys)
+{
+	struct ehv_bc_data *bc = ttys->driver_data;
+
+	ehv_bc_tx_dequeue(bc);
+	tty_port_hangup(&bc->port);
+}
+
+/*
+ * TTY driver operations
+ *
+ * If we could ask the hypervisor how much data is still in the TX buffer, or
+ * at least how big the TX buffers are, then we could implement the
+ * .wait_until_sent and .chars_in_buffer functions.
+ */
+static const struct tty_operations ehv_bc_ops = {
+	.open		= ehv_bc_tty_open,
+	.close		= ehv_bc_tty_close,
+	.write		= ehv_bc_tty_write,
+	.write_room	= ehv_bc_tty_write_room,
+	.throttle	= ehv_bc_tty_throttle,
+	.unthrottle	= ehv_bc_tty_unthrottle,
+	.hangup		= ehv_bc_tty_hangup,
+};
+
+/*
+ * initialize the TTY port
+ *
+ * This function will only be called once, no matter how many times
+ * ehv_bc_tty_open() is called.  That's why we register the ISR here, and also
+ * why we initialize tty_struct-related variables here.
+ */
+static int ehv_bc_tty_port_activate(struct tty_port *port,
+				    struct tty_struct *ttys)
+{
+	struct ehv_bc_data *bc = container_of(port, struct ehv_bc_data, port);
+	int ret;
+
+	ttys->driver_data = bc;
+
+	ret = request_irq(bc->rx_irq, ehv_bc_tty_rx_isr, 0, "ehv-bc", bc);
+	if (ret < 0) {
+		dev_err(bc->dev, "could not request rx irq %u (ret=%i)\n",
+		       bc->rx_irq, ret);
+		return ret;
+	}
+
+	/* request_irq also enables the IRQ */
+	bc->tx_irq_enabled = 1;
+
+	ret = request_irq(bc->tx_irq, ehv_bc_tty_tx_isr, 0, "ehv-bc", bc);
+	if (ret < 0) {
+		dev_err(bc->dev, "could not request tx irq %u (ret=%i)\n",
+		       bc->tx_irq, ret);
+		free_irq(bc->rx_irq, bc);
+		return ret;
+	}
+
+	/* The TX IRQ is enabled only when we can't write all the data to the
+	 * byte channel at once, so by default it's disabled.
+	 */
+	disable_tx_interrupt(bc);
+
+	return 0;
+}
+
+static void ehv_bc_tty_port_shutdown(struct tty_port *port)
+{
+	struct ehv_bc_data *bc = container_of(port, struct ehv_bc_data, port);
+
+	free_irq(bc->tx_irq, bc);
+	free_irq(bc->rx_irq, bc);
+}
+
+static const struct tty_port_operations ehv_bc_tty_port_ops = {
+	.activate = ehv_bc_tty_port_activate,
+	.shutdown = ehv_bc_tty_port_shutdown,
+};
+
+static int __devinit ehv_bc_tty_probe(struct platform_device *pdev)
+{
+	struct device_node *np = pdev->dev.of_node;
+	struct ehv_bc_data *bc;
+	const uint32_t *iprop;
+	unsigned int handle;
+	int ret;
+	static unsigned int index = 1;
+	unsigned int i;
+
+	iprop = of_get_property(np, "hv-handle", NULL);
+	if (!iprop) {
+		dev_err(&pdev->dev, "no 'hv-handle' property in %s node\n",
+			np->name);
+		return -ENODEV;
+	}
+
+	/* We already told the console layer that the index for the console
+	 * device is zero, so we need to make sure that we use that index when
+	 * we probe the console byte channel node.
+	 */
+	handle = be32_to_cpu(*iprop);
+	i = (handle == stdout_bc) ? 0 : index++;
+	bc = &bcs[i];
+
+	bc->handle = handle;
+	bc->head = 0;
+	bc->tail = 0;
+	spin_lock_init(&bc->lock);
+
+	bc->rx_irq = irq_of_parse_and_map(np, 0);
+	bc->tx_irq = irq_of_parse_and_map(np, 1);
+	if ((bc->rx_irq == NO_IRQ) || (bc->tx_irq == NO_IRQ)) {
+		dev_err(&pdev->dev, "no 'interrupts' property in %s node\n",
+			np->name);
+		ret = -ENODEV;
+		goto error;
+	}
+
+	bc->dev = tty_register_device(ehv_bc_driver, i, &pdev->dev);
+	if (IS_ERR(bc->dev)) {
+		ret = PTR_ERR(bc->dev);
+		dev_err(&pdev->dev, "could not register tty (ret=%i)\n", ret);
+		goto error;
+	}
+
+	tty_port_init(&bc->port);
+	bc->port.ops = &ehv_bc_tty_port_ops;
+
+	dev_set_drvdata(&pdev->dev, bc);
+
+	dev_info(&pdev->dev, "registered /dev/%s%u for byte channel %u\n",
+		ehv_bc_driver->name, i, bc->handle);
+
+	return 0;
+
+error:
+	irq_dispose_mapping(bc->tx_irq);
+	irq_dispose_mapping(bc->rx_irq);
+
+	memset(bc, 0, sizeof(struct ehv_bc_data));
+	return ret;
+}
+
+static int ehv_bc_tty_remove(struct platform_device *pdev)
+{
+	struct ehv_bc_data *bc = dev_get_drvdata(&pdev->dev);
+
+	tty_unregister_device(ehv_bc_driver, bc - bcs);
+
+	irq_dispose_mapping(bc->tx_irq);
+	irq_dispose_mapping(bc->rx_irq);
+
+	return 0;
+}
+
+static const struct of_device_id ehv_bc_tty_of_ids[] = {
+	{ .compatible = "epapr,hv-byte-channel" },
+	{}
+};
+
+static struct platform_driver ehv_bc_tty_driver = {
+	.driver = {
+		.owner = THIS_MODULE,
+		.name = "ehv-bc",
+		.of_match_table = ehv_bc_tty_of_ids,
+	},
+	.probe		= ehv_bc_tty_probe,
+	.remove		= ehv_bc_tty_remove,
+};
+
+/**
+ * ehv_bc_init - ePAPR hypervisor byte channel driver initialization
+ *
+ * This function is called when this module is loaded.
+ */
+static int __init ehv_bc_init(void)
+{
+	struct device_node *np;
+	unsigned int count = 0; /* Number of elements in bcs[] */
+	int ret;
+
+	pr_info("ePAPR hypervisor byte channel driver\n");
+
+	/* Count the number of byte channels */
+	for_each_compatible_node(np, NULL, "epapr,hv-byte-channel")
+		count++;
+
+	if (!count)
+		return -ENODEV;
+
+	/* The array index of an element in bcs[] is the same as the tty index
+	 * for that element.  If you know the address of an element in the
+	 * array, then you can use pointer math (e.g. "bc - bcs") to get its
+	 * tty index.
+	 */
+	bcs = kzalloc(count * sizeof(struct ehv_bc_data), GFP_KERNEL);
+	if (!bcs)
+		return -ENOMEM;
+
+	ehv_bc_driver = alloc_tty_driver(count);
+	if (!ehv_bc_driver) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	ehv_bc_driver->owner = THIS_MODULE;
+	ehv_bc_driver->driver_name = "ehv-bc";
+	ehv_bc_driver->name = ehv_bc_console.name;
+	ehv_bc_driver->type = TTY_DRIVER_TYPE_CONSOLE;
+	ehv_bc_driver->subtype = SYSTEM_TYPE_CONSOLE;
+	ehv_bc_driver->init_termios = tty_std_termios;
+	ehv_bc_driver->flags = TTY_DRIVER_REAL_RAW | TTY_DRIVER_DYNAMIC_DEV;
+	tty_set_operations(ehv_bc_driver, &ehv_bc_ops);
+
+	ret = tty_register_driver(ehv_bc_driver);
+	if (ret) {
+		pr_err("ehv-bc: could not register tty driver (ret=%i)\n", ret);
+		goto error;
+	}
+
+	ret = platform_driver_register(&ehv_bc_tty_driver);
+	if (ret) {
+		pr_err("ehv-bc: could not register platform driver (ret=%i)\n",
+		       ret);
+		goto error;
+	}
+
+	return 0;
+
+error:
+	if (ehv_bc_driver) {
+		tty_unregister_driver(ehv_bc_driver);
+		put_tty_driver(ehv_bc_driver);
+	}
+
+	kfree(bcs);
+
+	return ret;
+}
+
+
+/**
+ * ehv_bc_exit - ePAPR hypervisor byte channel driver termination
+ *
+ * This function is called when this driver is unloaded.
+ */
+static void __exit ehv_bc_exit(void)
+{
+	tty_unregister_driver(ehv_bc_driver);
+	put_tty_driver(ehv_bc_driver);
+	kfree(bcs);
+}
+
+module_init(ehv_bc_init);
+module_exit(ehv_bc_exit);
+
+MODULE_AUTHOR("Timur Tabi <timur@freescale.com>");
+MODULE_DESCRIPTION("ePAPR hypervisor byte channel driver");
+MODULE_LICENSE("GPL v2");
-- 
1.7.3.4

^ permalink raw reply related
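
The head/tail arithmetic in ehv_bc_tty_write() and ehv_bc_tty_write_room() in the patch above relies on the CIRC_* helpers from <linux/circ_buf.h>. What follows is a minimal user-space sketch of that arithmetic, not the driver code itself. It assumes a 16-byte buffer, and the names struct ring, ring_write() and the lower-case circ_*() functions are hypothetical stand-ins for the kernel macros. Locking is left out.

```c
#include <string.h>

#define BUF_SIZE 16	/* must be a power of two; size assumed for this sketch */

/* Number of bytes currently stored in the ring. */
static unsigned int circ_cnt(unsigned int head, unsigned int tail)
{
	return (head - tail) & (BUF_SIZE - 1);
}

/* Free bytes.  One slot stays unused so that head == tail means "empty". */
static unsigned int circ_space(unsigned int head, unsigned int tail)
{
	return (tail - (head + 1)) & (BUF_SIZE - 1);
}

/* Free bytes that one memcpy() can fill before the index wraps. */
static unsigned int circ_space_to_end(unsigned int head, unsigned int tail)
{
	unsigned int end = BUF_SIZE - 1 - head;
	unsigned int n = (end + tail) & (BUF_SIZE - 1);

	return n <= end ? n : end + 1;
}

struct ring {
	char buf[BUF_SIZE];
	unsigned int head;	/* next position to write */
	unsigned int tail;	/* next position to read */
};

/* Same loop shape as ehv_bc_tty_write(): copy in chunks until full or done. */
static unsigned int ring_write(struct ring *r, const char *s,
			       unsigned int count)
{
	unsigned int written = 0;
	unsigned int len;

	while (1) {
		len = circ_space_to_end(r->head, r->tail);
		if (count < len)
			len = count;
		if (!len)
			break;

		memcpy(r->buf + r->head, s, len);
		r->head = (r->head + len) & (BUF_SIZE - 1);

		s += len;
		count -= len;
		written += len;
	}

	return written;
}
```

Starting from an empty ring, ring_write() accepts at most BUF_SIZE - 1 bytes, because one slot is kept free to tell a full buffer from an empty one. That is the same figure circ_space() reports, which is why a write_room implementation built on it does not over-promise.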

* Re: linux-next: build failure after merge of the final tree (tty tree related)
From: Timur Tabi @ 2011-08-25 15:22 UTC (permalink / raw)
  To: Greg KH; +Cc: Stephen Rothwell, linux-next, ppc-dev, linux-kernel
In-Reply-To: <20110825140820.GA9126@kroah.com>

Greg KH wrote:
>> > MSR_GS is defined in arch/powerpc/include/asm/reg_booke.h which is
>> > included by arch/powerpc/include/asm/reg.h but only when defined
>> > (CONFIG_BOOKE) || defined(CONFIG_40x).
> Thanks for the report.
> 
> Timur, care to send a fixup patch for this so this gets resolved?

Is there some trick to building allyesconfig on PowerPC?  When I do try that, I
get all sorts of weird build errors, and it dies long before it gets to my
driver.  I get stuff like:

  LD      arch/powerpc/sysdev/xics/built-in.o
WARNING: arch/powerpc/sysdev/xics/built-in.o(.text+0x1310): Section mismatch in
reference from the function .icp_native_init() to the function
.init.text:.icp_native_init_one_node()
The function .icp_native_init() references
the function __init .icp_native_init_one_node().
This is often because .icp_native_init lacks a __init
annotation or the annotation of .icp_native_init_one_node is wrong.

and

  AS      arch/powerpc/kernel/head_64.o
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:1151: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1160: Error: attempt to move .org backwards
make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1

I guess I don't have the right compiler.

Anyway, I think I know how to fix the break that Stephen is seeing.  I will post
a v4 patch in a few minutes.

-- 
Timur Tabi
Linux kernel developer at Freescale

^ permalink raw reply
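
The build break discussed in this message comes from MSR_GS being visible only when CONFIG_BOOKE or CONFIG_40x is set. The v4 patch itself is not shown in this thread, so the following is only a sketch of one common way to handle a conditionally defined symbol, and the real fix may well have been a Kconfig dependency instead. The helper running_as_guest() is hypothetical, and the MSR_GS value is an assumption made for illustration.

```c
#include <stdbool.h>

/*
 * Stand-in for the kernel config symbol.  Left undefined here to model the
 * failing non-Book-E build; uncomment to model a Book-E build.
 */
/* #define CONFIG_BOOKE 1 */

#ifdef CONFIG_BOOKE
#define MSR_GS (1UL << 28)	/* illustrative value, assumed for this sketch */
#endif

/* Hypothetical helper: report guest state only when MSR_GS is available. */
static bool running_as_guest(unsigned long msr)
{
#ifdef MSR_GS
	return (msr & MSR_GS) != 0;
#else
	(void)msr;	/* no guest-state bit on this configuration */
	return false;
#endif
}
```

With the symbol undefined, the function compiles and always reports false, so code that calls it builds on every configuration. Restricting the driver to the platforms that define MSR_GS would avoid the #ifdef altogether.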

* Re: linux-next: build failure after merge of the final tree (tty tree related)
From: Greg KH @ 2011-08-25 14:08 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linux-next, ppc-dev, linux-kernel, Timur Tabi
In-Reply-To: <20110825161843.daf46b6b4023926fcfeec387@canb.auug.org.au>

On Thu, Aug 25, 2011 at 04:18:43PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the final tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
> 
> drivers/tty/ehv_bytechan.c: In function 'udbg_init_ehv_bc':
> drivers/tty/ehv_bytechan.c:230:18: error: 'MSR_GS' undeclared (first use in this function)
> drivers/tty/ehv_bytechan.c: In function 'ehv_bc_console_write':
> drivers/tty/ehv_bytechan.c:289:24: warning: cast from pointer to integer of different size
> drivers/tty/ehv_bytechan.c: In function 'ehv_bc_console_init':
> drivers/tty/ehv_bytechan.c:355:24: warning: cast to pointer from integer of different size
> 
> Caused by commit dcd83aaff1c8 ("tty/powerpc: introduce the ePAPR embedded
> hypervisor byte channel driver").
> 
> MSR_GS is defined in arch/powerpc/include/asm/reg_booke.h which is
> included by arch/powerpc/include/asm/reg.h but only when defined
> (CONFIG_BOOKE) || defined(CONFIG_40x).

Thanks for the report.

Timur, care to send a fixup patch for this so this gets resolved?

greg k-h

^ permalink raw reply

* Re: linux-next: build failure after merge of the final tree (tty tree related)
From: Timur Tabi @ 2011-08-25 14:28 UTC (permalink / raw)
  To: Greg KH
  Cc: Stephen Rothwell, <linux-next@vger.kernel.org>, ppc-dev,
	<linux-kernel@vger.kernel.org>
In-Reply-To: <20110825140820.GA9126@kroah.com>



On Aug 25, 2011, at 9:08 AM, Greg KH <greg@kroah.com> wrote:

> On Thu, Aug 25, 2011 at 04:18:43PM +1000, Stephen Rothwell wrote:
>> 
> 
> Thanks for the report.
> 
> Timur, care to send a fixup patch for this so this gets resolved?

Yes, I will do it ASAP, probably within the next two hours.
> 

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Alexander Graf @ 2011-08-25 13:25 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	qemu-devel, iommu, chrisw, Alex Williamson, Avi Kivity,
	Anthony Liguori, linux-pci@vger.kernel.org, linuxppc-dev,
	benve@cisco.com
In-Reply-To: <20110825123146.GD1923@amd.com>

[-- Attachment #1: Type: text/plain, Size: 1113 bytes --]


On 25.08.2011, at 07:31, Roedel, Joerg wrote:

> On Wed, Aug 24, 2011 at 11:07:46AM -0400, Alex Williamson wrote:
>> On Wed, 2011-08-24 at 10:52 +0200, Roedel, Joerg wrote:
> 

[...]

>> We need to try the polite method of attempting to hot unplug the device
>> from qemu first, which the current vfio code already implements.  We can
>> then escalate if it doesn't respond.  The current code calls abort in
>> qemu if the guest doesn't respond, but I agree we should also be
>> enforcing this at the kernel interface.  I think the problem with the
>> hard-unplug is that we don't have a good revoke mechanism for the mmio
>> mmaps.
> 
> For mmio we could stop the guest and replace the mmio region with a
> region that is filled with 0xff, no?

Sure, but that happens in user space. The question is how does kernel space enforce an MMIO region to not be mapped after the hotplug event occurred? Keep in mind that user space is pretty much untrusted here - it doesn't have to be QEMU. It could just as well be a generic user space driver. And that can just ignore hotplug events.


Alex


[-- Attachment #2: Type: text/html, Size: 1843 bytes --]

^ permalink raw reply
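
The suggestion quoted above, replacing an MMIO region with one filled with 0xff, can be sketched from user space as follows. This is only an illustration of the idea under stated assumptions: revoke_to_ff() is a hypothetical name, the region is assumed to be page-aligned with a length that is a multiple of the page size, and nothing here is taken from the vfio or qemu code. As the reply points out, a cooperative process can do this, but it does not make the kernel enforce anything.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Replace whatever is mapped at [addr, addr + len) with anonymous memory
 * filled with 0xff, so later reads look like reads from an absent device.
 * Returns 0 on success, -1 on failure.  Writes are not discarded here;
 * they simply land in the anonymous pages.
 */
static int revoke_to_ff(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xff, len);
	return 0;
}
```

MAP_FIXED atomically replaces the old mapping at that address, so there is no window in which the range is unmapped.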
