LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCHv5 4/4] arm64: Add build salt to the vDSO
From: Laura Abbott @ 2018-07-03 23:34 UTC (permalink / raw)
  To: mjw, H . J . Lu, Masahiro Yamada, Catalin Marinas, Will Deacon
  Cc: Laura Abbott, Andy Lutomirski, Linus Torvalds, X86 ML,
	linux-kernel, Nick Clifton, Cary Coutant, linux-kbuild,
	linuxppc-dev, Michael Ellerman, linux-arm-kernel
In-Reply-To: <20180703233430.14416-1-labbott@redhat.com>


The vDSO needs to have a unique build id in a similar manner
to the kernel and modules. Use the build salt macro.

Signed-off-by: Laura Abbott <labbott@redhat.com>
---
v5: I was previously focused on x86 only but since powerpc gave a patch,
I figured I would do arm64 since the changes were also fairly simple.
---
 arch/arm64/kernel/vdso/note.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kernel/vdso/note.S b/arch/arm64/kernel/vdso/note.S
index b82c85e5d972..2c429dfd3f45 100644
--- a/arch/arm64/kernel/vdso/note.S
+++ b/arch/arm64/kernel/vdso/note.S
@@ -22,7 +22,10 @@
 #include <linux/uts.h>
 #include <linux/version.h>
 #include <linux/elfnote.h>
+#include <linux/build-salt.h>
 
 ELFNOTE_START(Linux, 0, "a")
 	.long LINUX_VERSION_CODE
 ELFNOTE_END
+
+BUILD_SALT;
-- 
2.17.1

^ permalink raw reply related

* [PATCHv5 3/4] powerpc: Add build salt to the vDSO
From: Laura Abbott @ 2018-07-03 23:34 UTC (permalink / raw)
  To: mjw, H . J . Lu, Masahiro Yamada, Michael Ellerman
  Cc: Laura Abbott, Andy Lutomirski, Linus Torvalds, X86 ML,
	linux-kernel, Nick Clifton, Cary Coutant, linux-kbuild,
	linuxppc-dev, Catalin Marinas, Will Deacon, linux-arm-kernel
In-Reply-To: <20180703233430.14416-1-labbott@redhat.com>


The vDSO needs to have a unique build id in a similar manner
to the kernel and modules. Use the build salt macro.

Signed-off-by: Laura Abbott <labbott@redhat.com>
---
v5: New approach with the BUILD_SALT macro
---
 arch/powerpc/kernel/vdso32/note.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/vdso32/note.S b/arch/powerpc/kernel/vdso32/note.S
index d4b5be4f3d5f..ec1d05265c75 100644
--- a/arch/powerpc/kernel/vdso32/note.S
+++ b/arch/powerpc/kernel/vdso32/note.S
@@ -5,6 +5,7 @@
 
 #include <linux/uts.h>
 #include <linux/version.h>
+#include <linux/build-salt.h>
 
 #define ASM_ELF_NOTE_BEGIN(name, flags, vendor, type)			      \
 	.section name, flags;						      \
@@ -23,3 +24,5 @@
 	ASM_ELF_NOTE_BEGIN(".note.kernel-version", "a", UTS_SYSNAME, 0)
 	.long LINUX_VERSION_CODE
 	ASM_ELF_NOTE_END
+
+BUILD_SALT;
-- 
2.17.1

^ permalink raw reply related

* [PATCHv5 2/4] x86: Add build salt to the vDSO
From: Laura Abbott @ 2018-07-03 23:34 UTC (permalink / raw)
  To: Andy Lutomirski, mjw, H . J . Lu, Masahiro Yamada
  Cc: Laura Abbott, Linus Torvalds, X86 ML, linux-kernel, Nick Clifton,
	Cary Coutant, linux-kbuild, linuxppc-dev, Michael Ellerman,
	Catalin Marinas, Will Deacon, linux-arm-kernel
In-Reply-To: <20180703233430.14416-1-labbott@redhat.com>


The vDSO needs to have a unique build id in a similar manner
to the kernel and modules. Use the build salt macro.

Signed-off-by: Laura Abbott <labbott@redhat.com>
---
v5: Switched to using the single line BUILD_SALT macro
---
 arch/x86/entry/vdso/vdso-note.S   | 3 +++
 arch/x86/entry/vdso/vdso32/note.S | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/entry/vdso/vdso-note.S b/arch/x86/entry/vdso/vdso-note.S
index 79a071e4357e..79423170118f 100644
--- a/arch/x86/entry/vdso/vdso-note.S
+++ b/arch/x86/entry/vdso/vdso-note.S
@@ -3,6 +3,7 @@
  * Here we can supply some information useful to userland.
  */
 
+#include <linux/build-salt.h>
 #include <linux/uts.h>
 #include <linux/version.h>
 #include <linux/elfnote.h>
@@ -10,3 +11,5 @@
 ELFNOTE_START(Linux, 0, "a")
 	.long LINUX_VERSION_CODE
 ELFNOTE_END
+
+BUILD_SALT
diff --git a/arch/x86/entry/vdso/vdso32/note.S b/arch/x86/entry/vdso/vdso32/note.S
index 9fd51f206314..e78047d119f6 100644
--- a/arch/x86/entry/vdso/vdso32/note.S
+++ b/arch/x86/entry/vdso/vdso32/note.S
@@ -4,6 +4,7 @@
  * Here we can supply some information useful to userland.
  */
 
+#include <linux/build-salt.h>
 #include <linux/version.h>
 #include <linux/elfnote.h>
 
@@ -14,6 +15,8 @@ ELFNOTE_START(Linux, 0, "a")
 	.long LINUX_VERSION_CODE
 ELFNOTE_END
 
+BUILD_SALT
+
 #ifdef CONFIG_XEN
 /*
  * Add a special note telling glibc's dynamic linker a fake hardware
-- 
2.17.1

^ permalink raw reply related

* [PATCHv5 1/4] kbuild: Add build salt to the kernel and modules
From: Laura Abbott @ 2018-07-03 23:34 UTC (permalink / raw)
  To: Andy Lutomirski, mjw, H . J . Lu, Masahiro Yamada,
	Michael Ellerman, Catalin Marinas, Will Deacon
  Cc: Laura Abbott, Linus Torvalds, X86 ML, linux-kernel, Nick Clifton,
	Cary Coutant, linux-kbuild, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20180703233430.14416-1-labbott@redhat.com>


The build id generated from --build-id can be generated in several different
ways, with the default being the sha1 on the output of the linked file. For
distributions, it can be useful to make sure this ID is unique, even if the
actual file contents don't change. The easiest way to do this is to insert
a section with some data.

Add an ELF note to both the kernel and module which contains some data based
off of a config option.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
---
v5: I used S-o-b here since the majority of the code was written
already. Please feel free to change the tag if you think it's not
appropriate. I also tweaked this to take an ascii string instead of just
a hex value since this makes things much easier on the distribution
side.
---
 include/linux/build-salt.h | 20 ++++++++++++++++++++
 init/Kconfig               |  9 +++++++++
 init/version.c             |  3 +++
 scripts/mod/modpost.c      |  3 +++
 4 files changed, 35 insertions(+)
 create mode 100644 include/linux/build-salt.h

diff --git a/include/linux/build-salt.h b/include/linux/build-salt.h
new file mode 100644
index 000000000000..bb007bd05e7a
--- /dev/null
+++ b/include/linux/build-salt.h
@@ -0,0 +1,20 @@
+#ifndef __BUILD_SALT_H
+#define __BUILD_SALT_H
+
+#include <linux/elfnote.h>
+
+#define LINUX_ELFNOTE_BUILD_SALT       0x100
+
+#ifdef __ASSEMBLER__
+
+#define BUILD_SALT \
+       ELFNOTE(Linux, LINUX_ELFNOTE_BUILD_SALT, .asciz CONFIG_BUILD_SALT)
+
+#else
+
+#define BUILD_SALT \
+       ELFNOTE32("Linux", LINUX_ELFNOTE_BUILD_SALT, CONFIG_BUILD_SALT)
+
+#endif
+
+#endif /* __BUILD_SALT_H */
diff --git a/init/Kconfig b/init/Kconfig
index 041f3a022122..8de789f40db9 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -107,6 +107,15 @@ config LOCALVERSION_AUTO
 
 	  which is done within the script "scripts/setlocalversion".)
 
+config BUILD_SALT
+       string "Build ID Salt"
+       default "Linux"
+       help
+          The build ID is used to link binaries and their debug info. Setting
+          this option will use the value in the calculation of the build id.
+          This is mostly useful for distributions which want to ensure the
+          build is unique between builds. It's safe to leave the default.
+
 config HAVE_KERNEL_GZIP
 	bool
 
diff --git a/init/version.c b/init/version.c
index bfb4e3f4955e..ef4012ec4375 100644
--- a/init/version.c
+++ b/init/version.c
@@ -7,6 +7,7 @@
  */
 
 #include <generated/compile.h>
+#include <linux/build-salt.h>
 #include <linux/export.h>
 #include <linux/uts.h>
 #include <linux/utsname.h>
@@ -49,3 +50,5 @@ const char linux_proc_banner[] =
 	"%s version %s"
 	" (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ")"
 	" (" LINUX_COMPILER ") %s\n";
+
+BUILD_SALT;
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 1663fb19343a..dc6d714e4dcb 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -2125,10 +2125,13 @@ static int check_modname_len(struct module *mod)
  **/
 static void add_header(struct buffer *b, struct module *mod)
 {
+	buf_printf(b, "#include <linux/build-salt.h>\n");
 	buf_printf(b, "#include <linux/module.h>\n");
 	buf_printf(b, "#include <linux/vermagic.h>\n");
 	buf_printf(b, "#include <linux/compiler.h>\n");
 	buf_printf(b, "\n");
+	buf_printf(b, "BUILD_SALT;\n");
+	buf_printf(b, "\n");
 	buf_printf(b, "MODULE_INFO(vermagic, VERMAGIC_STRING);\n");
 	buf_printf(b, "MODULE_INFO(name, KBUILD_MODNAME);\n");
 	buf_printf(b, "\n");
-- 
2.17.1

^ permalink raw reply related

* [PATCHv5 0/4] Salted build ids via ELF notes
From: Laura Abbott @ 2018-07-03 23:34 UTC (permalink / raw)
  To: Andy Lutomirski, mjw, H . J . Lu, Masahiro Yamada,
	Michael Ellerman, Catalin Marinas, Will Deacon
  Cc: Laura Abbott, Linus Torvalds, X86 ML, linux-kernel, Nick Clifton,
	Cary Coutant, linux-kbuild, linuxppc-dev, linux-arm-kernel

Hi,

This is v5 of the series to allow unique build ids in the kernel. As a
reminder of the context:

""
In Fedora, the debug information is packaged separately (foo-debuginfo) and
can be installed separately. There's been a long standing issue where only one
version of a debuginfo info package can be installed at a time. Mark Wielaard
made an effort for Fedora 27 to allow parallel installation of debuginfo (see
https://fedoraproject.org/wiki/Changes/ParallelInstallableDebuginfo for
more details)

Part of the requirement to allow this to work is that build ids are
unique between builds. The existing upstream rpm implementation ensures
this by re-calculating the build-id using the version and release as a
seed. This doesn't work 100% for the kernel because of the vDSO which is
its own binary and doesn't get updated. After poking holes in a few of my
ideas, there was a discussion with some people from the binutils team about
adding --build-id-salt to let ld do the calculation debugedit is doing. There
was a counter proposal made to add in the salt while building. The
easiest proposal was to add an item in the linker script vs. linking in
an object since we need the salt to go in every module as well as the
kernel and vmlinux.
""

v5 uses the approach suggested by Masahiro Yamada which uses the
existing ELF note macro to more easily add the salt (vs previous
approaches which tried to adjust via linker section).

If arch maintainers are okay, I'd like acks for this so this can go
through the kbuild tree.

Thanks,
Laura

Laura Abbott (4):
  kbuild: Add build salt to the kernel and modules
  x86: Add build salt to the vDSO
  powerpc: Add build salt to the vDSO
  arm64: Add build salt to the vDSO

 arch/arm64/kernel/vdso/note.S     |  3 +++
 arch/powerpc/kernel/vdso32/note.S |  3 +++
 arch/x86/entry/vdso/vdso-note.S   |  3 +++
 arch/x86/entry/vdso/vdso32/note.S |  3 +++
 include/linux/build-salt.h        | 20 ++++++++++++++++++++
 init/Kconfig                      |  9 +++++++++
 init/version.c                    |  3 +++
 scripts/mod/modpost.c             |  3 +++
 8 files changed, 47 insertions(+)
 create mode 100644 include/linux/build-salt.h

-- 
2.17.1

^ permalink raw reply

* [PATCHv5 0/4] Salted build ids via ELF notes
From: Laura Abbott @ 2018-07-03 23:21 UTC (permalink / raw)
  To: Andy Lutomirski, mjw, H . J . Lu, Masahiro Yamada,
	Michael Ellerman, Catalin Marinas, Will Deacon
  Cc: Laura Abbott, Linus Torvalds, X86 ML, linux-kernel, Nick Clifton,
	Cary Coutant, linux-kbuild, linuxppc-dev, linux-arm-kernel

Hi,

This is v5 of the series to allow unique build ids in the kernel. As a
reminder of the context:

""
In Fedora, the debug information is packaged separately (foo-debuginfo) and
can be installed separately. There's been a long standing issue where only one
version of a debuginfo info package can be installed at a time. Mark Wielaard
made an effort for Fedora 27 to allow parallel installation of debuginfo (see
https://fedoraproject.org/wiki/Changes/ParallelInstallableDebuginfo for
more details)

Part of the requirement to allow this to work is that build ids are
unique between builds. The existing upstream rpm implementation ensures
this by re-calculating the build-id using the version and release as a
seed. This doesn't work 100% for the kernel because of the vDSO which is
its own binary and doesn't get updated. After poking holes in a few of my
ideas, there was a discussion with some people from the binutils team about
adding --build-id-salt to let ld do the calculation debugedit is doing. There
was a counter proposal made to add in the salt while building. The
easiest proposal was to add an item in the linker script vs. linking in
an object since we need the salt to go in every module as well as the
kernel and vmlinux.
""

v5 uses the approach suggested by Masahiro Yamada which uses the
existing ELF note macro to more easily add the salt (vs previous
approaches which tried to adjust via linker section).

If arch maintainers are okay, I'd like acks for this so this can go
through the kbuild tree.

Thanks,
Laura

Laura Abbott (4):
  kbuild: Add build salt to the kernel and modules
  x86: Add build salt to the vDSO
  powerpc: Add build salt to the vDSO
  arm64: Add build salt to the vDSO

 arch/arm64/kernel/vdso/note.S     |  3 +++
 arch/powerpc/kernel/vdso32/note.S |  3 +++
 arch/x86/entry/vdso/vdso-note.S   |  3 +++
 arch/x86/entry/vdso/vdso32/note.S |  3 +++
 include/linux/build-salt.h        | 20 ++++++++++++++++++++
 init/Kconfig                      |  9 +++++++++
 init/version.c                    |  3 +++
 scripts/mod/modpost.c             |  3 +++
 8 files changed, 47 insertions(+)
 create mode 100644 include/linux/build-salt.h

-- 
2.17.1

^ permalink raw reply

* Re: [PATCHv3 2/4] drivers/base: utilize device tree info to shutdown devices
From: Pavel Tatashin @ 2018-07-03 17:03 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: kernelfans, LKML, gregkh, rafael.j.wysocki, grygorii.strashko,
	hch, helgaas, dyoung, linux-pci, linuxppc-dev
In-Reply-To: <CAHp75VfEVT7cMOQD71dBkwzuQhZQ8YGik+TWSnf9=pkiUxJO6A@mail.gmail.com>

Thank you Andy for the heads up. I might need to rebase my work
(http://lkml.kernel.org/r/20180629182541.6735-1-pasha.tatashin@oracle.com)
based on this change. But, it is possible it is going to be harder to
parallelize based on device tree. I will need to think about it.

Pavel

On Tue, Jul 3, 2018 at 6:59 AM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
>
> I think Pavel would be interested to see this as well (he is doing
> some parallel device shutdown stuff)
>
> On Tue, Jul 3, 2018 at 9:50 AM, Pingfan Liu <kernelfans@gmail.com> wrote:
> > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > places an assumption of supplier<-consumer order on the process of probe.
> > But it turns out to break down the parent <- child order in some scene.
> > E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> > have been probed. Then comes the bridge's module, which enables extra
> > feature(such as hotplug) on this bridge. This will break the
> > parent<-children order and cause failure when "kexec -e" in some scenario.
> >
> > The detailed description of the scenario:
> > An IBM Power9 machine on which, two drivers portdrv_pci and shpchp(a mod)
> > match the PCI_CLASS_BRIDGE_PCI, but neither of them success to probe due
> > to some issue. For this case, the bridge is moved after its children in
> > devices_kset. Then, when "kexec -e", a ata-disk behind the bridge can not
> > write back buffer in flight due to the former shutdown of the bridge which
> > clears the BusMaster bit.
> >
> > It is a little hard to impose both "parent<-child" and "supplier<-consumer"
> > order on devices_kset. Take the following scene:
> > step0: before a consumer's probing, (note child_a is supplier of consumer_a)
> >   [ consumer-X, child_a, ...., child_z] [... consumer_a, ..., consumer_z, ...] supplier-X
> >                                          ^^^^^^^^^^ affected range ^^^^^^^^^^
> > step1: when probing, moving consumer-X after supplier-X
> >   [ child_a, ...., child_z] [.... consumer_a, ..., consumer_z, ...] supplier-X, consumer-X
> > step2: the children of consumer-X should be re-ordered to maintain the seq
> >   [... consumer_a, ..., consumer_z, ....] supplier-X  [consumer-X, child_a, ...., child_z]
> > step3: the consumer_a should be re-ordered to maintain the seq
> >   [... consumer_z, ...] supplier-X [ consumer-X, child_a, consumer_a ..., child_z]
> >
> > It requires two nested recursion to drain out all out-of-order item in
> > "affected range". To avoid such complicated code, this patch suggests
> > to utilize the info in device tree, instead of using the order of
> > devices_kset during shutdown. It iterates the device tree, and firstly
> > shutdown a device's children and consumers. After this patch, the buggy
> > commit is hollow and left to clean.
> >
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Bjorn Helgaas <helgaas@kernel.org>
> > Cc: Dave Young <dyoung@redhat.com>
> > Cc: linux-pci@vger.kernel.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > ---
> >  drivers/base/core.c    | 48 +++++++++++++++++++++++++++++++++++++++++++-----
> >  include/linux/device.h |  1 +
> >  2 files changed, 44 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index a48868f..684b994 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -1446,6 +1446,7 @@ void device_initialize(struct device *dev)
> >         INIT_LIST_HEAD(&dev->links.consumers);
> >         INIT_LIST_HEAD(&dev->links.suppliers);
> >         dev->links.status = DL_DEV_NO_DRIVER;
> > +       dev->shutdown = false;
> >  }
> >  EXPORT_SYMBOL_GPL(device_initialize);
> >
> > @@ -2811,7 +2812,6 @@ static void __device_shutdown(struct device *dev)
> >          * lock is to be held
> >          */
> >         parent = get_device(dev->parent);
> > -       get_device(dev);
> >         /*
> >          * Make sure the device is off the kset list, in the
> >          * event that dev->*->shutdown() doesn't remove it.
> > @@ -2842,23 +2842,60 @@ static void __device_shutdown(struct device *dev)
> >                         dev_info(dev, "shutdown\n");
> >                 dev->driver->shutdown(dev);
> >         }
> > -
> > +       dev->shutdown = true;
> >         device_unlock(dev);
> >         if (parent)
> >                 device_unlock(parent);
> >
> > -       put_device(dev);
> >         put_device(parent);
> >         spin_lock(&devices_kset->list_lock);
> >  }
> >
> > +/* shutdown dev's children and consumer firstly, then itself */
> > +static int device_for_each_child_shutdown(struct device *dev)
> > +{
> > +       struct klist_iter i;
> > +       struct device *child;
> > +       struct device_link *link;
> > +
> > +       /* already shutdown, then skip this sub tree */
> > +       if (dev->shutdown)
> > +               return 0;
> > +
> > +       if (!dev->p)
> > +               goto check_consumers;
> > +
> > +       /* there is breakage of lock in __device_shutdown(), and the redundant
> > +        * ref++ on srcu protected consumer is harmless since shutdown is not
> > +        * hot path.
> > +        */
> > +       get_device(dev);
> > +
> > +       klist_iter_init(&dev->p->klist_children, &i);
> > +       while ((child = next_device(&i)))
> > +               device_for_each_child_shutdown(child);
> > +       klist_iter_exit(&i);
> > +
> > +check_consumers:
> > +       list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
> > +               if (!link->consumer->shutdown)
> > +                       device_for_each_child_shutdown(link->consumer);
> > +       }
> > +
> > +       __device_shutdown(dev);
> > +       put_device(dev);
> > +       return 0;
> > +}
> > +
> >  /**
> >   * device_shutdown - call ->shutdown() on each device to shutdown.
> >   */
> >  void device_shutdown(void)
> >  {
> >         struct device *dev;
> > +       int idx;
> >
> > +       idx = device_links_read_lock();
> >         spin_lock(&devices_kset->list_lock);
> >         /*
> >          * Walk the devices list backward, shutting down each in turn.
> > @@ -2866,11 +2903,12 @@ void device_shutdown(void)
> >          * devices offline, even as the system is shutting down.
> >          */
> >         while (!list_empty(&devices_kset->list)) {
> > -               dev = list_entry(devices_kset->list.prev, struct device,
> > +               dev = list_entry(devices_kset->list.next, struct device,
> >                                 kobj.entry);
> > -               __device_shutdown(dev);
> > +               device_for_each_child_shutdown(dev);
> >         }
> >         spin_unlock(&devices_kset->list_lock);
> > +       device_links_read_unlock(idx);
> >  }
> >
> >  /*
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index 055a69d..8a0f784 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -1003,6 +1003,7 @@ struct device {
> >         bool                    offline:1;
> >         bool                    of_node_reused:1;
> >         bool                    dma_32bit_limit:1;
> > +       bool                    shutdown:1; /* one direction: false->true */
> >  };
> >
> >  static inline struct device *kobj_to_dev(struct kobject *kobj)
> > --
> > 2.7.4
> >
>
>
>
> --
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply

* Re: [PATCH v5 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
From: Andy Shevchenko @ 2018-07-03 20:57 UTC (permalink / raw)
  To: Baoquan He
  Cc: Linux Kernel Mailing List, Andrew Morton, Rob Herring,
	Dan Williams, Nicolas Pitre, Josh Triplett, kbuild test robot,
	Borislav Petkov, Patrik Jakobsson, David Airlie, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Dmitry Torokhov, Frank Rowand,
	Keith Busch, Jon Derrick, Lorenzo Pieralisi, Bjorn Helgaas,
	Thomas Gleixner, brijesh.singh, Jérôme Glisse,
	Tom Lendacky, Greg Kroah-Hartman, baiyaowei, richard.weiyang,
	devel, linux-input, linux-nvdimm, devicetree, linux-pci,
	Eric Biederman, Vivek Goyal, Dave Young, Yinghai Lu, kexec,
	Michal Simek, David S. Miller, Chris Zankel, Max Filippov,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, linux-parisc,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
In-Reply-To: <20180703145503.GA1225@MiWiFi-R3L-srv>

On Tue, Jul 3, 2018 at 5:55 PM, Baoquan He <bhe@redhat.com> wrote:
> On 06/12/18 at 05:24pm, Andy Shevchenko wrote:
>> On Tue, Jun 12, 2018 at 5:20 PM, Andy Shevchenko
>> <andy.shevchenko@gmail.com> wrote:

>> > I briefly looked at the code and error codes we have, so, my proposal
>> > is one of the following
>>
>> >  - use -ECANCELED (not the best choice for first occurrence here,
>> > though I can't find better)
>>
>> Actually -ENOTSUPP might suit the first case (although the actual
>> would be something like -EOVERLAP, which we don't have)
>
> Sorry for late reply, and many thanks for your great suggestion.
>

> I am fine to use -ENOTSUPP as the first returned value, and -ECANCELED
> for the 2nd one.

I have no strong opinion, but I like (slightly better) this approach ^^^

> Or define an enum as you suggested inside the function
> or in header file.

>
> Or use -EBUSY for the first case because existing resource is
> overlapping but not fully contained by 'res'; and -EINVAL for
> the 2nd case since didn't find any one resources which is contained by
> 'res', means we passed in a invalid resource.
>
> All is fine to me, I can repost with each of them.

>> >  - use positive integers (or enum), like
>> >   #define RES_REPARENTED 0
>> >   #define RES_OVERLAPPED 1
>> >   #define RES_NOCONFLICT 2

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: GCC strcmp optimizations causing valgrind uninitialized conditional jumps
From: Segher Boessenkool @ 2018-07-03 19:18 UTC (permalink / raw)
  To: William Kennington; +Cc: linuxppc-dev
In-Reply-To: <CAPnigK=HP_F6a3EuM-G8XHH1G_SyLCscT72=yVwtFxDmBZ6WOQ@mail.gmail.com>

On Tue, Jul 03, 2018 at 11:59:14AM -0700, William Kennington wrote:
> Is there a bug tracking the issue?

https://bugs.kde.org/show_bug.cgi?id=386945

> Also, unless your malloc is
> guaranteed to be zeroing out the data or have a strcmp that is writing
> doubleworld aligned data to the string, the strcmp implementation is
> branching based on data existing after the null terminating character
> that may be uninitialized. Both sides of the branch do the right thing
> though, and locate the null terminator, throwing away the calculations
> done on the uninitialized data.

Yes, there is one branch that depends in part on irrelevant data, but
that is handled immediately afterwards.

> -fno-builtin-strcmp or -mstring-compare-inline-limit=0 do work fine
> but we don't control the binaries we are linking against in all cases
> and are seeing the issue pop up there.

Yeah, nasty.  I don't know what to do then (other than fix valgrind, which
isn't so easy either though!)


Segher

^ permalink raw reply

* Re: GCC strcmp optimizations causing valgrind uninitialized conditional jumps
From: William Kennington @ 2018-07-03 18:59 UTC (permalink / raw)
  To: segher; +Cc: linuxppc-dev
In-Reply-To: <20180703184617.GF16221@gate.crashing.org>

Is there a bug tracking the issue? Also, unless your malloc is
guaranteed to be zeroing out the data or have a strcmp that is writing
doubleworld aligned data to the string, the strcmp implementation is
branching based on data existing after the null terminating character
that may be uninitialized. Both sides of the branch do the right thing
though, and locate the null terminator, throwing away the calculations
done on the uninitialized data.

-fno-builtin-strcmp or -mstring-compare-inline-limit=0 do work fine
but we don't control the binaries we are linking against in all cases
and are seeing the issue pop up there.
On Tue, Jul 3, 2018 at 11:46 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Tue, Jul 03, 2018 at 11:26:55AM -0700, William Kennington wrote:
> > I've noticed while trying to do some valgrind testing on code linked
> > against system libraries that have inlined strcmps that valgrind is
> > unhappy about branches depending on uninitialized memory. I've read
>
> The branches here do *not* depend on uninitialised memory.  Valgrind
> does not realise that however.  The valgrind people are aware of this
> problem.
>
> > Any ideas on how to workaround / fix this?
>
> Does -fno-builtin-strcmp do the trick?
>
>
> Segher

^ permalink raw reply

* Re: GCC strcmp optimizations causing valgrind uninitialized conditional jumps
From: Segher Boessenkool @ 2018-07-03 18:46 UTC (permalink / raw)
  To: William Kennington; +Cc: linuxppc-dev
In-Reply-To: <CAPnigKnVTg5UiEfou=ABL2DH31rFyGZ_1qtqevJqPPw-V-E6-A@mail.gmail.com>

Hi!

On Tue, Jul 03, 2018 at 11:26:55AM -0700, William Kennington wrote:
> I've noticed while trying to do some valgrind testing on code linked
> against system libraries that have inlined strcmps that valgrind is
> unhappy about branches depending on uninitialized memory. I've read

The branches here do *not* depend on uninitialised memory.  Valgrind
does not realise that however.  The valgrind people are aware of this
problem.

> Any ideas on how to workaround / fix this?

Does -fno-builtin-strcmp do the trick?


Segher

^ permalink raw reply

* GCC strcmp optimizations causing valgrind uninitialized conditional jumps
From: William Kennington @ 2018-07-03 18:26 UTC (permalink / raw)
  To: linuxppc-dev

Hi,

I've noticed while trying to do some valgrind testing on code linked
against system libraries that have inlined strcmps that valgrind is
unhappy about branches depending on uninitialized memory. I've read
through the generated inline assembly, and it looks like the assembly
will always produce the correct result regardless of the intermediate
compare using uninitialized data. I'm trying to figure out what to do
as next steps since it's non-trivial to suppress this inlined code.

Using the following simple test case which generates the optmizations
using gcc 7.3.0 and gcc 9 devel. This is reproducible using `gcc -O2
-o test test.c` with the following code.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
        char *heap_str = malloc(16);
        strcpy(heap_str, "RandString");
        int res = strcmp("RandString", heap_str) == 0;
        free(heap_str);
        return res;
}

A snippet of the inline assembly causing issues is, valgrind correctly
identifies a branch taken based on uninitialized memory for the first
`bne` in `.L7` since it's reading bytes 8-16 from the 11 byte string:
 # test.c:9:    int res = strcmp("RandString", heap_str) == 0;
        .loc 1 9 12 view .LVU20
        li 8,0           # tmp172,
        cmpb 3,9,10      # tmp169, tmp133, tmp134
        cmpb 8,9,8       # tmp170, tmp133, tmp172
        orc 3,8,3        #, tmp169, tmp170, tmp169
        cntlzd 3,3       # tmp171, tmp169
        addi 3,3,8       # tmp171, tmp171,
        rldcl 9,9,3,56   # tmp171, tmp174, tmp133,
        rldcl 3,10,3,56  # tmp171, tmp176, tmp134,
        subf 3,3,9       # tmp132, tmp176, tmp174
        b .L2            #
.L7:
        addi 9,5,8       # tmp190, tmp126,
        addi 10,30,8     # tmp191, heap_str,
        ldbrx 9,0,9      # tmp133, MEM[(void *)"RandString"]
        ldbrx 10,0,10    # tmp134, MEM[(void *)heap_str_5]
        subf. 3,10,9     # tmp132, tmp134, tmp133
        bne 0,.L4        #
        cmpb 10,9,3      # tmp140, tmp133, tmp132
        cmpdi 7,10,0     #, tmp141, tmp140
        bne 7,.L2

Sample valgrind output looks like:
==63162== Memcheck, a memory error detector
==63162== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==63162== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==63162== Command: ./test
==63162==
==63162== Conditional jump or move depends on uninitialised value(s)
==63162==    at 0x100005D4: main (test.c:9)
==63162==  Uninitialised value was created by a heap allocation
==63162==    at 0x4093F00: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-ppc64le-linux.so)
==63162==    by 0x10000513: main (test.c:7)
==63162==
==63162== Conditional jump or move depends on uninitialised value(s)
==63162==    at 0x100005E0: main (test.c:9)
==63162==  Uninitialised value was created by a heap allocation
==63162==    at 0x4093F00: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-ppc64le-linux.so)
==63162==    by 0x10000513: main (test.c:7)
==63162==
==63162== Syscall param exit_group(status) contains uninitialised byte(s)
==63162==    at 0x41E8C2C: _Exit (_exit.c:31)
==63162==    by 0x4132C67: __run_exit_handlers (exit.c:132)
==63162==    by 0x4104423: generic_start_main.isra.0 (libc-start.c:344)
==63162==    by 0x4104617: (below main) (libc-start.c:116)
==63162==  Uninitialised value was created by a heap allocation
==63162==    at 0x4093F00: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-ppc64le-linux.so)
==63162==    by 0x10000513: main (test.c:7)
==63162==
==63162==
==63162== HEAP SUMMARY:
==63162==     in use at exit: 0 bytes in 0 blocks
==63162==   total heap usage: 1 allocs, 1 frees, 16 bytes allocated
==63162==
==63162== All heap blocks were freed -- no leaks are possible
==63162==
==63162== For counts of detected and suppressed errors, rerun with: -v
==63162== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

Any ideas on how to workaround / fix this?

Best,
William

^ permalink raw reply

* Re: [v2 PATCH 2/2] powerpc: Enable CPU_FTR_ASYM_SMT for interleaved big-cores
From: Murilo Opsfelder Araujo @ 2018-07-03 17:53 UTC (permalink / raw)
  To: Gautham R. Shenoy
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Michael Neuling,
	Vaidyanathan Srinivasan, Akshay Adiga, Shilpasri G Bhat,
	Oliver O'Halloran, Nicholas Piggin, linuxppc-dev,
	linux-kernel
In-Reply-To: <573a559dff87da1d68a55bf6ada97b7697102909.1530609795.git.ego@linux.vnet.ibm.com>

On Tue, Jul 03, 2018 at 04:33:51PM +0530, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> A pair of IBM POWER9 SMT4 cores can be fused together to form a big-core
> with 8 SMT threads. This can be discovered via the "ibm,thread-groups"
> CPU property in the device tree which will indicate which group of
> threads that share the L1 cache, translation cache and instruction data
> flow. If there are multiple such group of threads, then the core is a
> big-core.
> 
> Furthermore, if the thread-ids of the threads of the big-core can be
> obtained by interleaving the thread-ids of the thread-groups
> (component small core), then such a big-core is called an interleaved
> big-core.
> 
> Eg: Threads in the pair of component SMT4 cores of an interleaved
> big-core are numbered {0,2,4,6} and {1,3,5,7} respectively.
> 
> The SMT4 cores forming a big-core are more or less independent
> units. Thus when multiple tasks are scheduled to run on the fused
> core, we get the best performance when the tasks are spread across the
> pair of SMT4 cores.
> 
> This patch enables CPU_FTR_ASYM_SMT bit in the cpu-features on
> detecting the presence of interleaved big-cores at boot up. This will
> will bias the load-balancing of tasks on smaller numbered threads,
> which will automatically result in spreading the tasks uniformly
> across the associated pair of SMT4 cores.
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/setup-common.c | 67 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index a78ec66..f63d797 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -537,6 +537,56 @@ int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
>  	return -1;
>  }
>  
> +/*
> + * check_interleaved_big_core - Checks if the thread group tg
> + * corresponds to a big-core whose threads are interleavings of the
> + * threads of the component small cores.
> + *
> + * @tg: A thread-group struct for the core.
> + *
> + * Returns true if the core is a interleaved big-core.
> + * Returns false otherwise.
> + */
> +static inline bool check_interleaved_big_core(struct thread_groups *tg)
> +{
> +	int nr_groups;
> +	int threads_per_group;
> +	int cur_cpu, next_cpu, i, j;
> +
> +	nr_groups = tg->nr_groups;
> +	threads_per_group = tg->threads_per_group;
> +
> +	if (tg->property != 1)
> +		return false;
> +
> +	if (nr_groups < 2 || threads_per_group < 2)
> +		return false;
> +
> +	/*
> +	 * In case of an interleaved big-core, the thread-ids of the
> +	 * big-core can be obtained by interleaving the the thread-ids
> +	 * of the component small
> +	 *
> +	 * Eg: On a 8-thread big-core with two SMT4 small cores, the
> +	 * threads of the two component small cores will be
> +	 * {0, 2, 4, 6} and {1, 3, 5, 7}.
> +	 */
> +	for (i = 0; i < nr_groups; i++) {
> +		int group_start = i * threads_per_group;
> +
> +		for (j = 0; j < threads_per_group - 1; j++) {
> +			int cur_idx = group_start + j;
> +
> +			cur_cpu = tg->thread_list[cur_idx];
> +			next_cpu = tg->thread_list[cur_idx + 1];
> +			if (next_cpu != cur_cpu + nr_groups)
> +				return false;
> +		}
> +	}
> +
> +	return true;
> +}
> +
>  /**
>   * setup_cpu_maps - initialize the following cpu maps:
>   *                  cpu_possible_mask
> @@ -560,6 +610,7 @@ void __init smp_setup_cpu_maps(void)
>  	struct device_node *dn;
>  	int cpu = 0;
>  	int nthreads = 1;
> +	bool has_interleaved_big_core = true;
>  
>  	DBG("smp_setup_cpu_maps()\n");
>  
> @@ -613,6 +664,12 @@ void __init smp_setup_cpu_maps(void)
>  		if (parse_thread_groups(dn, &tg) ||
>  		    tg.nr_groups < 1 || tg.property != 1) {
>  			has_big_cores = false;
> +			has_interleaved_big_core = false;
> +		}
> +
> +		if (has_interleaved_big_core) {
> +			has_interleaved_big_core =
> +				check_interleaved_big_core(&tg);
>  		}
>  
>  		if (cpu >= nr_cpu_ids) {
> @@ -669,7 +726,15 @@ void __init smp_setup_cpu_maps(void)
>  	vdso_data->processorCount = num_present_cpus();
>  #endif /* CONFIG_PPC64 */
>  
> -        /* Initialize CPU <=> thread mapping/
> +	if (has_interleaved_big_core) {
> +		int key = __builtin_ctzl(CPU_FTR_ASYM_SMT);
> +
> +		cur_cpu_spec->cpu_features |= CPU_FTR_ASYM_SMT;
> +		static_branch_enable(&cpu_feature_keys[key]);
> +		pr_info("Detected interleaved big-cores\n");
> +	}

Shouldn't we use cpu_has_feature(CPU_FTR_ASYM_SMT) before setting it?

> +
> +	/* Initialize CPU <=> thread mapping/
>  	 *
>  	 * WARNING: We assume that the number of threads is the same for
>  	 * every CPU in the system. If that is not the case, then some code
> -- 
> 1.9.4
> 

-- 
Murilo

^ permalink raw reply

* Re: [v2 PATCH 1/2] powerpc: Detect the presence of big-cores via "ibm, thread-groups"
From: Murilo Opsfelder Araujo @ 2018-07-03 17:16 UTC (permalink / raw)
  To: Gautham R. Shenoy
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Michael Neuling,
	Vaidyanathan Srinivasan, Akshay Adiga, Shilpasri G Bhat,
	Oliver O'Halloran, Nicholas Piggin, linuxppc-dev,
	linux-kernel
In-Reply-To: <b922aa0ea79a83ad733508c861900a08c4fee25a.1530609795.git.ego@linux.vnet.ibm.com>

On Tue, Jul 03, 2018 at 04:33:50PM +0530, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> On IBM POWER9, the device tree exposes a property array identifed by
> "ibm,thread-groups" which will indicate which groups of threads share a
> particular set of resources.
> 
> As of today we only have one form of grouping identifying the group of
> threads in the core that share the L1 cache, translation cache and
> instruction data flow.
> 
> This patch defines the helper function to parse the contents of
> "ibm,thread-groups" and a new structure to contain the parsed output.
> 
> The patch also creates the sysfs file named "small_core_siblings" that
> returns the physical ids of the threads in the core that share the L1
> cache, translation cache and instruction data flow.
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
>  Documentation/ABI/testing/sysfs-devices-system-cpu |   8 ++
>  arch/powerpc/include/asm/cputhreads.h              |  22 +++++
>  arch/powerpc/kernel/setup-common.c                 | 110 +++++++++++++++++++++
>  arch/powerpc/kernel/sysfs.c                        |  35 +++++++
>  4 files changed, 175 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index 9c5e7732..53a823a 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -487,3 +487,11 @@ Description:	Information about CPU vulnerabilities
>  		"Not affected"	  CPU is not affected by the vulnerability
>  		"Vulnerable"	  CPU is affected and no mitigation in effect
>  		"Mitigation: $M"  CPU is affected and mitigation $M is in effect
> +
> +What: 		/sys/devices/system/cpu/cpu[0-9]+/small_core_sibings

s/small_core_sibings/small_core_siblings

By the way, big_core_siblings was mentioned in the introductory email.

> +Date:		03-Jul-2018
> +KernelVersion:	v4.18.0
> +Contact:	Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> +Description:	List of Physical ids of CPUs which share the the L1 cache,
> +		translation cache and instruction data-flow with this CPU.
> +Values:		Comma separated list of decimal integers.
> diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
> index d71a909..33226d7 100644
> --- a/arch/powerpc/include/asm/cputhreads.h
> +++ b/arch/powerpc/include/asm/cputhreads.h
> @@ -23,11 +23,13 @@
>  extern int threads_per_core;
>  extern int threads_per_subcore;
>  extern int threads_shift;
> +extern bool has_big_cores;
>  extern cpumask_t threads_core_mask;
>  #else
>  #define threads_per_core	1
>  #define threads_per_subcore	1
>  #define threads_shift		0
> +#define has_big_cores		0
>  #define threads_core_mask	(*get_cpu_mask(0))
>  #endif
>  
> @@ -69,12 +71,32 @@ static inline cpumask_t cpu_online_cores_map(void)
>  	return cpu_thread_mask_to_cores(cpu_online_mask);
>  }
>  
> +#define MAX_THREAD_LIST_SIZE	8
> +struct thread_groups {
> +	unsigned int property;
> +	unsigned int nr_groups;
> +	unsigned int threads_per_group;
> +	unsigned int thread_list[MAX_THREAD_LIST_SIZE];
> +};
> +
>  #ifdef CONFIG_SMP
>  int cpu_core_index_of_thread(int cpu);
>  int cpu_first_thread_of_core(int core);
> +int parse_thread_groups(struct device_node *dn, struct thread_groups *tg);
> +int get_cpu_thread_group_start(int cpu, struct thread_groups *tg);
>  #else
>  static inline int cpu_core_index_of_thread(int cpu) { return cpu; }
>  static inline int cpu_first_thread_of_core(int core) { return core; }
> +static inline int parse_thread_groups(struct device_node *dn,
> +				      struct thread_groups *tg)
> +{
> +	return -ENODATA;
> +}
> +
> +static inline int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
> +{
> +	return -1;
> +}
>  #endif
>  
>  static inline int cpu_thread_in_core(int cpu)
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index 40b44bb..a78ec66 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -402,10 +402,12 @@ void __init check_for_initrd(void)
>  #ifdef CONFIG_SMP
>  
>  int threads_per_core, threads_per_subcore, threads_shift;
> +bool has_big_cores = true;
>  cpumask_t threads_core_mask;
>  EXPORT_SYMBOL_GPL(threads_per_core);
>  EXPORT_SYMBOL_GPL(threads_per_subcore);
>  EXPORT_SYMBOL_GPL(threads_shift);
> +EXPORT_SYMBOL_GPL(has_big_cores);
>  EXPORT_SYMBOL_GPL(threads_core_mask);
>  
>  static void __init cpu_init_thread_core_maps(int tpc)
> @@ -433,6 +435,108 @@ static void __init cpu_init_thread_core_maps(int tpc)
>  
>  u32 *cpu_to_phys_id = NULL;
>  
> +/*
> + * parse_thread_groups: Parses the "ibm,thread-groups" device tree
> + *                      property for the CPU device node dn and stores
> + *                      the parsed output in the thread_groups
> + *                      structure tg.

Perhaps document the arguments of this function, as done in the second
patch?

> + *
> + * ibm,thread-groups[0..N-1] array defines which group of threads in
> + * the CPU-device node can be grouped together based on the property.
> + *
> + * ibm,thread-groups[0] tells us the property based on which the
> + * threads are being grouped together. If this value is 1, it implies
> + * that the threads in the same group share L1, translation cache.
> + *
> + * ibm,thread-groups[1] tells us how many such thread groups exist.
> + *
> + * ibm,thread-groups[2] tells us the number of threads in each such
> + * group.
> + *
> + * ibm,thread-groups[3..N-1] is the list of threads identified by
> + * "ibm,ppc-interrupt-server#s" arranged as per their membership in
> + * the grouping.
> + *
> + * Example: If ibm,thread-groups = [1,2,4,5,6,7,8,9,10,11,12] it
> + * implies that there are 2 groups of 4 threads each, where each group
> + * of threads share L1, translation cache.
> + *
> + * The "ibm,ppc-interrupt-server#s" of the first group is {5,6,7,8}
> + * and the "ibm,ppc-interrupt-server#s" of the second group is {9, 10,
> + * 11, 12} structure
> + *
> + * Returns 0 on success, -EINVAL if the property does not exist,
> + * -ENODATA if property does not have a value, and -EOVERFLOW if the
> + * property data isn't large enough.
> + */
> +int parse_thread_groups(struct device_node *dn,
> +			struct thread_groups *tg)
> +{
> +	unsigned int nr_groups, threads_per_group, property;
> +	int i;
> +	u32 thread_group_array[3 + MAX_THREAD_LIST_SIZE];
> +	u32 *thread_list;
> +	size_t total_threads;
> +	int ret;
> +
> +	ret = of_property_read_u32_array(dn, "ibm,thread-groups",
> +					 thread_group_array, 3);
> +
> +	if (ret)
> +		return ret;
> +
> +	property = thread_group_array[0];
> +	nr_groups = thread_group_array[1];
> +	threads_per_group = thread_group_array[2];
> +	total_threads = nr_groups * threads_per_group;
> +
> +	ret = of_property_read_u32_array(dn, "ibm,thread-groups",
> +					 thread_group_array,
> +					 3 + total_threads);
> +	if (ret)
> +		return ret;
> +
> +	thread_list = &thread_group_array[3];
> +
> +	for (i = 0 ; i < total_threads; i++)
> +		tg->thread_list[i] = thread_list[i];
> +
> +	tg->property = property;
> +	tg->nr_groups = nr_groups;
> +	tg->threads_per_group = threads_per_group;
> +
> +	return 0;
> +}
> +
> +/*
> + * get_cpu_thread_group_start : Searches the thread group in tg->thread_list
> + *                              that @cpu belongs to.

Same here.

> + *
> + * Returns the index to tg->thread_list that points to the the start
> + * of the thread_group that @cpu belongs to.
> + *
> + * Returns -1 if cpu doesn't belong to any of the groups pointed
> + * to by tg->thread_list.
> + */
> +int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
> +{
> +	int hw_cpu_id = get_hard_smp_processor_id(cpu);
> +	int i, j;
> +
> +	for (i = 0; i < tg->nr_groups; i++) {
> +		int group_start = i * tg->threads_per_group;
> +
> +		for (j = 0; j < tg->threads_per_group; j++) {
> +			int idx = group_start + j;
> +
> +			if (tg->thread_list[idx] == hw_cpu_id)
> +				return group_start;
> +		}
> +	}
> +
> +	return -1;
> +}
> +
>  /**
>   * setup_cpu_maps - initialize the following cpu maps:
>   *                  cpu_possible_mask
> @@ -467,6 +571,7 @@ void __init smp_setup_cpu_maps(void)
>  		const __be32 *intserv;
>  		__be32 cpu_be;
>  		int j, len;
> +		struct thread_groups tg = {.nr_groups = 0};

We assume has_big_cores = true but here we initialize .nr_groups
otherwise.  It's kind of contradictory.

What if has_big_cores is assumed false and members of tg are initialized
with zeroes?

>  
>  		DBG("  * %pOF...\n", dn);
>  
> @@ -505,6 +610,11 @@ void __init smp_setup_cpu_maps(void)
>  			cpu++;
>  		}
>  
> +		if (parse_thread_groups(dn, &tg) ||
> +		    tg.nr_groups < 1 || tg.property != 1) {
> +			has_big_cores = false;
> +		}
> +

parse_thread_groups() returns before setting tg.property if property
doesn't exist.  Are we confident that tg.property won't contain any
garbage that could lead to a false positive here?  Shouldn't we also
initialize .property when declaring tg?

What if this logic is encapsulated in a function?  For example:

    has_big_cores = dt_has_big_cores(dn, &tg);

>  		if (cpu >= nr_cpu_ids) {
>  			of_node_put(dn);
>  			break;
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 755dc98..f5717de 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -18,6 +18,7 @@
>  #include <asm/smp.h>
>  #include <asm/pmc.h>
>  #include <asm/firmware.h>
> +#include <asm/cputhreads.h>
>  
>  #include "cacheinfo.h"
>  #include "setup.h"
> @@ -1025,6 +1026,33 @@ static ssize_t show_physical_id(struct device *dev,
>  }
>  static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
>  
> +static ssize_t show_small_core_siblings(struct device *dev,
> +					struct device_attribute *attr,
> +					char *buf)
> +{
> +	struct cpu *cpu = container_of(dev, struct cpu, dev);
> +	struct device_node *dn = of_get_cpu_node(cpu->dev.id, NULL);
> +	struct thread_groups tg;
> +	int i, j;
> +	ssize_t ret = 0;
> +
> +	if (parse_thread_groups(dn, &tg))
> +		return -ENODATA;
> +
> +	i = get_cpu_thread_group_start(cpu->dev.id, &tg);
> +
> +	if (i == -1)
> +		return -ENODATA;
> +
> +	for (j = 0; j < tg.threads_per_group - 1; j++)
> +		ret += sprintf(buf + ret, "%d,", tg.thread_list[i + j]);
> +
> +	ret += sprintf(buf + ret, "%d\n", tg.thread_list[i + j]);
> +
> +	return ret;
> +}
> +static DEVICE_ATTR(small_core_siblings, 0444, show_small_core_siblings, NULL);
> +
>  static int __init topology_init(void)
>  {
>  	int cpu, r;
> @@ -1048,6 +1076,13 @@ static int __init topology_init(void)
>  			register_cpu(c, cpu);
>  
>  			device_create_file(&c->dev, &dev_attr_physical_id);
> +
> +			if (has_big_cores) {
> +				const struct device_attribute *attr =
> +				       &dev_attr_small_core_siblings;
> +
> +			       device_create_file(&c->dev, attr);
> +			}
>  		}
>  	}
>  	r = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "powerpc/topology:online",
> -- 
> 1.9.4
> 

Cheers
Murilo

^ permalink raw reply

* Re: Oops in kmem_cache_free() via bioset_exit() (was Re: [next-20180601][nvme][ppc] Kernel Oops is triggered when creating lvm snapshots on nvme disks)
From: Abdul Haleem @ 2018-07-03 16:39 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, linux-fsdevel, linux-next, linux-kernel, linux-scsi,
	Stephen Rothwell, sachinp, sim, manvanth, Brian King, linux-block,
	Kent Overstreet, Jens Axboe
In-Reply-To: <87a7rf55vy.fsf@concordia.ellerman.id.au>

On Fri, 2018-06-29 at 00:42 +1000, Michael Ellerman wrote:
> Kent, Jens,
> 
> This looks like it might be related to the recent bioset changes?
> 
> cheers
> 
> Abdul Haleem <abdhalee@linux.vnet.ibm.com> writes:
> > On Tue, 2018-06-26 at 23:36 +1000, Michael Ellerman wrote:
> >> Abdul Haleem <abdhalee@linux.vnet.ibm.com> writes:
> ...
> > I was able to reproduce again with slub_debug=FZP and DEBUG_INFO enabled
> > on 4.17.0-rc7-next-20180601, but not much traces other than the Oops stack trace
> 
> Are you still testing on that revision? It's nearly a month old.
> 
> Please try to reproduce on mainline or today's linux-next.

Problem is not reproducible on 4.18.0-rc3 mainline and today's next
kernel

-- 
Regard's

Abdul Haleem
IBM Linux Technology Centre

^ permalink raw reply

* RE: [PATCH] QE GPIO: Add qe_gpio_set_multiple
From: Leo Li @ 2018-07-03 16:35 UTC (permalink / raw)
  To: York Sun, jocke@infinera.com
  Cc: linuxppc-dev@lists.ozlabs.org, mpe@ellerman.id.au, Qiang Zhao
In-Reply-To: <VI1PR04MB2078C55EBFAEED9BC3FCE1989A420@VI1PR04MB2078.eurprd04.prod.outlook.com>

DQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogWW9yayBTdW4NCj4gU2Vu
dDogVHVlc2RheSwgSnVseSAzLCAyMDE4IDEwOjI3IEFNDQo+IFRvOiBqb2NrZUBpbmZpbmVyYS5j
b20gPGpvYWtpbS50amVybmx1bmRAaW5maW5lcmEuY29tPjsgTGVvIExpDQo+IDxsZW95YW5nLmxp
QG54cC5jb20+DQo+IENjOiBsaW51eHBwYy1kZXZAbGlzdHMub3psYWJzLm9yZzsgbXBlQGVsbGVy
bWFuLmlkLmF1OyBRaWFuZyBaaGFvDQo+IDxxaWFuZy56aGFvQG54cC5jb20+DQo+IFN1YmplY3Q6
IFJlOiBbUEFUQ0hdIFFFIEdQSU86IEFkZCBxZV9ncGlvX3NldF9tdWx0aXBsZQ0KPiANCj4gK0xl
bw0KPiANCj4gT24gMDcvMDMvMjAxOCAwMzozMCBBTSwgSm9ha2ltIFRqZXJubHVuZCB3cm90ZToN
Cj4gPiBPbiBUdWUsIDIwMTgtMDYtMjYgYXQgMjM6NDEgKzEwMDAsIE1pY2hhZWwgRWxsZXJtYW4g
d3JvdGU6DQo+ID4+DQo+ID4+IEpvYWtpbSBUamVybmx1bmQgPEpvYWtpbS5UamVybmx1bmRAaW5m
aW5lcmEuY29tPiB3cml0ZXM6DQo+ID4+PiBPbiBUaHUsIDIwMTgtMDYtMjEgYXQgMDI6MzggKzAw
MDAsIFFpYW5nIFpoYW8gd3JvdGU6DQo+ID4+Pj4gT24gMDYvMTkvMjAxOCAwOToyMiBBTSwgSm9h
a2ltIFRqZXJubHVuZCB3cm90ZToNCj4gPj4+PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0K
PiA+Pj4+IEZyb206IExpbnV4cHBjLWRldiBbbWFpbHRvOmxpbnV4cHBjLWRldi0NCj4gYm91bmNl
cytxaWFuZy56aGFvPW54cC5jb21AbGlzdHMub3psYWJzLm9yZ10gT24gQmVoYWxmIE9mIEpvYWtp
bQ0KPiBUamVybmx1bmQNCj4gPj4+PiBTZW50OiAyMDE4xOo21MIyMMjVIDA6MjINCj4gPj4+PiBU
bzogWW9yayBTdW4gPHlvcmsuc3VuQG54cC5jb20+OyBsaW51eHBwYy1kZXYgPGxpbnV4cHBjLQ0K
PiBkZXZAbGlzdHMub3psYWJzLm9yZz4NCj4gPj4+PiBTdWJqZWN0OiBbUEFUQ0hdIFFFIEdQSU86
IEFkZCBxZV9ncGlvX3NldF9tdWx0aXBsZQ0KPiA+Pj4+DQo+ID4+Pj4gVGhpcyBjb3VzaW4gdG8g
Z3Bpby1tcGM4eHh4IHdhcyBsYWNraW5nIGEgbXVsdGlwbGUgcGlucyBtZXRob2QsIGFkZA0KPiBv
bmUuDQo+ID4+Pj4NCj4gPj4+PiBTaWduZWQtb2ZmLWJ5OiBKb2FraW0gVGplcm5sdW5kIDxqb2Fr
aW0udGplcm5sdW5kQGluZmluZXJhLmNvbT4NCj4gPj4+PiAtLS0NCj4gPj4+PiAgZHJpdmVycy9z
b2MvZnNsL3FlL2dwaW8uYyB8IDI4ICsrKysrKysrKysrKysrKysrKysrKysrKysrKysNCj4gPj4+
PiAgMSBmaWxlIGNoYW5nZWQsIDI4IGluc2VydGlvbnMoKykNCj4gPj4NCj4gPj4gLi4uDQo+ID4+
Pj4gIHN0YXRpYyBpbnQgcWVfZ3Bpb19kaXJfaW4oc3RydWN0IGdwaW9fY2hpcCAqZ2MsIHVuc2ln
bmVkIGludCBncGlvKSAgew0KPiA+Pj4+ICAgICAgICAgc3RydWN0IG9mX21tX2dwaW9fY2hpcCAq
bW1fZ2MgPSB0b19vZl9tbV9ncGlvX2NoaXAoZ2MpOyBAQCAtDQo+IDI5OCw2ICszMjUsNyBAQCBz
dGF0aWMgaW50IF9faW5pdCBxZV9hZGRfZ3Bpb2NoaXBzKHZvaWQpDQo+ID4+Pj4gICAgICAgICAg
ICAgICAgIGdjLT5kaXJlY3Rpb25fb3V0cHV0ID0gcWVfZ3Bpb19kaXJfb3V0Ow0KPiA+Pj4+ICAg
ICAgICAgICAgICAgICBnYy0+Z2V0ID0gcWVfZ3Bpb19nZXQ7DQo+ID4+Pj4gICAgICAgICAgICAg
ICAgIGdjLT5zZXQgPSBxZV9ncGlvX3NldDsNCj4gPj4+PiArICAgICAgICAgICAgICAgZ2MtPnNl
dF9tdWx0aXBsZSA9IHFlX2dwaW9fc2V0X211bHRpcGxlOw0KPiA+Pj4+DQo+ID4+Pj4gICAgICAg
ICAgICAgICAgIHJldCA9IG9mX21tX2dwaW9jaGlwX2FkZF9kYXRhKG5wLCBtbV9nYywgcWVfZ2Mp
Ow0KPiA+Pj4+ICAgICAgICAgICAgICAgICBpZiAocmV0KQ0KPiA+Pj4+DQo+ID4+Pj4gUmV2aWV3
ZWQtYnk6IFFpYW5nIFpoYW8gPHFpYW5nLnpoYW9AbnhwLmNvbT4NCj4gPj4+Pg0KPiA+Pj4NCj4g
Pj4+IFdobyBwaWNrcyB1cCB0aGlzIHBhdGNoID8gSXMgaXQgcXVldWVkIHNvbWV3aGVyZSBhbHJl
YWR5Pw0KPiA+Pg0KPiA+PiBOb3QgbWUuDQo+ID4NCj4gPiBZb3JrPyBZb3Ugc2VlbSB0byBiZSB0
aGUgb25seSBvbmUgbGVmdC4NCj4gPg0KPiANCj4gSSBhbSBub3QgYSBMaW51eCBtYWludGFpbmVy
LiBFdmVuIEkgd2FudCB0bywgSSBjYW4ndCBtZXJnZSB0aGlzIHBhdGNoLg0KPiANCj4gTGVvLCB3
aG8gY2FuIG1lcmdlIHRoaXMgcGF0Y2ggYW5kIHJlcXVlc3QgYSBwdWxsPw0KDQpTaW5jZSBpdCBm
YWxscyB1bmRlciB0aGUgZHJpdmVyL3NvYy9mc2wvIGZvbGRlci4gIEkgY2FuIHRha2UgaXQuDQoN
ClJlZ2FyZHMsDQpMZW8NCg==

^ permalink raw reply

* Re: [PATCH 7/7 v5] arm64: dts: ls208xa: comply with the iommu map binding for fsl_mc
From: Robin Murphy @ 2018-07-03 16:35 UTC (permalink / raw)
  To: Nipun Gupta, will.deacon, robh+dt, robh, mark.rutland,
	catalin.marinas, gregkh, laurentiu.tudor, bhelgaas
  Cc: hch, joro, m.szyprowski, shawnguo, frowand.list, iommu,
	linux-kernel, devicetree, linux-arm-kernel, linuxppc-dev,
	linux-pci, bharat.bhushan, stuyoder, leoyang.li
In-Reply-To: <1526824191-7000-8-git-send-email-nipun.gupta@nxp.com>

On 20/05/18 14:49, Nipun Gupta wrote:
> fsl-mc bus support the new iommu-map property. Comply to this binding
> for fsl_mc bus.
> 
> Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
> Reviewed-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> ---
>   arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
> index 137ef4d..6010505 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
> @@ -184,6 +184,7 @@
>   		#address-cells = <2>;
>   		#size-cells = <2>;
>   		ranges;
> +		dma-ranges = <0x0 0x0 0x0 0x0 0x10000 0x00000000>;
>   
>   		clockgen: clocking@1300000 {
>   			compatible = "fsl,ls2080a-clockgen";
> @@ -357,6 +358,8 @@
>   			reg = <0x00000008 0x0c000000 0 0x40>,	 /* MC portal base */
>   			      <0x00000000 0x08340000 0 0x40000>; /* MC control reg */
>   			msi-parent = <&its>;
> +			iommu-map = <0 &smmu 0 0>;	/* This is fixed-up by u-boot */
> +			dma-coherent;
>   			#address-cells = <3>;
>   			#size-cells = <1>;
>   
> @@ -460,6 +463,8 @@
>   			compatible = "arm,mmu-500";
>   			reg = <0 0x5000000 0 0x800000>;
>   			#global-interrupts = <12>;
> +			#iommu-cells = <1>;
> +			stream-match-mask = <0x7C00>;
>   			interrupts = <0 13 4>, /* global secure fault */
>   				     <0 14 4>, /* combined secure interrupt */
>   				     <0 15 4>, /* global non-secure fault */
> @@ -502,7 +507,6 @@
>   				     <0 204 4>, <0 205 4>,
>   				     <0 206 4>, <0 207 4>,
>   				     <0 208 4>, <0 209 4>;
> -			mmu-masters = <&fsl_mc 0x300 0>;

Since we're in here, is the SMMU itself also coherent? If it is, you 
probably want to say so and avoid the overhead of pointlessly cleaning 
cache lines on every page table update.

Robin.

>   		};
>   
>   		dspi: dspi@2100000 {
> 

^ permalink raw reply

* Re: [PATCH 6/7 v5] bus: fsl-mc: set coherent dma mask for devices on fsl-mc bus
From: Robin Murphy @ 2018-07-03 16:15 UTC (permalink / raw)
  To: Nipun Gupta, will.deacon, robh+dt, robh, mark.rutland,
	catalin.marinas, gregkh, laurentiu.tudor, bhelgaas
  Cc: hch, joro, m.szyprowski, shawnguo, frowand.list, iommu,
	linux-kernel, devicetree, linux-arm-kernel, linuxppc-dev,
	linux-pci, bharat.bhushan, stuyoder, leoyang.li
In-Reply-To: <1526824191-7000-7-git-send-email-nipun.gupta@nxp.com>

On 20/05/18 14:49, Nipun Gupta wrote:
> of_dma_configure() API expects coherent_dma_mask to be correctly
> set in the devices. This patch does the needful.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
> ---
>   drivers/bus/fsl-mc/fsl-mc-bus.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
> index fa43c7d..624828b 100644
> --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
> +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
> @@ -627,6 +627,7 @@ int fsl_mc_device_add(struct fsl_mc_obj_desc *obj_desc,
>   		mc_dev->icid = parent_mc_dev->icid;
>   		mc_dev->dma_mask = FSL_MC_DEFAULT_DMA_MASK;
>   		mc_dev->dev.dma_mask = &mc_dev->dma_mask;
> +		mc_dev->dev.coherent_dma_mask = mc_dev->dma_mask;
>   		dev_set_msi_domain(&mc_dev->dev,
>   				   dev_get_msi_domain(&parent_mc_dev->dev));
>   	}
> 

^ permalink raw reply

* Re: [PATCH 5/7 v5] bus: fsl-mc: support dma configure for devices on fsl-mc bus
From: Robin Murphy @ 2018-07-03 16:14 UTC (permalink / raw)
  To: Nipun Gupta, will.deacon, robh+dt, robh, mark.rutland,
	catalin.marinas, gregkh, laurentiu.tudor, bhelgaas
  Cc: hch, joro, m.szyprowski, shawnguo, frowand.list, iommu,
	linux-kernel, devicetree, linux-arm-kernel, linuxppc-dev,
	linux-pci, bharat.bhushan, stuyoder, leoyang.li
In-Reply-To: <1526824191-7000-6-git-send-email-nipun.gupta@nxp.com>

On 20/05/18 14:49, Nipun Gupta wrote:
> This patch adds support of dma configuration for devices on fsl-mc
> bus using 'dma_configure' callback for busses. Also, directly calling
> arch_setup_dma_ops is removed from the fsl-mc bus.

Looks like this is the final arch_setup_dma_ops offender, yay!

> Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
> Reviewed-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
> ---
>   drivers/bus/fsl-mc/fsl-mc-bus.c | 15 +++++++++++----
>   1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
> index 5d8266c..fa43c7d 100644
> --- a/drivers/bus/fsl-mc/fsl-mc-bus.c
> +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
> @@ -127,6 +127,16 @@ static int fsl_mc_bus_uevent(struct device *dev, struct kobj_uevent_env *env)
>   	return 0;
>   }
>   
> +static int fsl_mc_dma_configure(struct device *dev)
> +{
> +	struct device *dma_dev = dev;
> +
> +	while (dev_is_fsl_mc(dma_dev))
> +		dma_dev = dma_dev->parent;
> +
> +	return of_dma_configure(dev, dma_dev->of_node, 0);
> +}
> +
>   static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
>   			     char *buf)
>   {
> @@ -148,6 +158,7 @@ struct bus_type fsl_mc_bus_type = {
>   	.name = "fsl-mc",
>   	.match = fsl_mc_bus_match,
>   	.uevent = fsl_mc_bus_uevent,
> +	.dma_configure  = fsl_mc_dma_configure,
>   	.dev_groups = fsl_mc_dev_groups,
>   };
>   EXPORT_SYMBOL_GPL(fsl_mc_bus_type);
> @@ -633,10 +644,6 @@ int fsl_mc_device_add(struct fsl_mc_obj_desc *obj_desc,
>   			goto error_cleanup_dev;
>   	}
>   
> -	/* Objects are coherent, unless 'no shareability' flag set. */
> -	if (!(obj_desc->flags & FSL_MC_OBJ_FLAG_NO_MEM_SHAREABILITY))

Although it seems we do end up without any handling of this 
"non-coherent object behind coherent MC" case, and I'm not sure how 
easily that could be accommodated by generic code... :/ How important is 
the quirk?

Robin.

> -		arch_setup_dma_ops(&mc_dev->dev, 0, 0, NULL, true);
> -
>   	/*
>   	 * The device-specific probe callback will get invoked by device_add()
>   	 */
> 

^ permalink raw reply

* Re: [PATCH 4/7 v5] iommu: arm-smmu: Add support for the fsl-mc bus
From: Robin Murphy @ 2018-07-03 16:01 UTC (permalink / raw)
  To: Nipun Gupta, will.deacon, robh+dt, robh, mark.rutland,
	catalin.marinas, gregkh, laurentiu.tudor, bhelgaas
  Cc: hch, joro, m.szyprowski, shawnguo, frowand.list, iommu,
	linux-kernel, devicetree, linux-arm-kernel, linuxppc-dev,
	linux-pci, bharat.bhushan, stuyoder, leoyang.li
In-Reply-To: <1526824191-7000-5-git-send-email-nipun.gupta@nxp.com>

On 20/05/18 14:49, Nipun Gupta wrote:
> Implement bus specific support for the fsl-mc bus including
> registering arm_smmu_ops and bus specific device add operations.
> 
> Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
> ---
>   drivers/iommu/arm-smmu.c |  7 +++++++
>   drivers/iommu/iommu.c    | 21 +++++++++++++++++++++
>   include/linux/fsl/mc.h   |  8 ++++++++
>   include/linux/iommu.h    |  2 ++
>   4 files changed, 38 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 69e7c60..e1d5090 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -52,6 +52,7 @@
>   #include <linux/spinlock.h>
>   
>   #include <linux/amba/bus.h>
> +#include <linux/fsl/mc.h>
>   
>   #include "io-pgtable.h"
>   #include "arm-smmu-regs.h"
> @@ -1459,6 +1460,8 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
>   
>   	if (dev_is_pci(dev))
>   		group = pci_device_group(dev);
> +	else if (dev_is_fsl_mc(dev))
> +		group = fsl_mc_device_group(dev);
>   	else
>   		group = generic_device_group(dev);
>   
> @@ -2037,6 +2040,10 @@ static void arm_smmu_bus_init(void)
>   		bus_set_iommu(&pci_bus_type, &arm_smmu_ops);
>   	}
>   #endif
> +#ifdef CONFIG_FSL_MC_BUS
> +	if (!iommu_present(&fsl_mc_bus_type))
> +		bus_set_iommu(&fsl_mc_bus_type, &arm_smmu_ops);
> +#endif
>   }
>   
>   static int arm_smmu_device_probe(struct platform_device *pdev)
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d2aa2320..6d4ce35 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -32,6 +32,7 @@
>   #include <linux/pci.h>
>   #include <linux/bitops.h>
>   #include <linux/property.h>
> +#include <linux/fsl/mc.h>
>   #include <trace/events/iommu.h>
>   
>   static struct kset *iommu_group_kset;
> @@ -987,6 +988,26 @@ struct iommu_group *pci_device_group(struct device *dev)
>   	return iommu_group_alloc();
>   }
>   
> +/* Get the IOMMU group for device on fsl-mc bus */
> +struct iommu_group *fsl_mc_device_group(struct device *dev)
> +{
> +	struct device *cont_dev = fsl_mc_cont_dev(dev);
> +	struct iommu_group *group;
> +
> +	/* Container device is responsible for creating the iommu group */
> +	if (fsl_mc_is_cont_dev(dev)) {

Why duplicate what fsl_mc_cont_dev() has already figured out?

AFAICS the overall operation here boils down to something like:

	cont_dev = fsl_mc_cont_dev(dev);
	group = iommu_group_get(cont_dev);
	if (!group)
		group = iommu_group_alloc();
	return group;

> +		group = iommu_group_alloc();
> +		if (IS_ERR(group))
> +			return NULL;

iommu_group_get_for_dev() expects a PTR_ERR, so I don't think munging 
the return here is the right thing to do.

> +	} else {
> +		get_device(cont_dev);

If races are a concern, isn't this a bit late? Maybe in that case you 
want {get,put}_fsl_mc_cont_dev() routines instead of the simple macro 
below. But on the other hand if dev already has cont_dev as its parent, 
wouldn't the reference taken in device_add() be sufficient to prevent it 
from vanishing unexpectedly in this timescale?

Robin.

> +		group = iommu_group_get(cont_dev);
> +		put_device(cont_dev);
> +	}
> +
> +	return group;
> +}
> +
>   /**
>    * iommu_group_get_for_dev - Find or create the IOMMU group for a device
>    * @dev: target device
> diff --git a/include/linux/fsl/mc.h b/include/linux/fsl/mc.h
> index f27cb14..dddaca1 100644
> --- a/include/linux/fsl/mc.h
> +++ b/include/linux/fsl/mc.h
> @@ -351,6 +351,14 @@ struct fsl_mc_io {
>   #define dev_is_fsl_mc(_dev) (0)
>   #endif
>   
> +/* Macro to check if a device is a container device */
> +#define fsl_mc_is_cont_dev(_dev) (to_fsl_mc_device(_dev)->flags & \
> +	FSL_MC_IS_DPRC)
> +
> +/* Macro to get the container device of a MC device */
> +#define fsl_mc_cont_dev(_dev) (fsl_mc_is_cont_dev(_dev) ? \
> +	(_dev) : (_dev)->parent)
> +
>   /*
>    * module_fsl_mc_driver() - Helper macro for drivers that don't do
>    * anything special in module init/exit.  This eliminates a lot of
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 19938ee..2981200 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -389,6 +389,8 @@ static inline size_t iommu_map_sg(struct iommu_domain *domain,
>   extern struct iommu_group *pci_device_group(struct device *dev);
>   /* Generic device grouping function */
>   extern struct iommu_group *generic_device_group(struct device *dev);
> +/* FSL-MC device grouping function */
> +struct iommu_group *fsl_mc_device_group(struct device *dev);
>   
>   /**
>    * struct iommu_fwspec - per-device IOMMU instance data
> 

^ permalink raw reply

* Re: [PATCH 3/7 v5] iommu: support iommu configuration for fsl-mc devices
From: Robin Murphy @ 2018-07-03 15:30 UTC (permalink / raw)
  To: Nipun Gupta, will.deacon, robh+dt, robh, mark.rutland,
	catalin.marinas, gregkh, laurentiu.tudor, bhelgaas
  Cc: hch, joro, m.szyprowski, shawnguo, frowand.list, iommu,
	linux-kernel, devicetree, linux-arm-kernel, linuxppc-dev,
	linux-pci, bharat.bhushan, stuyoder, leoyang.li
In-Reply-To: <1526824191-7000-4-git-send-email-nipun.gupta@nxp.com>

On 20/05/18 14:49, Nipun Gupta wrote:
> With of_pci_map_rid available for all the busses, use the function
> for configuration of devices on fsl-mc bus

FWIW I had a quick hack at factoring out the commonality with 
of_pci_iommu_init(), at which point I reckon this change is easier to 
follow as-is for the moment, so:

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
> ---
>   drivers/iommu/of_iommu.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index 811e160..284474d 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -24,6 +24,7 @@
>   #include <linux/of_iommu.h>
>   #include <linux/of_pci.h>
>   #include <linux/slab.h>
> +#include <linux/fsl/mc.h>
>   
>   #define NO_IOMMU	1
>   
> @@ -159,6 +160,23 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)
>   	return err;
>   }
>   
> +static int of_fsl_mc_iommu_init(struct fsl_mc_device *mc_dev,
> +				struct device_node *master_np)
> +{
> +	struct of_phandle_args iommu_spec = { .args_count = 1 };
> +	int err;
> +
> +	err = of_map_rid(master_np, mc_dev->icid, "iommu-map",
> +			 "iommu-map-mask", &iommu_spec.np,
> +			 iommu_spec.args);
> +	if (err)
> +		return err == -ENODEV ? NO_IOMMU : err;
> +
> +	err = of_iommu_xlate(&mc_dev->dev, &iommu_spec);
> +	of_node_put(iommu_spec.np);
> +	return err;
> +}
> +
>   const struct iommu_ops *of_iommu_configure(struct device *dev,
>   					   struct device_node *master_np)
>   {
> @@ -190,6 +208,8 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>   
>   		err = pci_for_each_dma_alias(to_pci_dev(dev),
>   					     of_pci_iommu_init, &info);
> +	} else if (dev_is_fsl_mc(dev)) {
> +		err = of_fsl_mc_iommu_init(to_fsl_mc_device(dev), master_np);
>   	} else {
>   		struct of_phandle_args iommu_spec;
>   		int idx = 0;
> 

^ permalink raw reply

* Re: [PATCH] QE GPIO: Add qe_gpio_set_multiple
From: York Sun @ 2018-07-03 15:26 UTC (permalink / raw)
  To: jocke@infinera.com, Leo Li
  Cc: linuxppc-dev@lists.ozlabs.org, mpe@ellerman.id.au, Qiang Zhao
In-Reply-To: <b7c4d55627e474a868ad7284672fbbd14dcf8d39.camel@infinera.com>

+Leo=0A=
=0A=
On 07/03/2018 03:30 AM, Joakim Tjernlund wrote:=0A=
> On Tue, 2018-06-26 at 23:41 +1000, Michael Ellerman wrote:=0A=
>>=0A=
>> Joakim Tjernlund <Joakim.Tjernlund@infinera.com> writes:=0A=
>>> On Thu, 2018-06-21 at 02:38 +0000, Qiang Zhao wrote:=0A=
>>>> On 06/19/2018 09:22 AM, Joakim Tjernlund wrote:=0A=
>>>> -----Original Message-----=0A=
>>>> From: Linuxppc-dev [mailto:linuxppc-dev-bounces+qiang.zhao=3Dnxp.com@l=
ists.ozlabs.org] On Behalf Of Joakim Tjernlund=0A=
>>>> Sent: 2018=1B$BG/=1B(B6=1B$B7n=1B(B20=1B$BF|=1B(B 0:22=0A=
>>>> To: York Sun <york.sun@nxp.com>; linuxppc-dev <linuxppc-dev@lists.ozla=
bs.org>=0A=
>>>> Subject: [PATCH] QE GPIO: Add qe_gpio_set_multiple=0A=
>>>>=0A=
>>>> This cousin to gpio-mpc8xxx was lacking a multiple pins method, add on=
e.=0A=
>>>>=0A=
>>>> Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>=0A=
>>>> ---=0A=
>>>>  drivers/soc/fsl/qe/gpio.c | 28 ++++++++++++++++++++++++++++=0A=
>>>>  1 file changed, 28 insertions(+)=0A=
>>=0A=
>> ...=0A=
>>>>  static int qe_gpio_dir_in(struct gpio_chip *gc, unsigned int gpio)  {=
=0A=
>>>>         struct of_mm_gpio_chip *mm_gc =3D to_of_mm_gpio_chip(gc); @@ -=
298,6 +325,7 @@ static int __init qe_add_gpiochips(void)=0A=
>>>>                 gc->direction_output =3D qe_gpio_dir_out;=0A=
>>>>                 gc->get =3D qe_gpio_get;=0A=
>>>>                 gc->set =3D qe_gpio_set;=0A=
>>>> +               gc->set_multiple =3D qe_gpio_set_multiple;=0A=
>>>>=0A=
>>>>                 ret =3D of_mm_gpiochip_add_data(np, mm_gc, qe_gc);=0A=
>>>>                 if (ret)=0A=
>>>>=0A=
>>>> Reviewed-by: Qiang Zhao <qiang.zhao@nxp.com>=0A=
>>>>=0A=
>>>=0A=
>>> Who picks up this patch ? Is it queued somewhere already?=0A=
>>=0A=
>> Not me.=0A=
> =0A=
> York? You seem to be the only one left.=0A=
> =0A=
=0A=
I am not a Linux maintainer. Even I want to, I can't merge this patch.=0A=
=0A=
Leo, who can merge this patch and request a pull?=0A=
=0A=
York=0A=

^ permalink raw reply

* Re: [PATCH v5 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
From: Baoquan He @ 2018-07-03 14:55 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Linux Kernel Mailing List, Andrew Morton, Rob Herring,
	Dan Williams, Nicolas Pitre, Josh Triplett, kbuild test robot,
	Borislav Petkov, Patrik Jakobsson, David Airlie, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Dmitry Torokhov, Frank Rowand,
	Keith Busch, Jon Derrick, Lorenzo Pieralisi, Bjorn Helgaas,
	Thomas Gleixner, brijesh.singh, Jérôme Glisse,
	Tom Lendacky, Greg Kroah-Hartman, baiyaowei, richard.weiyang,
	devel, linux-input, linux-nvdimm, devicetree, linux-pci,
	Eric Biederman, Vivek Goyal, Dave Young, Yinghai Lu, kexec,
	Michal Simek, David S. Miller, Chris Zankel, Max Filippov,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, linux-parisc,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
In-Reply-To: <CAHp75Vdaqj+n=F7wSB9PFGpf9Ok2XZzKcq_H0DzmtjmN-4UUfw@mail.gmail.com>

Hi Andy,

On 06/12/18 at 05:24pm, Andy Shevchenko wrote:
> On Tue, Jun 12, 2018 at 5:20 PM, Andy Shevchenko
> <andy.shevchenko@gmail.com> wrote:
> >> Hmm, I just copied it from arch/powerpc/kernel/pci-common.c. The
> >> function interface expects an integer returned value, not sure what a
> >> real error codes look like, could you give more hints? Will change
> >> accordingly.
> >
> > I briefly looked at the code and error codes we have, so, my proposal
> > is one of the following
> 
> >  - use -ECANCELED (not the best choice for first occurrence here,
> > though I can't find better)
> 
> Actually -ENOTSUPP might suit the first case (although the actual
> would be something like -EOVERLAP, which we don't have)

Sorry for late reply, and many thanks for your great suggestion.

I am fine to use -ENOTSUPP as the first returned value, and -ECANCELED
for the 2nd one. Or define an enum as you suggested inside the function
or in header file.

Or use -EBUSY for the first case because existing resource is
overlapping but not fully contained by 'res'; and -EINVAL for
the 2nd case since didn't find any one resources which is contained by
'res', means we passed in a invalid resource. 

All is fine to me, I can repost with each of them.

Thanks
Baoquan

> 
> >  - use positive integers (or enum), like
> >   #define RES_REPARENTED 0
> >   #define RES_OVERLAPPED 1
> >   #define RES_NOCONFLICT 2
> >
> >
> >>> > +               if (firstpp == NULL)
> >>> > +                       firstpp = pp;
> >>> > +       }
> >>>
> >>> > +       if (firstpp == NULL)
> >>> > +               return -1;      /* didn't find any conflicting entries? */
> >>>
> >>> Ditto.
> >
> > Ditto.
> >
> >>>
> >>> > +}
> >>> > +EXPORT_SYMBOL(reparent_resources);
> >
> > --
> > With Best Regards,
> > Andy Shevchenko
> 
> 
> 
> -- 
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply

* Re: [PATCH 2/7 v5] iommu: of: make of_pci_map_rid() available for other devices too
From: Robin Murphy @ 2018-07-03 14:41 UTC (permalink / raw)
  To: Nipun Gupta, will.deacon, robh+dt, robh, mark.rutland,
	catalin.marinas, gregkh, laurentiu.tudor, bhelgaas
  Cc: hch, joro, m.szyprowski, shawnguo, frowand.list, iommu,
	linux-kernel, devicetree, linux-arm-kernel, linuxppc-dev,
	linux-pci, bharat.bhushan, stuyoder, leoyang.li
In-Reply-To: <1526824191-7000-3-git-send-email-nipun.gupta@nxp.com>

On 20/05/18 14:49, Nipun Gupta wrote:
> iommu-map property is also used by devices with fsl-mc. This
> patch moves the of_pci_map_rid to generic location, so that it
> can be used by other busses too.
> 
> 'of_pci_map_rid' is renamed here to 'of_map_rid' and there is no
> functional change done in the API.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
> Reviewed-by: Rob Herring <robh@kernel.org>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>   drivers/iommu/of_iommu.c |   5 +--
>   drivers/of/base.c        | 102 +++++++++++++++++++++++++++++++++++++++++++++++
>   drivers/of/irq.c         |   5 +--
>   drivers/pci/of.c         | 101 ----------------------------------------------
>   include/linux/of.h       |  11 +++++
>   include/linux/of_pci.h   |  10 -----
>   6 files changed, 117 insertions(+), 117 deletions(-)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index 5c36a8b..811e160 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -149,9 +149,8 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)
>   	struct of_phandle_args iommu_spec = { .args_count = 1 };
>   	int err;
>   
> -	err = of_pci_map_rid(info->np, alias, "iommu-map",
> -			     "iommu-map-mask", &iommu_spec.np,
> -			     iommu_spec.args);
> +	err = of_map_rid(info->np, alias, "iommu-map", "iommu-map-mask",
> +			 &iommu_spec.np, iommu_spec.args);
>   	if (err)
>   		return err == -ENODEV ? NO_IOMMU : err;
>   
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 848f549..c7aac81 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -1995,3 +1995,105 @@ int of_find_last_cache_level(unsigned int cpu)
>   
>   	return cache_level;
>   }
> +
> +/**
> + * of_map_rid - Translate a requester ID through a downstream mapping.
> + * @np: root complex device node.
> + * @rid: device requester ID to map.
> + * @map_name: property name of the map to use.
> + * @map_mask_name: optional property name of the mask to use.
> + * @target: optional pointer to a target device node.
> + * @id_out: optional pointer to receive the translated ID.
> + *
> + * Given a device requester ID, look up the appropriate implementation-defined
> + * platform ID and/or the target device which receives transactions on that
> + * ID, as per the "iommu-map" and "msi-map" bindings. Either of @target or
> + * @id_out may be NULL if only the other is required. If @target points to
> + * a non-NULL device node pointer, only entries targeting that node will be
> + * matched; if it points to a NULL value, it will receive the device node of
> + * the first matching target phandle, with a reference held.
> + *
> + * Return: 0 on success or a standard error code on failure.
> + */
> +int of_map_rid(struct device_node *np, u32 rid,
> +	       const char *map_name, const char *map_mask_name,
> +	       struct device_node **target, u32 *id_out)
> +{
> +	u32 map_mask, masked_rid;
> +	int map_len;
> +	const __be32 *map = NULL;
> +
> +	if (!np || !map_name || (!target && !id_out))
> +		return -EINVAL;
> +
> +	map = of_get_property(np, map_name, &map_len);
> +	if (!map) {
> +		if (target)
> +			return -ENODEV;
> +		/* Otherwise, no map implies no translation */
> +		*id_out = rid;
> +		return 0;
> +	}
> +
> +	if (!map_len || map_len % (4 * sizeof(*map))) {
> +		pr_err("%pOF: Error: Bad %s length: %d\n", np,
> +			map_name, map_len);
> +		return -EINVAL;
> +	}
> +
> +	/* The default is to select all bits. */
> +	map_mask = 0xffffffff;
> +
> +	/*
> +	 * Can be overridden by "{iommu,msi}-map-mask" property.
> +	 * If of_property_read_u32() fails, the default is used.
> +	 */
> +	if (map_mask_name)
> +		of_property_read_u32(np, map_mask_name, &map_mask);
> +
> +	masked_rid = map_mask & rid;
> +	for ( ; map_len > 0; map_len -= 4 * sizeof(*map), map += 4) {
> +		struct device_node *phandle_node;
> +		u32 rid_base = be32_to_cpup(map + 0);
> +		u32 phandle = be32_to_cpup(map + 1);
> +		u32 out_base = be32_to_cpup(map + 2);
> +		u32 rid_len = be32_to_cpup(map + 3);
> +
> +		if (rid_base & ~map_mask) {
> +			pr_err("%pOF: Invalid %s translation - %s-mask (0x%x) ignores rid-base (0x%x)\n",
> +				np, map_name, map_name,
> +				map_mask, rid_base);
> +			return -EFAULT;
> +		}
> +
> +		if (masked_rid < rid_base || masked_rid >= rid_base + rid_len)
> +			continue;
> +
> +		phandle_node = of_find_node_by_phandle(phandle);
> +		if (!phandle_node)
> +			return -ENODEV;
> +
> +		if (target) {
> +			if (*target)
> +				of_node_put(phandle_node);
> +			else
> +				*target = phandle_node;
> +
> +			if (*target != phandle_node)
> +				continue;
> +		}
> +
> +		if (id_out)
> +			*id_out = masked_rid - rid_base + out_base;
> +
> +		pr_debug("%pOF: %s, using mask %08x, rid-base: %08x, out-base: %08x, length: %08x, rid: %08x -> %08x\n",
> +			np, map_name, map_mask, rid_base, out_base,
> +			rid_len, rid, masked_rid - rid_base + out_base);
> +		return 0;
> +	}
> +
> +	pr_err("%pOF: Invalid %s translation - no match for rid 0x%x on %pOF\n",
> +		np, map_name, rid, target && *target ? *target : NULL);
> +	return -EFAULT;
> +}
> +EXPORT_SYMBOL_GPL(of_map_rid);
> diff --git a/drivers/of/irq.c b/drivers/of/irq.c
> index 02ad93a..e1f6f39 100644
> --- a/drivers/of/irq.c
> +++ b/drivers/of/irq.c
> @@ -22,7 +22,6 @@
>   #include <linux/module.h>
>   #include <linux/of.h>
>   #include <linux/of_irq.h>
> -#include <linux/of_pci.h>
>   #include <linux/string.h>
>   #include <linux/slab.h>
>   
> @@ -588,8 +587,8 @@ static u32 __of_msi_map_rid(struct device *dev, struct device_node **np,
>   	 * "msi-map" property.
>   	 */
>   	for (parent_dev = dev; parent_dev; parent_dev = parent_dev->parent)
> -		if (!of_pci_map_rid(parent_dev->of_node, rid_in, "msi-map",
> -				    "msi-map-mask", np, &rid_out))
> +		if (!of_map_rid(parent_dev->of_node, rid_in, "msi-map",
> +				"msi-map-mask", np, &rid_out))
>   			break;
>   	return rid_out;
>   }
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index a28355c..d2cebbe 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -362,107 +362,6 @@ int of_pci_get_host_bridge_resources(struct device_node *dev,
>   EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources);
>   #endif /* CONFIG_OF_ADDRESS */
>   
> -/**
> - * of_pci_map_rid - Translate a requester ID through a downstream mapping.
> - * @np: root complex device node.
> - * @rid: PCI requester ID to map.
> - * @map_name: property name of the map to use.
> - * @map_mask_name: optional property name of the mask to use.
> - * @target: optional pointer to a target device node.
> - * @id_out: optional pointer to receive the translated ID.
> - *
> - * Given a PCI requester ID, look up the appropriate implementation-defined
> - * platform ID and/or the target device which receives transactions on that
> - * ID, as per the "iommu-map" and "msi-map" bindings. Either of @target or
> - * @id_out may be NULL if only the other is required. If @target points to
> - * a non-NULL device node pointer, only entries targeting that node will be
> - * matched; if it points to a NULL value, it will receive the device node of
> - * the first matching target phandle, with a reference held.
> - *
> - * Return: 0 on success or a standard error code on failure.
> - */
> -int of_pci_map_rid(struct device_node *np, u32 rid,
> -		   const char *map_name, const char *map_mask_name,
> -		   struct device_node **target, u32 *id_out)
> -{
> -	u32 map_mask, masked_rid;
> -	int map_len;
> -	const __be32 *map = NULL;
> -
> -	if (!np || !map_name || (!target && !id_out))
> -		return -EINVAL;
> -
> -	map = of_get_property(np, map_name, &map_len);
> -	if (!map) {
> -		if (target)
> -			return -ENODEV;
> -		/* Otherwise, no map implies no translation */
> -		*id_out = rid;
> -		return 0;
> -	}
> -
> -	if (!map_len || map_len % (4 * sizeof(*map))) {
> -		pr_err("%pOF: Error: Bad %s length: %d\n", np,
> -			map_name, map_len);
> -		return -EINVAL;
> -	}
> -
> -	/* The default is to select all bits. */
> -	map_mask = 0xffffffff;
> -
> -	/*
> -	 * Can be overridden by "{iommu,msi}-map-mask" property.
> -	 * If of_property_read_u32() fails, the default is used.
> -	 */
> -	if (map_mask_name)
> -		of_property_read_u32(np, map_mask_name, &map_mask);
> -
> -	masked_rid = map_mask & rid;
> -	for ( ; map_len > 0; map_len -= 4 * sizeof(*map), map += 4) {
> -		struct device_node *phandle_node;
> -		u32 rid_base = be32_to_cpup(map + 0);
> -		u32 phandle = be32_to_cpup(map + 1);
> -		u32 out_base = be32_to_cpup(map + 2);
> -		u32 rid_len = be32_to_cpup(map + 3);
> -
> -		if (rid_base & ~map_mask) {
> -			pr_err("%pOF: Invalid %s translation - %s-mask (0x%x) ignores rid-base (0x%x)\n",
> -				np, map_name, map_name,
> -				map_mask, rid_base);
> -			return -EFAULT;
> -		}
> -
> -		if (masked_rid < rid_base || masked_rid >= rid_base + rid_len)
> -			continue;
> -
> -		phandle_node = of_find_node_by_phandle(phandle);
> -		if (!phandle_node)
> -			return -ENODEV;
> -
> -		if (target) {
> -			if (*target)
> -				of_node_put(phandle_node);
> -			else
> -				*target = phandle_node;
> -
> -			if (*target != phandle_node)
> -				continue;
> -		}
> -
> -		if (id_out)
> -			*id_out = masked_rid - rid_base + out_base;
> -
> -		pr_debug("%pOF: %s, using mask %08x, rid-base: %08x, out-base: %08x, length: %08x, rid: %08x -> %08x\n",
> -			np, map_name, map_mask, rid_base, out_base,
> -			rid_len, rid, masked_rid - rid_base + out_base);
> -		return 0;
> -	}
> -
> -	pr_err("%pOF: Invalid %s translation - no match for rid 0x%x on %pOF\n",
> -		np, map_name, rid, target && *target ? *target : NULL);
> -	return -EFAULT;
> -}
> -
>   #if IS_ENABLED(CONFIG_OF_IRQ)
>   /**
>    * of_irq_parse_pci - Resolve the interrupt for a PCI device
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 4d25e4f..f4251c3 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -545,6 +545,10 @@ const __be32 *of_prop_next_u32(struct property *prop, const __be32 *cur,
>   
>   extern int of_cpu_node_to_id(struct device_node *np);
>   
> +int of_map_rid(struct device_node *np, u32 rid,
> +	       const char *map_name, const char *map_mask_name,
> +	       struct device_node **target, u32 *id_out);
> +
>   #else /* CONFIG_OF */
>   
>   static inline void of_core_init(void)
> @@ -931,6 +935,13 @@ static inline int of_cpu_node_to_id(struct device_node *np)
>   	return -ENODEV;
>   }
>   
> +static inline int of_map_rid(struct device_node *np, u32 rid,
> +			     const char *map_name, const char *map_mask_name,
> +			     struct device_node **target, u32 *id_out)
> +{
> +	return -EINVAL;
> +}
> +
>   #define of_match_ptr(_ptr)	NULL
>   #define of_match_node(_matches, _node)	NULL
>   #endif /* CONFIG_OF */
> diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h
> index 091033a..a23b44a 100644
> --- a/include/linux/of_pci.h
> +++ b/include/linux/of_pci.h
> @@ -17,9 +17,6 @@ struct device_node *of_pci_find_child_device(struct device_node *parent,
>   int of_get_pci_domain_nr(struct device_node *node);
>   int of_pci_get_max_link_speed(struct device_node *node);
>   void of_pci_check_probe_only(void);
> -int of_pci_map_rid(struct device_node *np, u32 rid,
> -		   const char *map_name, const char *map_mask_name,
> -		   struct device_node **target, u32 *id_out);
>   #else
>   static inline struct device_node *of_pci_find_child_device(struct device_node *parent,
>   					     unsigned int devfn)
> @@ -44,13 +41,6 @@ static inline int of_pci_get_devfn(struct device_node *np)
>   	return -1;
>   }
>   
> -static inline int of_pci_map_rid(struct device_node *np, u32 rid,
> -			const char *map_name, const char *map_mask_name,
> -			struct device_node **target, u32 *id_out)
> -{
> -	return -EINVAL;
> -}
> -
>   static inline int
>   of_pci_get_max_link_speed(struct device_node *node)
>   {
> 

^ permalink raw reply

* Re: [PATCH 1/7 v5] Docs: dt: add fsl-mc iommu-map device-tree binding
From: Robin Murphy @ 2018-07-03 14:39 UTC (permalink / raw)
  To: Nipun Gupta, will.deacon, robh+dt, robh, mark.rutland,
	catalin.marinas, gregkh, laurentiu.tudor, bhelgaas
  Cc: hch, joro, m.szyprowski, shawnguo, frowand.list, iommu,
	linux-kernel, devicetree, linux-arm-kernel, linuxppc-dev,
	linux-pci, bharat.bhushan, stuyoder, leoyang.li
In-Reply-To: <1526824191-7000-2-git-send-email-nipun.gupta@nxp.com>

On 20/05/18 14:49, Nipun Gupta wrote:
> The existing IOMMU bindings cannot be used to specify the relationship
> between fsl-mc devices and IOMMUs. This patch adds a generic binding for
> mapping fsl-mc devices to IOMMUs, using iommu-map property.
> 
> Signed-off-by: Nipun Gupta <nipun.gupta@nxp.com>
> Reviewed-by: Rob Herring <robh@kernel.org>
> ---
>   .../devicetree/bindings/misc/fsl,qoriq-mc.txt      | 39 ++++++++++++++++++++++
>   1 file changed, 39 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> index 6611a7c..8cbed4f 100644
> --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> +++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
> @@ -9,6 +9,25 @@ blocks that can be used to create functional hardware objects/devices
>   such as network interfaces, crypto accelerator instances, L2 switches,
>   etc.
>   
> +For an overview of the DPAA2 architecture and fsl-mc bus see:
> +drivers/staging/fsl-mc/README.txt

Nit: Looks like that's Documentation/networking/dpaa2/overview.rst now.

> +
> +As described in the above overview, all DPAA2 objects in a DPRC share the
> +same hardware "isolation context" and a 10-bit value called an ICID
> +(isolation context id) is expressed by the hardware to identify
> +the requester.
> +
> +The generic 'iommus' property is insufficient to describe the relationship
> +between ICIDs and IOMMUs, so an iommu-map property is used to define
> +the set of possible ICIDs under a root DPRC and how they map to
> +an IOMMU.
> +
> +For generic IOMMU bindings, see
> +Documentation/devicetree/bindings/iommu/iommu.txt.
> +
> +For arm-smmu binding, see:
> +Documentation/devicetree/bindings/iommu/arm,smmu.txt.
> +
>   Required properties:
>   
>       - compatible
> @@ -88,14 +107,34 @@ Sub-nodes:
>                 Value type: <phandle>
>                 Definition: Specifies the phandle to the PHY device node associated
>                             with the this dpmac.
> +Optional properties:
> +
> +- iommu-map: Maps an ICID to an IOMMU and associated iommu-specifier
> +  data.
> +
> +  The property is an arbitrary number of tuples of
> +  (icid-base,iommu,iommu-base,length).
> +
> +  Any ICID i in the interval [icid-base, icid-base + length) is
> +  associated with the listed IOMMU, with the iommu-specifier
> +  (i - icid-base + iommu-base).
>   
>   Example:
>   
> +        smmu: iommu@5000000 {
> +               compatible = "arm,mmu-500";
> +               #iommu-cells = <2>;

This should be 1 if stream-match-mask is present. Bad example is bad :)

Robin.

> +               stream-match-mask = <0x7C00>;
> +               ...
> +        };
> +
>           fsl_mc: fsl-mc@80c000000 {
>                   compatible = "fsl,qoriq-mc";
>                   reg = <0x00000008 0x0c000000 0 0x40>,    /* MC portal base */
>                         <0x00000000 0x08340000 0 0x40000>; /* MC control reg */
>                   msi-parent = <&its>;
> +                /* define map for ICIDs 23-64 */
> +                iommu-map = <23 &smmu 23 41>;
>                   #address-cells = <3>;
>                   #size-cells = <1>;
>   
> 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox