LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v6 12/17] powerpc/pseries/vas: Integrate API with open/close windows
From: Haren Myneni @ 2021-06-18  7:49 UTC (permalink / raw)
  To: Nicholas Piggin, herbert, linux-crypto, linuxppc-dev, mpe
In-Reply-To: <1623971609.844odc55aw.astroid@bobo.none>

On Fri, 2021-06-18 at 09:22 +1000, Nicholas Piggin wrote:
> Excerpts from Haren Myneni's message of June 18, 2021 6:36 am:
> > This patch adds VAS window allocatioa/close with the corresponding
> > hcalls. Also changes to integrate with the existing user space VAS
> > API and provide register/unregister functions to NX pseries driver.
> > 
> > The driver register function is used to create the user space
> > interface (/dev/crypto/nx-gzip) and unregister to remove this
> > entry.
> > 
> > The user space process opens this device node and makes an ioctl
> > to allocate VAS window. The close interface is used to deallocate
> > window.
> > 
> > Signed-off-by: Haren Myneni <haren@linux.ibm.com>
> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> 
> Unless there is some significant performance reason it might be
> simplest
> to take the mutex for the duration of the allocate and frees rather
> than 
> taking it several times, covering the atomic with the lock instead.
> 
> You have a big lock, might as well use it and not have to wonder what
> if 
> things race here or there.

Using mutex to protect allocate/deallocate window and setup/free IRQ,
also to protect updating the list. We do not need lock for modify
window hcall and other things. Hence taking mutex several times. Also
used atomic for counters (used_lpar_creds) which can be exported in
sysfs (this patch will be added later in next enhancement seris). 

Genarlly applications open window initially, do continuous copy/paste
operations and close window later. But possible that the library /
application to open/close window for each request. Also may be opening
or closing multiple windows (say 1000 depends on cores on the system)
at the same time. These cases may affect the application performance.

Thanks
Haren

> 
> But don't rework that now, maybe just something to consider for
> later.
> 
> Thanks,
> Nick
> 


^ permalink raw reply

* Re: [PATCH 4/4] powerpc: Enable KFENCE on BOOK3S/64
From: Daniel Axtens @ 2021-06-18  8:00 UTC (permalink / raw)
  To: Jordan Niethe, linuxppc-dev; +Cc: Jordan Niethe, npiggin, aneesh.kumar
In-Reply-To: <20210517061658.194708-5-jniethe5@gmail.com>


> +#ifdef CONFIG_PPC64
> +static inline bool kfence_protect_page(unsigned long addr, bool protect)
> +{
> +	struct page *page;
> +
> +	page = virt_to_page(addr);
> +	if (protect)
> +		__kernel_map_pages(page, 1, 0);
> +	else
> +		__kernel_map_pages(page, 1, 1);

I lose track of the type conversions and code conventions involved, but
can we do something like __kernel_map_pages(page, 1, !!protect)?

Apart from that, this seems good.

Kind regards,
Daniel

^ permalink raw reply

* Re: [PATCH 4/4] powerpc: Enable KFENCE on BOOK3S/64
From: Daniel Axtens @ 2021-06-18  8:02 UTC (permalink / raw)
  To: Jordan Niethe, linuxppc-dev; +Cc: Jordan Niethe, npiggin, aneesh.kumar
In-Reply-To: <87czsjsitg.fsf@dja-thinkpad.axtens.net>

Daniel Axtens <dja@axtens.net> writes:

>> +#ifdef CONFIG_PPC64
>> +static inline bool kfence_protect_page(unsigned long addr, bool protect)
>> +{
>> +	struct page *page;
>> +
>> +	page = virt_to_page(addr);
>> +	if (protect)
>> +		__kernel_map_pages(page, 1, 0);
>> +	else
>> +		__kernel_map_pages(page, 1, 1);
>
> I lose track of the type conversions and code conventions involved, but
> can we do something like __kernel_map_pages(page, 1, !!protect)?

Ah, I missed that the if changed the truth/falsity. !protect, not
!!protect :P
>
> Apart from that, this seems good.
>
> Kind regards,
> Daniel

^ permalink raw reply

* Re: [PATCH 1/1] powerpc/pseries: Interface to represent PAPR firmware attributes
From: Gautham R Shenoy @ 2021-06-18  9:02 UTC (permalink / raw)
  To: Pratik R. Sampat
  Cc: pratik.r.sampat, linux-kernel, kvm-ppc, paulus, linuxppc-dev
In-Reply-To: <20210616134240.62195-2-psampat@linux.ibm.com>

On Wed, Jun 16, 2021 at 07:12:40PM +0530, Pratik R. Sampat wrote:
> Adds a generic interface to represent the energy and frequency related
> PAPR attributes on the system using the new H_CALL
> "H_GET_ENERGY_SCALE_INFO".
> 
> H_GET_EM_PARMS H_CALL was previously responsible for exporting this
> information in the lparcfg, however the H_GET_EM_PARMS H_CALL
> will be deprecated P10 onwards.
> 
> The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
> hcall(
>   uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
>   uint64 flags,           // Per the flag request
>   uint64 firstAttributeId,// The attribute id
>   uint64 bufferAddress,   // Guest physical address of the output buffer
>   uint64 bufferSize       // The size in bytes of the output buffer
> );
> 
> This H_CALL can query either all the attributes at once with
> firstAttributeId = 0, flags = 0 as well as query only one attribute
> at a time with firstAttributeId = id


For a single attribute, the “firstAttributeId” must be set by the
caller to the attribute id to retrieve and the “singleAttribute” field
in “flags” must also be set to a 1.

If we don't set the "flags" to 1, while specifying a firstAttributeId,
the hypervisor will populate the buffer with the details pertaining to
all the attributes beginning with firstAttributeId.



> 
> The output buffer consists of the following
> 1. number of attributes              - 8 bytes
> 2. array offset to the data location - 8 bytes
> 3. version info                      - 1 byte
> 4. A data array of size num attributes, which contains the following:
>   a. attribute ID              - 8 bytes
>   b. attribute value in number - 8 bytes
>   c. attribute name in string  - 64 bytes
>   d. attribute value in string - 64 bytes
> 
[..snip..]

> +
> +static ssize_t papr_show_value(struct kobject *kobj,
> +				struct kobj_attribute *attr,
> +				char *buf)
> +{
> +	struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
> +	struct hv_energy_scale_buffer *t_buf;
> +	struct energy_scale_attributes *t_ea;
> +	int data_offset, ret = 0;
> +
> +	t_buf = kmalloc(sizeof(*t_buf), GFP_KERNEL);
> +	if (t_buf == NULL)
> +		return -ENOMEM;
> +
> +	ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0,
> +				 pattr->id, virt_to_phys(t_buf),
> +				 sizeof(*t_buf));
> +


In this case, since we are interested in only one attribute, we can
make the call

	ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 1,
				 pattr->id, virt_to_phys(t_buf),
				 sizeof(*t_buf));

setting flags=1.

Same in the function papr_show_value_desc() below

--
Thanks and Regards
gautham.


> +	if (ret != H_SUCCESS) {
> +		pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
> +		goto out;
> +	}
> +
> +	data_offset = be64_to_cpu(t_buf->array_offset) -
> +			(sizeof(t_buf->num_attr) +
> +			sizeof(t_buf->array_offset) +
> +			sizeof(t_buf->data_header_version));
> +
> +	t_ea = (struct energy_scale_attributes *) &t_buf->data[data_offset];
> +
> +	ret = sprintf(buf, "%llu\n", be64_to_cpu(t_ea->attr_value));
> +	if (ret < 0)
> +		ret = -EIO;
> +out:
> +	kfree(t_buf);
> +
> +	return ret;
> +}
> +
> +static ssize_t papr_show_value_desc(struct kobject *kobj,
> +				     struct kobj_attribute *attr,
> +				     char *buf)
> +{
> +	struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
> +	struct hv_energy_scale_buffer *t_buf;
> +	struct energy_scale_attributes *t_ea;
> +	int data_offset, ret = 0;
> +
> +	t_buf = kmalloc(sizeof(*t_buf), GFP_KERNEL);
> +	if (t_buf == NULL)
> +		return -ENOMEM;
> +
> +	ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0,
> +				 pattr->id, virt_to_phys(t_buf),
> +				 sizeof(*t_buf));
> +
> +	if (ret != H_SUCCESS) {
> +		pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
> +		goto out;
> +	}
> +
> +	data_offset = be64_to_cpu(t_buf->array_offset) -
> +			(sizeof(t_buf->num_attr) +
> +			sizeof(t_buf->array_offset) +
> +			sizeof(t_buf->data_header_version));
> +
> +	t_ea = (struct energy_scale_attributes *) &t_buf->data[data_offset];
> +
> +	ret = sprintf(buf, "%s\n", t_ea->attr_value_desc);
> +	if (ret < 0)
> +		ret = -EIO;
> +out:
> +	kfree(t_buf);
> +
> +	return ret;
> +}
> +
> +static struct papr_ops_info {
> +	const char *attr_name;
> +	ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr,
> +			char *buf);
> +} ops_info[] = {
> +	{ "desc", papr_show_desc },
> +	{ "value", papr_show_value },
> +	{ "value_desc", papr_show_value_desc },
> +};
> +
> +static void add_attr(u64 id, int index, struct papr_attr *attr)
> +{
> +	attr->id = id;
> +	sysfs_attr_init(&attr->attr.attr);
> +	attr->attr.attr.name = ops_info[index].attr_name;
> +	attr->attr.attr.mode = 0444;
> +	attr->attr.show = ops_info[index].show;
> +}
> +
> +static int add_attr_group(u64 id, int len, struct papr_group *pg,
> +			  bool show_val_desc)
> +{
> +	int i;
> +
> +	for (i = 0; i < len; i++) {
> +		if (!strcmp(ops_info[i].attr_name, "value_desc") &&
> +		    !show_val_desc) {
> +			continue;
> +		}
> +		add_attr(id, i, &pg->pgattrs[i]);
> +		pg->pg.attrs[i] = &pg->pgattrs[i].attr.attr;
> +	}
> +
> +	return sysfs_create_group(escale_kobj, &pg->pg);
> +}
> +
> +
> +static int __init papr_init(void)
> +{
> +	uint64_t num_attr;
> +	int ret, idx, i, data_offset;
> +
> +	em_buf = kmalloc(sizeof(*em_buf), GFP_KERNEL);
> +	if (em_buf == NULL)
> +		return -ENOMEM;
> +	/*
> +	 * hcall(
> +	 * uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
> +	 * uint64 flags,            // Per the flag request
> +	 * uint64 firstAttributeId, // The attribute id
> +	 * uint64 bufferAddress,    // Guest physical address of the output buffer
> +	 * uint64 bufferSize);      // The size in bytes of the output buffer
> +	 */
> +	ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0, 0,
> +				 virt_to_phys(em_buf), sizeof(*em_buf));
> +
> +	if (!firmware_has_feature(FW_FEATURE_LPAR) || ret != H_SUCCESS ||
> +	    em_buf->data_header_version != 0x1) {
> +		pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
> +		goto out;
> +	}
> +
> +	num_attr = be64_to_cpu(em_buf->num_attr);
> +
> +	/*
> +	 * Typecast the energy buffer to the attribute structure at the offset
> +	 * specified in the buffer
> +	 */
> +	data_offset = be64_to_cpu(em_buf->array_offset) -
> +			(sizeof(em_buf->num_attr) +
> +			sizeof(em_buf->array_offset) +
> +			sizeof(em_buf->data_header_version));
> +
> +	ea = (struct energy_scale_attributes *) &em_buf->data[data_offset];
> +
> +	pgs = kcalloc(num_attr, sizeof(*pgs), GFP_KERNEL);
> +	if (!pgs)
> +		goto out_pgs;
> +
> +	papr_kobj = kobject_create_and_add("papr", firmware_kobj);
> +	if (!papr_kobj) {
> +		pr_warn("kobject_create_and_add papr failed\n");
> +		goto out_kobj;
> +	}
> +
> +	escale_kobj = kobject_create_and_add("energy_scale_info", papr_kobj);
> +	if (!escale_kobj) {
> +		pr_warn("kobject_create_and_add energy_scale_info failed\n");
> +		goto out_ekobj;
> +	}
> +
> +	for (idx = 0; idx < num_attr; idx++) {
> +		char buf[4];
> +		bool show_val_desc = true;
> +
> +		pgs[idx].pgattrs = kcalloc(MAX_ATTRS,
> +					   sizeof(*pgs[idx].pgattrs),
> +					   GFP_KERNEL);
> +		if (!pgs[idx].pgattrs)
> +			goto out_kobj;
> +
> +		pgs[idx].pg.attrs = kcalloc(MAX_ATTRS + 1,
> +					    sizeof(*pgs[idx].pg.attrs),
> +					    GFP_KERNEL);
> +		if (!pgs[idx].pg.attrs) {
> +			kfree(pgs[idx].pgattrs);
> +			goto out_kobj;
> +		}
> +
> +		sprintf(buf, "%lld", be64_to_cpu(ea[idx].attr_id));
> +		pgs[idx].pg.name = buf;
> +
> +		/* Do not add the value description if it does not exist */
> +		if (strlen(ea[idx].attr_value_desc) == 0)
> +			show_val_desc = false;
> +
> +		if (add_attr_group(be64_to_cpu(ea[idx].attr_id),
> +				   MAX_ATTRS, &pgs[idx], show_val_desc)) {
> +			pr_warn("Failed to create papr attribute group %s\n",
> +				pgs[idx].pg.name);
> +			goto out_pgattrs;
> +		}
> +	}
> +
> +	return 0;
> +
> +out_pgattrs:
> +	for (i = 0; i < MAX_ATTRS; i++) {
> +		kfree(pgs[i].pgattrs);
> +		kfree(pgs[i].pg.attrs);
> +	}
> +out_ekobj:
> +	kobject_put(escale_kobj);
> +out_kobj:
> +	kobject_put(papr_kobj);
> +out_pgs:
> +	kfree(pgs);
> +out:
> +	kfree(em_buf);
> +
> +	return -ENOMEM;
> +}
> +
> +machine_device_initcall(pseries, papr_init);
> -- 
> 2.30.2
> 

^ permalink raw reply

* linux-next: manual merge of the akpm-current tree with the powerpc tree
From: Stephen Rothwell @ 2021-06-18  9:44 UTC (permalink / raw)
  To: Andrew Morton, Michael Ellerman, PowerPC
  Cc: Linux Next Mailing List, Nicholas Piggin,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1266 bytes --]

Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  arch/powerpc/kernel/smp.c

between commit:

  86f46f343272 ("powerpc/32s: Initialise KUAP and KUEP in C")

from the powerpc tree and commit:

  103e676c91d0 ("lazy tlb: introduce lazy mm refcount helper functions")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/kernel/smp.c
index b83a59ce9beb,b289f1d213f8..000000000000
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@@ -1541,11 -1541,7 +1541,11 @@@ void start_secondary(void *unused
  {
  	unsigned int cpu = raw_smp_processor_id();
  
 +	/* PPC64 calls setup_kup() in early_setup_secondary() */
 +	if (IS_ENABLED(CONFIG_PPC32))
 +		setup_kup();
 +
- 	mmgrab(&init_mm);
+ 	mmgrab_lazy_tlb(&init_mm);
  	current->active_mm = &init_mm;
  
  	smp_store_cpu_info(cpu);

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH] ASoC: fsl_xcvr: disable all interrupts when suspend happens
From: Shengjiu Wang @ 2021-06-18  9:51 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, broonie, perex, tiwai,
	alsa-devel
  Cc: linuxppc-dev, linux-kernel

There is an unhandled interrupt after suspend, which cause endless
interrupt when system resume, so system may hang.

Disable all interrupts in runtime suspend callback to avoid above
issue.

Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
---
 sound/soc/fsl/fsl_xcvr.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c
index df7c189d97dd..37b4fdb236ee 100644
--- a/sound/soc/fsl/fsl_xcvr.c
+++ b/sound/soc/fsl/fsl_xcvr.c
@@ -1233,6 +1233,11 @@ static __maybe_unused int fsl_xcvr_runtime_suspend(struct device *dev)
 	struct fsl_xcvr *xcvr = dev_get_drvdata(dev);
 	int ret;
 
+	ret = regmap_update_bits(xcvr->regmap, FSL_XCVR_EXT_IER0,
+				 FSL_XCVR_IRQ_EARC_ALL, 0);
+	if (ret < 0)
+		dev_err(dev, "Failed to clear IER0: %d\n", ret);
+
 	/* Assert M0+ reset */
 	ret = regmap_update_bits(xcvr->regmap, FSL_XCVR_EXT_CTRL,
 				 FSL_XCVR_EXT_CTRL_CORE_RESET,
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH] ASoC: fsl_xcvr: disable all interrupts when suspend happens
From: Fabio Estevam @ 2021-06-18 11:20 UTC (permalink / raw)
  To: Shengjiu Wang
  Cc: Linux-ALSA, Timur Tabi, Xiubo Li, linux-kernel, Takashi Iwai,
	Jaroslav Kysela, Nicolin Chen, Mark Brown, linuxppc-dev
In-Reply-To: <1624009876-3076-1-git-send-email-shengjiu.wang@nxp.com>

Hi Shengjiu,

On Fri, Jun 18, 2021 at 7:10 AM Shengjiu Wang <shengjiu.wang@nxp.com> wrote:
>
> There is an unhandled interrupt after suspend, which cause endless
> interrupt when system resume, so system may hang.
>
> Disable all interrupts in runtime suspend callback to avoid above
> issue.

Fixe tag?

> +       ret = regmap_update_bits(xcvr->regmap, FSL_XCVR_EXT_IER0,
> +                                FSL_XCVR_IRQ_EARC_ALL, 0);
> +       if (ret < 0)
> +               dev_err(dev, "Failed to clear IER0: %d\n", ret);
> +

The operations in _suspend() are usually balanced with the ones in _resume().

Shouldn't you enable the interrupts in resume() then?

I see that the interrupts are currently enabled inside fsl_xcvr_prepare().

^ permalink raw reply

* Re: [PATCH] ASoC: fsl_xcvr: disable all interrupts when suspend happens
From: Shengjiu Wang @ 2021-06-18 11:30 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Linux-ALSA, Timur Tabi, Xiubo Li, Shengjiu Wang, linux-kernel,
	Takashi Iwai, Nicolin Chen, Mark Brown, linuxppc-dev
In-Reply-To: <CAOMZO5DYLZmz6f0yjrOpaL4B_wicq0ofrYpW6QqzNPEc0j407Q@mail.gmail.com>

Hi Fabio

On Fri, Jun 18, 2021 at 7:21 PM Fabio Estevam <festevam@gmail.com> wrote:
>
> Hi Shengjiu,
>
> On Fri, Jun 18, 2021 at 7:10 AM Shengjiu Wang <shengjiu.wang@nxp.com> wrote:
> >
> > There is an unhandled interrupt after suspend, which cause endless
> > interrupt when system resume, so system may hang.
> >
> > Disable all interrupts in runtime suspend callback to avoid above
> > issue.
>
> Fixe tag?

ok, I will add it.

>
> > +       ret = regmap_update_bits(xcvr->regmap, FSL_XCVR_EXT_IER0,
> > +                                FSL_XCVR_IRQ_EARC_ALL, 0);
> > +       if (ret < 0)
> > +               dev_err(dev, "Failed to clear IER0: %d\n", ret);
> > +
>
> The operations in _suspend() are usually balanced with the ones in _resume().
>
> Shouldn't you enable the interrupts in resume() then?

No,  as you said below, the interrupts are enabled in fsl_xcvr_prepare().
so this change should not block anything.

>
> I see that the interrupts are currently enabled inside fsl_xcvr_prepare().

^ permalink raw reply

* Re: [PATCH] ASoC: fsl_xcvr: disable all interrupts when suspend happens
From: Mark Brown @ 2021-06-18 11:33 UTC (permalink / raw)
  To: Shengjiu Wang
  Cc: Linux-ALSA, Timur Tabi, Xiubo Li, linuxppc-dev, Shengjiu Wang,
	linux-kernel, Takashi Iwai, Nicolin Chen, Fabio Estevam
In-Reply-To: <CAA+D8AOiL2otCBRyP3q7EWd2C7RnUhWZjRtxcJWQyXXXydf_ZQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 466 bytes --]

On Fri, Jun 18, 2021 at 07:30:25PM +0800, Shengjiu Wang wrote:
> On Fri, Jun 18, 2021 at 7:21 PM Fabio Estevam <festevam@gmail.com> wrote:

> > The operations in _suspend() are usually balanced with the ones in _resume().

> > Shouldn't you enable the interrupts in resume() then?

> No,  as you said below, the interrupts are enabled in fsl_xcvr_prepare().
> so this change should not block anything.

Might be worth a comment explaining why there's the asymmetry.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v2 5/9] powerpc/microwatt: Use standard 16550 UART for console
From: Paul Mackerras @ 2021-06-18 12:12 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev
In-Reply-To: <1624001539.de8wj3qkjv.astroid@bobo.none>

On Fri, Jun 18, 2021 at 05:40:40PM +1000, Nicholas Piggin wrote:
> Excerpts from Paul Mackerras's message of June 18, 2021 1:46 pm:
> > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > 
> > This adds support to the Microwatt platform to use the standard
> > 16550-style UART which available in the standalone Microwatt FPGA.
> > 
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
...
> > +#ifdef CONFIG_PPC_EARLY_DEBUG_MICROWATT
> > +
> > +#define UDBG_UART_MW_ADDR	((void __iomem *)0xc0002000)
> > +
> > +static u8 udbg_uart_in_isa300_rm(unsigned int reg)
> > +{
> > +	uint64_t msr = mfmsr();
> > +	uint8_t  c;
> > +
> > +	mtmsr(msr & ~(MSR_EE|MSR_DR));
> > +	isync();
> > +	eieio();
> > +	c = __raw_rm_readb(UDBG_UART_MW_ADDR + (reg << 2));
> > +	mtmsr(msr);
> > +	isync();
> > +	return c;
> > +}
> 
> Why is realmode required? No cache inhibited mappings yet?

Because it's EARLY debug, for use in the very early stages of boot
when the kernel's radix tree may or may not have been initialized.
The easiest way to make a function that works correctly whether or not
the radix tree has been initialized and the MMU turned on is to
temporarily turn off the MMU for data accesses and use lbzcix/stbcix
(which Microwatt has, even though it doesn't implement hypervisor
mode).

(I don't know which "yet" you meant - "yet" in the process of booting a
kernel, or "yet" in the process of Microwatt's development?  Microwatt
certainly does have cache-inhibited mappings and has done since the
MMU was first introduced.)

In fact the defconfig I add later in the series doesn't enable
CONFIG_PPC_EARLY_DEBUG_MICROWATT, but it's there if it's needed for
debugging.

> mtmsrd with L=0 is defined to be context synchronizing in isa 3, so I 
> don't think the isync would be required. There is a bit of code around 
> arch/powerpc that does this, maybe it used to be needed or some other
> implementations needed it?
> 
> That's just for my curiosity, it doesn't really hurt to have them
> there.

Right, and in fact mtmsrd is marked as a single-issue instruction in
Microwatt, so it should work with no isyncs or eieios.  Presumably Ben
copied the isync/eieio pattern from somewhere else.

Paul.

^ permalink raw reply

* [PATCH v2] ASoC: fsl_xcvr: disable all interrupts when suspend happens
From: Shengjiu Wang @ 2021-06-18 12:38 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, broonie, perex, tiwai,
	alsa-devel
  Cc: linuxppc-dev, linux-kernel

There is an unhandled interrupt after suspend, which cause endless
interrupt when system resume, so system may hang.

Disable all interrupts in runtime suspend callback to avoid above
issue.

Fixes: 28564486866f ("ASoC: fsl_xcvr: Add XCVR ASoC CPU DAI driver")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
---
changes in v2:
- Add Fixes tag
- Add comments for the change

 sound/soc/fsl/fsl_xcvr.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c
index df7c189d97dd..92dd99258edf 100644
--- a/sound/soc/fsl/fsl_xcvr.c
+++ b/sound/soc/fsl/fsl_xcvr.c
@@ -1233,6 +1233,16 @@ static __maybe_unused int fsl_xcvr_runtime_suspend(struct device *dev)
 	struct fsl_xcvr *xcvr = dev_get_drvdata(dev);
 	int ret;
 
+	/*
+	 * Clear interrupts, when streams starts or resumes after
+	 * suspend, interrupts are enabled in prepare(), so no need
+	 * to enable interrupts in resume().
+	 */
+	ret = regmap_update_bits(xcvr->regmap, FSL_XCVR_EXT_IER0,
+				 FSL_XCVR_IRQ_EARC_ALL, 0);
+	if (ret < 0)
+		dev_err(dev, "Failed to clear IER0: %d\n", ret);
+
 	/* Assert M0+ reset */
 	ret = regmap_update_bits(xcvr->regmap, FSL_XCVR_EXT_CTRL,
 				 FSL_XCVR_EXT_CTRL_CORE_RESET,
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH v2] ASoC: fsl_xcvr: disable all interrupts when suspend happens
From: Fabio Estevam @ 2021-06-18 13:45 UTC (permalink / raw)
  To: Shengjiu Wang
  Cc: Linux-ALSA, Timur Tabi, Xiubo Li, linux-kernel, Takashi Iwai,
	Jaroslav Kysela, Nicolin Chen, Mark Brown, linuxppc-dev
In-Reply-To: <1624019913-3380-1-git-send-email-shengjiu.wang@nxp.com>

On Fri, Jun 18, 2021 at 9:57 AM Shengjiu Wang <shengjiu.wang@nxp.com> wrote:
>
> There is an unhandled interrupt after suspend, which cause endless
> interrupt when system resume, so system may hang.
>
> Disable all interrupts in runtime suspend callback to avoid above
> issue.
>
> Fixes: 28564486866f ("ASoC: fsl_xcvr: Add XCVR ASoC CPU DAI driver")
> Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
> ---
> changes in v2:
> - Add Fixes tag
> - Add comments for the change

Reviewed-by: Fabio Estevam <festevam@gmail.com>

^ permalink raw reply

* Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions
From: Tom Lendacky @ 2021-06-18 14:09 UTC (permalink / raw)
  To: Claire Chang, Stefano Stabellini
  Cc: heikki.krogerus, thomas.hellstrom, peterz, joonas.lahtinen,
	dri-devel, chris, grant.likely, paulus, Frank Rowand, mingo,
	Marek Szyprowski, Nicolas Boichat, Saravana Kannan, Joerg Roedel,
	Rafael J . Wysocki, Christoph Hellwig, Bartosz Golaszewski,
	bskeggs, linux-pci, xen-devel, Thierry Reding, intel-gfx,
	matthew.auld, linux-devicetree, Jianxiong Gao, Daniel Vetter,
	Will Deacon, Konrad Rzeszutek Wilk, maarten.lankhorst, airlied,
	Dan Williams, linuxppc-dev, jani.nikula, Rob Herring,
	rodrigo.vivi, Bjorn Helgaas, boris.ostrovsky, Andy Shevchenko,
	jgross, Greg KH, Randy Dunlap, lkml, Tomasz Figa,
	list@263.net:IOMMU DRIVERS, Jim Quinlan, xypron.glpk,
	Robin Murphy, bauerman
In-Reply-To: <CALiNf29SJ0jXirWVDhJw4BUNvkjUeGPyGNJK9m8c30OPX41=5Q@mail.gmail.com>

On 6/18/21 1:25 AM, Claire Chang wrote:
> On Fri, Jun 18, 2021 at 7:30 AM Stefano Stabellini
> <sstabellini@kernel.org> wrote:
>>
>> On Thu, 17 Jun 2021, Claire Chang wrote:
>>> Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
>>> initialization to make the code reusable.
>>>
>>> Signed-off-by: Claire Chang <tientzu@chromium.org>
>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>> Tested-by: Stefano Stabellini <sstabellini@kernel.org>
>>> Tested-by: Will Deacon <will@kernel.org>
>>> ---
>>>  kernel/dma/swiotlb.c | 50 ++++++++++++++++++++++----------------------
>>>  1 file changed, 25 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>>> index 52e2ac526757..47bb2a766798 100644
>>> --- a/kernel/dma/swiotlb.c
>>> +++ b/kernel/dma/swiotlb.c
>>> @@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
>>>       memset(vaddr, 0, bytes);
>>>  }
>>>
>>> -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>>> +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
>>> +                                 unsigned long nslabs, bool late_alloc)
>>>  {
>>> +     void *vaddr = phys_to_virt(start);
>>>       unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
>>> +
>>> +     mem->nslabs = nslabs;
>>> +     mem->start = start;
>>> +     mem->end = mem->start + bytes;
>>> +     mem->index = 0;
>>> +     mem->late_alloc = late_alloc;
>>> +     spin_lock_init(&mem->lock);
>>> +     for (i = 0; i < mem->nslabs; i++) {
>>> +             mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
>>> +             mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>>> +             mem->slots[i].alloc_size = 0;
>>> +     }
>>> +     memset(vaddr, 0, bytes);
>>> +}
>>> +
>>> +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>>> +{
>>>       struct io_tlb_mem *mem;
>>>       size_t alloc_size;
>>>
>>> @@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>>>       if (!mem)
>>>               panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
>>>                     __func__, alloc_size, PAGE_SIZE);
>>> -     mem->nslabs = nslabs;
>>> -     mem->start = __pa(tlb);
>>> -     mem->end = mem->start + bytes;
>>> -     mem->index = 0;
>>> -     spin_lock_init(&mem->lock);
>>> -     for (i = 0; i < mem->nslabs; i++) {
>>> -             mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
>>> -             mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>>> -             mem->slots[i].alloc_size = 0;
>>> -     }
>>> +
>>> +     swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
>>>
>>>       io_tlb_default_mem = mem;
>>>       if (verbose)
>>> @@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
>>>  int
>>>  swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>>>  {
>>> -     unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
>>>       struct io_tlb_mem *mem;
>>> +     unsigned long bytes = nslabs << IO_TLB_SHIFT;
>>>
>>>       if (swiotlb_force == SWIOTLB_NO_FORCE)
>>>               return 0;
>>> @@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>>>       if (!mem)
>>>               return -ENOMEM;
>>>
>>> -     mem->nslabs = nslabs;
>>> -     mem->start = virt_to_phys(tlb);
>>> -     mem->end = mem->start + bytes;
>>> -     mem->index = 0;
>>> -     mem->late_alloc = 1;
>>> -     spin_lock_init(&mem->lock);
>>> -     for (i = 0; i < mem->nslabs; i++) {
>>> -             mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
>>> -             mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>>> -             mem->slots[i].alloc_size = 0;
>>> -     }
>>> -
>>> +     memset(mem, 0, sizeof(*mem));
>>> +     swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
>>>       set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
>>> -     memset(tlb, 0, bytes);
>>
>> This is good for swiotlb_late_init_with_tbl. However I have just noticed
>> that mem could also be allocated from swiotlb_init_with_tbl, in which
>> case the zeroing is missing. I think we need another memset in
>> swiotlb_init_with_tbl as well. Or maybe it could be better to have a
>> single memset at the beginning of swiotlb_init_io_tlb_mem instead. Up to
>> you.
> 
> swiotlb_init_with_tbl uses memblock_alloc to allocate the io_tlb_mem
> and memblock_alloc[1] will do memset in memblock_alloc_try_nid[2], so
> swiotlb_init_with_tbl is also good.
> I'm happy to add the memset in swiotlb_init_io_tlb_mem if you think
> it's clearer and safer.

On x86, if the memset is done before set_memory_decrypted() and memory
encryption is active, then the memory will look like ciphertext afterwards
and not be zeroes. If zeroed memory is required, then a memset must be
done after the set_memory_decrypted() calls.

Thanks,
Tom

> 
> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv5.13-rc6%2Fsource%2Finclude%2Flinux%2Fmemblock.h%23L407&amp;data=04%7C01%7Cthomas.lendacky%40amd.com%7C3e33e04212b84f9e4ed108d932230511%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637595948355050693%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=TGBDj18KuSHTb45EBz%2Bypfbr4Xgqb1aGTRDCTIpIgJo%3D&amp;reserved=0
> [2] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv5.13-rc6%2Fsource%2Fmm%2Fmemblock.c%23L1555&amp;data=04%7C01%7Cthomas.lendacky%40amd.com%7C3e33e04212b84f9e4ed108d932230511%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637595948355060689%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=K%2FWbN6iKN9JNtwDSkIaKH2BVLdDTWhn8tPfNdCOVkSA%3D&amp;reserved=0
> 

^ permalink raw reply

* Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions
From: Christoph Hellwig @ 2021-06-18 14:32 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: heikki.krogerus, thomas.hellstrom, peterz, joonas.lahtinen,
	dri-devel, chris, grant.likely, paulus, Frank Rowand, mingo,
	Marek Szyprowski, Stefano Stabellini, Saravana Kannan,
	Joerg Roedel, Rafael J . Wysocki, Christoph Hellwig,
	Bartosz Golaszewski, bskeggs, linux-pci, xen-devel,
	Thierry Reding, intel-gfx, matthew.auld, linux-devicetree,
	Jianxiong Gao, Daniel Vetter, Will Deacon, Konrad Rzeszutek Wilk,
	maarten.lankhorst, airlied, Dan Williams, linuxppc-dev,
	jani.nikula, Rob Herring, rodrigo.vivi, Bjorn Helgaas,
	Claire Chang, boris.ostrovsky, Andy Shevchenko, jgross,
	Nicolas Boichat, Greg KH, Randy Dunlap, lkml, Tomasz Figa,
	list@263.net:IOMMU DRIVERS, Jim Quinlan, xypron.glpk,
	Robin Murphy, bauerman
In-Reply-To: <741a34cc-547c-984d-8af4-2f309880acfa@amd.com>

On Fri, Jun 18, 2021 at 09:09:17AM -0500, Tom Lendacky wrote:
> > swiotlb_init_with_tbl uses memblock_alloc to allocate the io_tlb_mem
> > and memblock_alloc[1] will do memset in memblock_alloc_try_nid[2], so
> > swiotlb_init_with_tbl is also good.
> > I'm happy to add the memset in swiotlb_init_io_tlb_mem if you think
> > it's clearer and safer.
> 
> On x86, if the memset is done before set_memory_decrypted() and memory
> encryption is active, then the memory will look like ciphertext afterwards
> and not be zeroes. If zeroed memory is required, then a memset must be
> done after the set_memory_decrypted() calls.

Which should be fine - we don't care that the memory is cleared to 0,
just that it doesn't leak other data.  Maybe a comment would be useful,
though,

^ permalink raw reply

* Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation
From: Christophe Leroy @ 2021-06-18 15:27 UTC (permalink / raw)
  To: Andy Lutomirski, Nicholas Piggin, x86
  Cc: Dave Hansen, Peter Zijlstra, Catalin Marinas, linuxppc-dev, LKML,
	stable, linux-mm, Mathieu Desnoyers, Paul Mackerras,
	Andrew Morton, Will Deacon, linux-arm-kernel
In-Reply-To: <1e248763-9372-6e4e-5dea-cda999000aeb@kernel.org>



Le 16/06/2021 à 20:52, Andy Lutomirski a écrit :
> On 6/15/21 9:45 PM, Nicholas Piggin wrote:
>> Excerpts from Andy Lutomirski's message of June 16, 2021 1:21 pm:
>>> The old sync_core_before_usermode() comments suggested that a non-icache-syncing
>>> return-to-usermode instruction is x86-specific and that all other
>>> architectures automatically notice cross-modified code on return to
>>> userspace.
> 
>>> +/*
>>> + * XXX: can a powerpc person put an appropriate comment here?
>>> + */
>>> +static inline void membarrier_sync_core_before_usermode(void)
>>> +{
>>> +}
>>> +
>>> +#endif /* _ASM_POWERPC_SYNC_CORE_H */
>>
>> powerpc's can just go in asm/membarrier.h
> 
> $ ls arch/powerpc/include/asm/membarrier.h
> ls: cannot access 'arch/powerpc/include/asm/membarrier.h': No such file
> or directory

https://github.com/torvalds/linux/blob/master/arch/powerpc/include/asm/membarrier.h


Was added by https://github.com/torvalds/linux/commit/3ccfebedd8cf54e291c809c838d8ad5cc00f5688

> 
> 
>>
>> /*
>>   * The RFI family of instructions are context synchronising, and
>>   * that is how we return to userspace, so nothing is required here.
>>   */
> 
> Thanks!
> 

^ permalink raw reply

* Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation
From: Mathieu Desnoyers @ 2021-06-18 16:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Catalin Marinas, Will Deacon, linux-mm, Peter Zijlstra, x86,
	linux-kernel, Nicholas Piggin, Dave Hansen, Paul Mackerras,
	stable, Andrew Morton, linuxppc-dev, linux-arm-kernel
In-Reply-To: <26196903-4aee-33c4-bed8-8bf8c7b46793@kernel.org>

----- On Jun 17, 2021, at 8:12 PM, Andy Lutomirski luto@kernel.org wrote:

> On 6/17/21 7:47 AM, Mathieu Desnoyers wrote:
> 
>> Please change back this #ifndef / #else / #endif within function for
>> 
>> if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE)) {
>>   ...
>> } else {
>>   ...
>> }
>> 
>> I don't think mixing up preprocessor and code logic makes it more readable.
> 
> I agree, but I don't know how to make the result work well.
> membarrier_sync_core_before_usermode() isn't defined in the !IS_ENABLED
> case, so either I need to fake up a definition or use #ifdef.
> 
> If I faked up a definition, I would want to assert, at build time, that
> it isn't called.  I don't think we can do:
> 
> static void membarrier_sync_core_before_usermode()
> {
>    BUILD_BUG_IF_REACHABLE();
> }

Let's look at the context here:

static void ipi_sync_core(void *info)
{
    [....]
    membarrier_sync_core_before_usermode()
}

^ this can be within #ifdef / #endif

static int membarrier_private_expedited(int flags, int cpu_id)
[...]
               if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
                        return -EINVAL;
                if (!(atomic_read(&mm->membarrier_state) &
                      MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
                        return -EPERM;
                ipi_func = ipi_sync_core;

All we need to make the line above work is to define an empty ipi_sync_core
function in the #else case after the ipi_sync_core() function definition.

Or am I missing your point ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply

* Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()
From: Christophe Leroy @ 2021-06-18 17:13 UTC (permalink / raw)
  To: Mathieu Desnoyers, Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: Maged Michael, Dave Watson, Will Deacon, Andrew Hunter,
	David Sehr, Paul Mackerras, H . Peter Anvin, linux-arch, x86,
	Russell King, Greg Hackmann, Linus Torvalds, Alan Stern,
	Paul E . McKenney, Andrea Parri, Avi Kivity, Boqun Feng,
	Nicholas Piggin, Alexander Viro, Andy Lutomirski, linux-api,
	linux-kernel, linuxppc-dev
In-Reply-To: <20180129202020.8515-3-mathieu.desnoyers@efficios.com>



Le 29/01/2018 à 21:20, Mathieu Desnoyers a écrit :
> Allow PowerPC to skip the full memory barrier in switch_mm(), and
> only issue the barrier when scheduling into a task belonging to a
> process that has registered to use expedited private.
> 
> Threads targeting the same VM but which belong to different thread
> groups is a tricky case. It has a few consequences:
> 
> It turns out that we cannot rely on get_nr_threads(p) to count the
> number of threads using a VM. We can use
> (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
> instead to skip the synchronize_sched() for cases where the VM only has
> a single user, and that user only has a single thread.
> 
> It also turns out that we cannot use for_each_thread() to set
> thread flags in all threads using a VM, as it only iterates on the
> thread group.
> 
> Therefore, test the membarrier state variable directly rather than
> relying on thread flags. This means
> membarrier_register_private_expedited() needs to set the
> MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
> only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
> private expedited membarrier commands to succeed.
> membarrier_arch_switch_mm() now tests for the
> MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.

Looking at switch_mm_irqs_off(), I found it more complex than expected and found that this patch is 
the reason for that complexity.

Before the patch (ie in kernel 4.14), we have:

00000000 <switch_mm_irqs_off>:
    0:	81 24 01 c8 	lwz     r9,456(r4)
    4:	71 29 00 01 	andi.   r9,r9,1
    8:	40 82 00 1c 	bne     24 <switch_mm_irqs_off+0x24>
    c:	39 24 01 c8 	addi    r9,r4,456
   10:	39 40 00 01 	li      r10,1
   14:	7d 00 48 28 	lwarx   r8,0,r9
   18:	7d 08 53 78 	or      r8,r8,r10
   1c:	7d 00 49 2d 	stwcx.  r8,0,r9
   20:	40 c2 ff f4 	bne-    14 <switch_mm_irqs_off+0x14>
   24:	7c 04 18 40 	cmplw   r4,r3
   28:	81 24 00 24 	lwz     r9,36(r4)
   2c:	91 25 04 4c 	stw     r9,1100(r5)
   30:	4d 82 00 20 	beqlr
   34:	48 00 00 00 	b       34 <switch_mm_irqs_off+0x34>
			34: R_PPC_REL24	switch_mmu_context


After the patch (ie in 5.13-rc6), that now is:

00000000 <switch_mm_irqs_off>:
    0:	81 24 02 18 	lwz     r9,536(r4)
    4:	71 29 00 01 	andi.   r9,r9,1
    8:	41 82 00 24 	beq     2c <switch_mm_irqs_off+0x2c>
    c:	7c 04 18 40 	cmplw   r4,r3
   10:	81 24 00 24 	lwz     r9,36(r4)
   14:	91 25 04 d0 	stw     r9,1232(r5)
   18:	4d 82 00 20 	beqlr
   1c:	81 24 00 28 	lwz     r9,40(r4)
   20:	71 29 00 0a 	andi.   r9,r9,10
   24:	40 82 00 34 	bne     58 <switch_mm_irqs_off+0x58>
   28:	48 00 00 00 	b       28 <switch_mm_irqs_off+0x28>
			28: R_PPC_REL24	switch_mmu_context
   2c:	39 24 02 18 	addi    r9,r4,536
   30:	39 40 00 01 	li      r10,1
   34:	7d 00 48 28 	lwarx   r8,0,r9
   38:	7d 08 53 78 	or      r8,r8,r10
   3c:	7d 00 49 2d 	stwcx.  r8,0,r9
   40:	40 a2 ff f4 	bne     34 <switch_mm_irqs_off+0x34>
   44:	7c 04 18 40 	cmplw   r4,r3
   48:	81 24 00 24 	lwz     r9,36(r4)
   4c:	91 25 04 d0 	stw     r9,1232(r5)
   50:	4d 82 00 20 	beqlr
   54:	48 00 00 00 	b       54 <switch_mm_irqs_off+0x54>
			54: R_PPC_REL24	switch_mmu_context
   58:	2c 03 00 00 	cmpwi   r3,0
   5c:	41 82 ff cc 	beq     28 <switch_mm_irqs_off+0x28>
   60:	48 00 00 00 	b       60 <switch_mm_irqs_off+0x60>
			60: R_PPC_REL24	switch_mmu_context


Especially, the comparison of 'prev' to 0 is pointless as both cases end up with just branching to 
'switch_mmu_context'

I don't understand all that complexity to just replace a simple 'smp_mb__after_unlock_lock()'.

#define smp_mb__after_unlock_lock()	smp_mb()
#define smp_mb()	barrier()
# define barrier() __asm__ __volatile__("": : :"memory")


Am I missing some subtility ?

Thanks
Christophe

^ permalink raw reply

* Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()
From: Mathieu Desnoyers @ 2021-06-18 17:26 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: maged michael, Peter Zijlstra, Dave Watson, Will Deacon,
	Andrew Hunter, David Sehr, Paul Mackerras, H. Peter Anvin,
	linux-arch, x86, Russell King, ARM Linux, Greg Hackmann,
	Linus Torvalds, Ingo Molnar, Alan Stern, Paul, Andrea Parri,
	Avi Kivity, Boqun Feng, Nicholas Piggin, Alexander Viro,
	Andy Lutomirski, Thomas Gleixner, linux-api, linux-kernel,
	linuxppc-dev
In-Reply-To: <8b200dd5-f37b-b208-82fb-2775df7bcd49@csgroup.eu>

----- On Jun 18, 2021, at 1:13 PM, Christophe Leroy christophe.leroy@csgroup.eu wrote:
[...]
> 
> I don't understand all that complexity to just replace a simple
> 'smp_mb__after_unlock_lock()'.
> 
> #define smp_mb__after_unlock_lock()	smp_mb()
> #define smp_mb()	barrier()
> # define barrier() __asm__ __volatile__("": : :"memory")
> 
> 
> Am I missing some subtility ?

On powerpc CONFIG_SMP, smp_mb() is actually defined as:

#define smp_mb()        __smp_mb()
#define __smp_mb()      mb()
#define mb()   __asm__ __volatile__ ("sync" : : : "memory")

So the original motivation here was to skip a "sync" instruction whenever
switching between threads which are part of the same process. But based on
recent discussions, I suspect my implementation may be inaccurately doing
so though.

Thanks,

Mathieu


> 
> Thanks
> Christophe

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply

* Re: [PATCH 01/18] mm: add a kunmap_local_dirty helper
From: Ira Weiny @ 2021-06-18 18:12 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Jens Axboe, linux-block, Thomas Bogendoerfer, Mike Snitzer,
	David S. Miller, Geoff Levand, Christoph Lameter, ceph-devel,
	linux-mips, Dongsheng Yang, linux-kernel, James E.J. Bottomley,
	dm-devel, linux-arch, Thomas Gleixner, Ilya Dryomov, linuxppc-dev,
	Christoph Hellwig
In-Reply-To: <20210618033728.GA16787@gondor.apana.org.au>

On Fri, Jun 18, 2021 at 11:37:28AM +0800, Herbert Xu wrote:
> On Thu, Jun 17, 2021 at 08:01:57PM -0700, Ira Weiny wrote:
> >
> > > +		flush_kernel_dcache_page(__page);		\
> > 
> > Is this required on 32bit systems?  Why is kunmap_flush_on_unmap() not
> > sufficient on 64bit systems?  The normal kunmap_local() path does that.
> > 
> > I'm sorry but I did not see a conclusion to my query on V1. Herbert implied the
> > he just copied from the crypto code.[1]  I'm concerned that this _dirty() call
> > is just going to confuse the users of kmap even more.  So why can't we get to
> > the bottom of why flush_kernel_dcache_page() needs so much logic around it
> > before complicating the general kernel users.
> > 
> > I would like to see it go away if possible.
> 
> This thread may be related:
> 
> https://lwn.net/Articles/240249/

Interesting!  Thanks!

Digging around a bit more I found:

https://lore.kernel.org/patchwork/patch/439637/

Auditing all the flush_dcache_page() arch code reveals that the mapping field
is either unused, or is checked for NULL.  Furthermore, all the implementations
call page_mapping_file() which further limits the page to not be a swap page.

All flush_kernel_dcache_page() implementations appears to operate the same way
in all arch's which define that call.

So I'm confident now that additional !PageSlab(__page) checks are not needed
and this patch is unnecessary.   Christoph, can we leave this out of the kmap
API and just fold the flush_kernel_dcache_page() calls back into the bvec code?

Unfortunately, I'm not convinced this can be handled completely by
kunmap_local() nor the mem*_page() calls because there is a difference between
flush_dcache_page() and flush_kernel_dcache_page() in most archs...  [parisc
being an exception which falls back to flush_kernel_dcache_page()]...

It seems like the generic unmap path _should_ be able to determine which call
to make based on the page but I'd have to look at that more.

Ira

^ permalink raw reply

* Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation
From: Andy Lutomirski @ 2021-06-18 19:58 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Catalin Marinas, Will Deacon, linux-mm, Peter Zijlstra (Intel),
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Nicholas Piggin, Dave Hansen, Paul Mackerras, stable,
	Andrew Morton, linuxppc-dev, linux-arm-kernel
In-Reply-To: <593983567.12619.1624033908849.JavaMail.zimbra@efficios.com>



On Fri, Jun 18, 2021, at 9:31 AM, Mathieu Desnoyers wrote:
> ----- On Jun 17, 2021, at 8:12 PM, Andy Lutomirski luto@kernel.org wrote:
> 
> > On 6/17/21 7:47 AM, Mathieu Desnoyers wrote:
> > 
> >> Please change back this #ifndef / #else / #endif within function for
> >> 
> >> if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE)) {
> >>   ...
> >> } else {
> >>   ...
> >> }
> >> 
> >> I don't think mixing up preprocessor and code logic makes it more readable.
> > 
> > I agree, but I don't know how to make the result work well.
> > membarrier_sync_core_before_usermode() isn't defined in the !IS_ENABLED
> > case, so either I need to fake up a definition or use #ifdef.
> > 
> > If I faked up a definition, I would want to assert, at build time, that
> > it isn't called.  I don't think we can do:
> > 
> > static void membarrier_sync_core_before_usermode()
> > {
> >    BUILD_BUG_IF_REACHABLE();
> > }
> 
> Let's look at the context here:
> 
> static void ipi_sync_core(void *info)
> {
>     [....]
>     membarrier_sync_core_before_usermode()
> }
> 
> ^ this can be within #ifdef / #endif
> 
> static int membarrier_private_expedited(int flags, int cpu_id)
> [...]
>                if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
>                         return -EINVAL;
>                 if (!(atomic_read(&mm->membarrier_state) &
>                       MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
>                         return -EPERM;
>                 ipi_func = ipi_sync_core;
> 
> All we need to make the line above work is to define an empty ipi_sync_core
> function in the #else case after the ipi_sync_core() function definition.
> 
> Or am I missing your point ?

Maybe?

My objection is that an empty ipi_sync_core is a lie — it doesn’t sync the core.  I would be fine with that if I could have the compiler statically verify that it’s not called, but I’m uncomfortable having it there if the implementation is actively incorrect.

^ permalink raw reply

* Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation
From: Mathieu Desnoyers @ 2021-06-18 20:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Catalin Marinas, Will Deacon, linux-mm, Peter Zijlstra, x86,
	linux-kernel, Nicholas Piggin, Dave Hansen, Paul Mackerras,
	stable, Andrew Morton, linuxppc-dev, linux-arm-kernel
In-Reply-To: <1d617df2-57fa-4953-9c75-6de3909a6f14@www.fastmail.com>

----- On Jun 18, 2021, at 3:58 PM, Andy Lutomirski luto@kernel.org wrote:

> On Fri, Jun 18, 2021, at 9:31 AM, Mathieu Desnoyers wrote:
>> ----- On Jun 17, 2021, at 8:12 PM, Andy Lutomirski luto@kernel.org wrote:
>> 
>> > On 6/17/21 7:47 AM, Mathieu Desnoyers wrote:
>> > 
>> >> Please change back this #ifndef / #else / #endif within function for
>> >> 
>> >> if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE)) {
>> >>   ...
>> >> } else {
>> >>   ...
>> >> }
>> >> 
>> >> I don't think mixing up preprocessor and code logic makes it more readable.
>> > 
>> > I agree, but I don't know how to make the result work well.
>> > membarrier_sync_core_before_usermode() isn't defined in the !IS_ENABLED
>> > case, so either I need to fake up a definition or use #ifdef.
>> > 
>> > If I faked up a definition, I would want to assert, at build time, that
>> > it isn't called.  I don't think we can do:
>> > 
>> > static void membarrier_sync_core_before_usermode()
>> > {
>> >    BUILD_BUG_IF_REACHABLE();
>> > }
>> 
>> Let's look at the context here:
>> 
>> static void ipi_sync_core(void *info)
>> {
>>     [....]
>>     membarrier_sync_core_before_usermode()
>> }
>> 
>> ^ this can be within #ifdef / #endif
>> 
>> static int membarrier_private_expedited(int flags, int cpu_id)
>> [...]
>>                if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
>>                         return -EINVAL;
>>                 if (!(atomic_read(&mm->membarrier_state) &
>>                       MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
>>                         return -EPERM;
>>                 ipi_func = ipi_sync_core;
>> 
>> All we need to make the line above work is to define an empty ipi_sync_core
>> function in the #else case after the ipi_sync_core() function definition.
>> 
>> Or am I missing your point ?
> 
> Maybe?
> 
> My objection is that an empty ipi_sync_core is a lie — it doesn’t sync the core.
> I would be fine with that if I could have the compiler statically verify that
> it’s not called, but I’m uncomfortable having it there if the implementation is
> actively incorrect.

I see. Another approach would be to implement a "setter" function to populate
"ipi_func". That setter function would return -EINVAL in its #ifndef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
implementation.

Would that be better ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply

* Re: [PATCH 1/1] powerpc/pseries: Interface to represent PAPR firmware attributes
From: Fabiano Rosas @ 2021-06-18 20:58 UTC (permalink / raw)
  To: Pratik R. Sampat, mpe, benh, paulus, linuxppc-dev, kvm-ppc,
	linux-kernel, psampat, pratik.r.sampat
In-Reply-To: <20210616134240.62195-2-psampat@linux.ibm.com>

"Pratik R. Sampat" <psampat@linux.ibm.com> writes:

> Adds a generic interface to represent the energy and frequency related
> PAPR attributes on the system using the new H_CALL
> "H_GET_ENERGY_SCALE_INFO".
>
> H_GET_EM_PARMS H_CALL was previously responsible for exporting this
> information in the lparcfg, however the H_GET_EM_PARMS H_CALL
> will be deprecated P10 onwards.
>
> The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
> hcall(
>   uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
>   uint64 flags,           // Per the flag request
>   uint64 firstAttributeId,// The attribute id
>   uint64 bufferAddress,   // Guest physical address of the output buffer
>   uint64 bufferSize       // The size in bytes of the output buffer
> );
>
> This H_CALL can query either all the attributes at once with
> firstAttributeId = 0, flags = 0 as well as query only one attribute
> at a time with firstAttributeId = id
>
> The output buffer consists of the following
> 1. number of attributes              - 8 bytes
> 2. array offset to the data location - 8 bytes
> 3. version info                      - 1 byte
> 4. A data array of size num attributes, which contains the following:
>   a. attribute ID              - 8 bytes
>   b. attribute value in number - 8 bytes
>   c. attribute name in string  - 64 bytes
>   d. attribute value in string - 64 bytes
>
> The new H_CALL exports information in direct string value format, hence
> a new interface has been introduced in
> /sys/firmware/papr/energy_scale_info to export this information to
> userspace in an extensible pass-through format.
>
> The H_CALL returns the name, numeric value and string value (if exists)
>
> The format of exposing the sysfs information is as follows:
> /sys/firmware/papr/energy_scale_info/
>    |-- <id>/
>      |-- desc
>      |-- value
>      |-- value_desc (if exists)
>    |-- <id>/
>      |-- desc
>      |-- value
>      |-- value_desc (if exists)
> ...
>
> The energy information that is exported is useful for userspace tools
> such as powerpc-utils. Currently these tools infer the
> "power_mode_data" value in the lparcfg, which in turn is obtained from
> the to be deprecated H_GET_EM_PARMS H_CALL.
> On future platforms, such userspace utilities will have to look at the
> data returned from the new H_CALL being populated in this new sysfs
> interface and report this information directly without the need of
> interpretation.
>
> Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
> ---
>  .../sysfs-firmware-papr-energy-scale-info     |  26 ++
>  arch/powerpc/include/asm/hvcall.h             |  21 +-
>  arch/powerpc/kvm/trace_hv.h                   |   1 +
>  arch/powerpc/platforms/pseries/Makefile       |   3 +-
>  .../pseries/papr_platform_attributes.c        | 292 ++++++++++++++++++
>  5 files changed, 341 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
>  create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c
>
> diff --git a/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
> new file mode 100644
> index 000000000000..499bc1ae173a
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
> @@ -0,0 +1,26 @@
> +What:		/sys/firmware/papr/energy_scale_info
> +Date:		June 2021
> +Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
> +Description:	Director hosting a set of platform attributes on Linux
> +		running as a PAPR guest.

This still refers to papr attributes. We want energy/frequency, etc. instead.

> +
> +		Each file in a directory contains a platform
> +		attribute hierarchy pertaining to performance/
> +		energy-savings mode and processor frequency.
> +
> +What:		/sys/firmware/papr/energy_scale_info/<id>
> +		/sys/firmware/papr/energy_scale_info/<id>/desc
> +		/sys/firmware/papr/energy_scale_info/<id>/value
> +		/sys/firmware/papr/energy_scale_info/<id>/value_desc
> +Date:		June 2021
> +Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
> +Description:	PAPR attributes directory for POWERVM servers

Same here.

> +
> +		This directory provides PAPR information. It

And here.

> +		contains below sysfs attributes:
> +
> +		- desc: File contains the name of attribute <id>

desc: String description of the attribute <id>

> +
> +		- value: Numeric value of attribute <id>
> +
> +		- value_desc: String value of attribute <id>
> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> index e3b29eda8074..19a2a8c77a49 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -316,7 +316,8 @@
>  #define H_SCM_PERFORMANCE_STATS 0x418
>  #define H_RPT_INVALIDATE	0x448
>  #define H_SCM_FLUSH		0x44C
> -#define MAX_HCALL_OPCODE	H_SCM_FLUSH
> +#define H_GET_ENERGY_SCALE_INFO	0x450
> +#define MAX_HCALL_OPCODE	H_GET_ENERGY_SCALE_INFO
>
>  /* Scope args for H_SCM_UNBIND_ALL */
>  #define H_UNBIND_SCOPE_ALL (0x1)
> @@ -631,6 +632,24 @@ struct hv_gpci_request_buffer {
>  	uint8_t bytes[HGPCI_MAX_DATA_BYTES];
>  } __packed;
>
> +#define MAX_EM_ATTRS	10
> +#define MAX_EM_DATA_BYTES \
> +	(sizeof(struct energy_scale_attributes) * MAX_EM_ATTRS)

EM doesn't really mean anything in the context of this code. Maybe we
could standardize the names with 'energy_scale_info' and 'esi' for
short.

> +struct energy_scale_attributes {

s/attributes/attribute/

> +	__be64 attr_id;
> +	__be64 attr_value;
> +	unsigned char attr_desc[64];
> +	unsigned char attr_value_desc[64];

These attr_ prefixes could be dropped. I will be clear from the context
where they are used.

> +} __packed;
> +
> +struct hv_energy_scale_buffer {
> +	__be64 num_attr;

s/num_attr/num_attrs/

> +	__be64 array_offset;
> +	__u8 data_header_version;
> +	unsigned char data[MAX_EM_DATA_BYTES];
> +} __packed;

Your code is correct with the current assumptions, but I think we got
the assumptions wrong, see if you agree:

My understanding of the hypercall is that it is designed around a header
+ data concept. If the header is versioned and backwards compatible,
then it means that it could increase in size. The offset to the data is
what ensures backward compatibility so that we can always access the
data, no matter what the header contains. So this leads me to conclude:

1- We should stop aborting in case the version changes. Nothing should
really break if that happens. Both the kernel and userspace would read
less data than the hypercall is returning, but that is it.

With the current design, if the hypervisor changes the header version,
the attributes stop being exposed in sysfs (because this code aborts),
which will break the userspace setup.

2- We cannot have 'data' in the struct. There is no array there. The
array is wherever the offset indicates. If the header increases in size,
then the 'data' would move forward in the buffer.

3- The concept of MAX_EM_ATTRS is misleading. It is the maximum number
of attributes that fit in the buffer *for this version of the hcall*. A
subsequent version could fit fewer attributes and our MAX_EM_ATTRS would
land us out of bounds. So the number of attributes must be defined
exclusively by num_attrs.

If we change the above, we only need to touch this code again if the
header version changes *and* we want to expose the extra information
brought by the change. An unexpected change in version should not cause
this code to fail.

So what I suggest is we keep a 'header' struct:

struct h_energy_scale_info_hdr {
    __be64 num_attrs;
    __be64 data_offset;
    __u8 version;
};

and we define an arbitrary size for the attributes array and allocate
that much more memory:

/*
 * Note: we allocate space for 10 attributes, but the HV can move the data
 * array further in the buffer, so it could return fewer attributes.
 */
#define MAX_BUF_SZ (sizeof(struct h_energy_scale_info_hdr) + \
                    sizeof(struct energy_scale_attributes) * 10)
...
em_buf = kmalloc(MAX_BUF_SZ, GFP_KERNEL);



The suggestions from here on apply even if my analysis above is wrong:

> +
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __KERNEL__ */
>  #endif /* _ASM_POWERPC_HVCALL_H */
> diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
> index 830a126e095d..38cd0ed0a617 100644
> --- a/arch/powerpc/kvm/trace_hv.h
> +++ b/arch/powerpc/kvm/trace_hv.h
> @@ -115,6 +115,7 @@
>  	{H_VASI_STATE,			"H_VASI_STATE"}, \
>  	{H_ENABLE_CRQ,			"H_ENABLE_CRQ"}, \
>  	{H_GET_EM_PARMS,		"H_GET_EM_PARMS"}, \
> +	{H_GET_ENERGY_SCALE_INFO,	"H_GET_ENERGY_SCALE_INFO"}, \
>  	{H_SET_MPP,			"H_SET_MPP"}, \
>  	{H_GET_MPP,			"H_GET_MPP"}, \
>  	{H_HOME_NODE_ASSOCIATIVITY,	"H_HOME_NODE_ASSOCIATIVITY"}, \
> diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
> index c8a2b0b05ac0..d14fca89ac25 100644
> --- a/arch/powerpc/platforms/pseries/Makefile
> +++ b/arch/powerpc/platforms/pseries/Makefile
> @@ -6,7 +6,8 @@ obj-y			:= lpar.o hvCall.o nvram.o reconfig.o \
>  			   of_helpers.o \
>  			   setup.o iommu.o event_sources.o ras.o \
>  			   firmware.o power.o dlpar.o mobility.o rng.o \
> -			   pci.o pci_dlpar.o eeh_pseries.o msi.o
> +			   pci.o pci_dlpar.o eeh_pseries.o msi.o \
> +			   papr_platform_attributes.o
>  obj-$(CONFIG_SMP)	+= smp.o
>  obj-$(CONFIG_SCANLOG)	+= scanlog.o
>  obj-$(CONFIG_KEXEC_CORE)	+= kexec.o
> diff --git a/arch/powerpc/platforms/pseries/papr_platform_attributes.c b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
> new file mode 100644
> index 000000000000..498c74a5e9ab
> --- /dev/null
> +++ b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
> @@ -0,0 +1,292 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * PAPR platform energy attributes driver
> + *
> + * This driver creates a sys file at /sys/firmware/papr/ which contains
> + * files keyword - value pairs that specify energy configuration of the system.

This description needs updating.

> + *
> + * Copyright 2021 IBM Corp.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/errno.h>
> +#include <linux/init.h>
> +#include <linux/seq_file.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include <linux/hugetlb.h>
> +#include <asm/lppaca.h>
> +#include <asm/hvcall.h>
> +#include <asm/firmware.h>
> +#include <asm/time.h>
> +#include <asm/prom.h>
> +#include <asm/vdso_datapage.h>
> +#include <asm/vio.h>
> +#include <asm/mmu.h>
> +#include <asm/machdep.h>
> +#include <asm/drmem.h>
> +
> +#include "pseries.h"
> +
> +#define MAX_ATTRS	3

It might be good to link this with ops_info size somehow to make sure we
don't update one without the other.

> +#define MAX_NAME_LEN	16
> +
> +struct papr_attr {
> +	u64 id;
> +	struct kobj_attribute attr;

We have attr everywhere. I would use kobj_attr here to improve
readability.

> +};
> +struct papr_group {
> +	char name[MAX_NAME_LEN];
> +	struct attribute_group pg;
> +	struct papr_attr *pgattrs;
> +} *pgs;
> +
> +struct kobject *papr_kobj;
> +struct kobject *escale_kobj;

Nitpick: esi_kobj

> +struct hv_energy_scale_buffer *em_buf;

Could replace the "em" here.

> +struct energy_scale_attributes *ea;

Nitpick: esi_attrs.

> +
> +static ssize_t papr_show_desc(struct kobject *kobj,
> +			       struct kobj_attribute *attr,
> +			       char *buf)
> +{
> +	struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
> +	int idx, ret = 0;
> +
> +	/*
> +	 * We do not expect the name to change, hence use the old value
> +	 * and save a HCALL
> +	 */
> +	for (idx = 0; idx < be64_to_cpu(em_buf->num_attr); idx++) {
> +		if (pattr->id == be64_to_cpu(ea[idx].attr_id)) {
> +			ret = sprintf(buf, "%s\n", ea[idx].attr_desc);
> +			if (ret < 0)
> +				ret = -EIO;
> +			break;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +static ssize_t papr_show_value(struct kobject *kobj,
> +				struct kobj_attribute *attr,
> +				char *buf)
> +{
> +	struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
> +	struct hv_energy_scale_buffer *t_buf;
> +	struct energy_scale_attributes *t_ea;
> +	int data_offset, ret = 0;
> +
> +	t_buf = kmalloc(sizeof(*t_buf), GFP_KERNEL);
> +	if (t_buf == NULL)
> +		return -ENOMEM;
> +
> +	ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0,
> +				 pattr->id, virt_to_phys(t_buf),
> +				 sizeof(*t_buf));
> +
> +	if (ret != H_SUCCESS) {
> +		pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
> +		goto out;
> +	}
> +
> +	data_offset = be64_to_cpu(t_buf->array_offset) -
> +			(sizeof(t_buf->num_attr) +
> +			sizeof(t_buf->array_offset) +
> +			sizeof(t_buf->data_header_version));
> +
> +	t_ea = (struct energy_scale_attributes *) &t_buf->data[data_offset];

t_ea = (struct energy_scale_attributes *)(t_buf + be64_to_cpu(t_buf->array_offset));

Right? If array_offset "points" to data, then data_offset would always
be 0. So there is no point doing this arithmetic.

> +
> +	ret = sprintf(buf, "%llu\n", be64_to_cpu(t_ea->attr_value));
> +	if (ret < 0)
> +		ret = -EIO;
> +out:
> +	kfree(t_buf);
> +
> +	return ret;
> +}
> +
> +static ssize_t papr_show_value_desc(struct kobject *kobj,
> +				     struct kobj_attribute *attr,
> +				     char *buf)
> +{
> +	struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
> +	struct hv_energy_scale_buffer *t_buf;
> +	struct energy_scale_attributes *t_ea;
> +	int data_offset, ret = 0;
> +
> +	t_buf = kmalloc(sizeof(*t_buf), GFP_KERNEL);
> +	if (t_buf == NULL)
> +		return -ENOMEM;
> +
> +	ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0,
> +				 pattr->id, virt_to_phys(t_buf),
> +				 sizeof(*t_buf));
> +
> +	if (ret != H_SUCCESS) {
> +		pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
> +		goto out;
> +	}
> +
> +	data_offset = be64_to_cpu(t_buf->array_offset) -
> +			(sizeof(t_buf->num_attr) +
> +			sizeof(t_buf->array_offset) +
> +			sizeof(t_buf->data_header_version));
> +
> +	t_ea = (struct energy_scale_attributes *) &t_buf->data[data_offset];
> +
> +	ret = sprintf(buf, "%s\n", t_ea->attr_value_desc);
> +	if (ret < 0)
> +		ret = -EIO;
> +out:
> +	kfree(t_buf);
> +
> +	return ret;
> +}
> +
> +static struct papr_ops_info {
> +	const char *attr_name;
> +	ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr,
> +			char *buf);
> +} ops_info[] = {
> +	{ "desc", papr_show_desc },
> +	{ "value", papr_show_value },
> +	{ "value_desc", papr_show_value_desc },
> +};
> +
> +static void add_attr(u64 id, int index, struct papr_attr *attr)
> +{
> +	attr->id = id;
> +	sysfs_attr_init(&attr->attr.attr);
> +	attr->attr.attr.name = ops_info[index].attr_name;
> +	attr->attr.attr.mode = 0444;
> +	attr->attr.show = ops_info[index].show;
> +}
> +
> +static int add_attr_group(u64 id, int len, struct papr_group *pg,
> +			  bool show_val_desc)
> +{
> +	int i;
> +
> +	for (i = 0; i < len; i++) {
> +		if (!strcmp(ops_info[i].attr_name, "value_desc") &&
> +		    !show_val_desc) {
> +			continue;
> +		}
> +		add_attr(id, i, &pg->pgattrs[i]);
> +		pg->pg.attrs[i] = &pg->pgattrs[i].attr.attr;
> +	}
> +
> +	return sysfs_create_group(escale_kobj, &pg->pg);
> +}
> +
> +
> +static int __init papr_init(void)
> +{
> +	uint64_t num_attr;
> +	int ret, idx, i, data_offset;
> +
> +	em_buf = kmalloc(sizeof(*em_buf), GFP_KERNEL);
> +	if (em_buf == NULL)
> +		return -ENOMEM;
> +	/*
> +	 * hcall(
> +	 * uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
> +	 * uint64 flags,            // Per the flag request
> +	 * uint64 firstAttributeId, // The attribute id
> +	 * uint64 bufferAddress,    // Guest physical address of the output buffer
> +	 * uint64 bufferSize);      // The size in bytes of the output buffer
> +	 */

Since the flags are well defined, we could have:
#define ESI_FLAGS_ALL PPC_BIT(0)
#define ESI_FLAGS_SINGLE PPC_BIT(1)

I assume the bits are IBM-order. But you get my point...

> +	ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0, 0,
> +				 virt_to_phys(em_buf), sizeof(*em_buf));
> +
> +	if (!firmware_has_feature(FW_FEATURE_LPAR) || ret != H_SUCCESS ||
> +	    em_buf->data_header_version != 0x1) {
> +		pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
> +		goto out;
> +	}

I'd make the FW_FEATURE_LPAR check an early return at the start of the
function.

> +
> +	num_attr = be64_to_cpu(em_buf->num_attr);
> +
> +	/*
> +	 * Typecast the energy buffer to the attribute structure at the offset
> +	 * specified in the buffer
> +	 */
> +	data_offset = be64_to_cpu(em_buf->array_offset) -
> +			(sizeof(em_buf->num_attr) +
> +			sizeof(em_buf->array_offset) +
> +			sizeof(em_buf->data_header_version));
> +
> +	ea = (struct energy_scale_attributes *) &em_buf->data[data_offset];
> +
> +	pgs = kcalloc(num_attr, sizeof(*pgs), GFP_KERNEL);
> +	if (!pgs)
> +		goto out_pgs;
> +
> +	papr_kobj = kobject_create_and_add("papr", firmware_kobj);
> +	if (!papr_kobj) {
> +		pr_warn("kobject_create_and_add papr failed\n");
> +		goto out_kobj;
> +	}
> +
> +	escale_kobj = kobject_create_and_add("energy_scale_info", papr_kobj);
> +	if (!escale_kobj) {
> +		pr_warn("kobject_create_and_add energy_scale_info failed\n");
> +		goto out_ekobj;
> +	}
> +
> +	for (idx = 0; idx < num_attr; idx++) {
> +		char buf[4];
> +		bool show_val_desc = true;
> +
> +		pgs[idx].pgattrs = kcalloc(MAX_ATTRS,
> +					   sizeof(*pgs[idx].pgattrs),
> +					   GFP_KERNEL);
> +		if (!pgs[idx].pgattrs)
> +			goto out_kobj;
> +
> +		pgs[idx].pg.attrs = kcalloc(MAX_ATTRS + 1,
> +					    sizeof(*pgs[idx].pg.attrs),
> +					    GFP_KERNEL);
> +		if (!pgs[idx].pg.attrs) {
> +			kfree(pgs[idx].pgattrs);
> +			goto out_kobj;
> +		}
> +
> +		sprintf(buf, "%lld", be64_to_cpu(ea[idx].attr_id));
> +		pgs[idx].pg.name = buf;
> +
> +		/* Do not add the value description if it does not exist */
> +		if (strlen(ea[idx].attr_value_desc) == 0)
> +			show_val_desc = false;
> +
> +		if (add_attr_group(be64_to_cpu(ea[idx].attr_id),
> +				   MAX_ATTRS, &pgs[idx], show_val_desc)) {
> +			pr_warn("Failed to create papr attribute group %s\n",
> +				pgs[idx].pg.name);
> +			goto out_pgattrs;
> +		}
> +	}
> +
> +	return 0;
> +
> +out_pgattrs:
> +	for (i = 0; i < MAX_ATTRS; i++) {
> +		kfree(pgs[i].pgattrs);
> +		kfree(pgs[i].pg.attrs);
> +	}
> +out_ekobj:
> +	kobject_put(escale_kobj);
> +out_kobj:
> +	kobject_put(papr_kobj);
> +out_pgs:
> +	kfree(pgs);
> +out:
> +	kfree(em_buf);
> +
> +	return -ENOMEM;
> +}
> +
> +machine_device_initcall(pseries, papr_init);

^ permalink raw reply

* Re: [PATCH v3] lockdown,selinux: fix wrong subject in some SELinux lockdown checks
From: Dan Williams @ 2021-06-18 22:18 UTC (permalink / raw)
  To: Ondrej Mosnacek
  Cc: linux-efi, Linux PCI, linux-cxl, Steffen Klassert, Herbert Xu,
	X86 ML, James Morris, Linux ACPI, Ingo Molnar, linux-serial,
	Linux-pm mailing list, selinux, Steven Rostedt, Casey Schaufler,
	Paul Moore, Netdev, Stephen Smalley, Kexec Mailing List,
	Linux Kernel Mailing List, linux-security-module, linux-fsdevel,
	bpf, linuxppc-dev, David S . Miller
In-Reply-To: <20210616085118.1141101-1-omosnace@redhat.com>

On Wed, Jun 16, 2021 at 1:51 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
>
> Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
> lockdown") added an implementation of the locked_down LSM hook to
> SELinux, with the aim to restrict which domains are allowed to perform
> operations that would breach lockdown.
>
> However, in several places the security_locked_down() hook is called in
> situations where the current task isn't doing any action that would
> directly breach lockdown, leading to SELinux checks that are basically
> bogus.
>
> To fix this, add an explicit struct cred pointer argument to
> security_lockdown() and define NULL as a special value to pass instead
> of current_cred() in such situations. LSMs that take the subject
> credentials into account can then fall back to some default or ignore
> such calls altogether. In the SELinux lockdown hook implementation, use
> SECINITSID_KERNEL in case the cred argument is NULL.
>
> Most of the callers are updated to pass current_cred() as the cred
> pointer, thus maintaining the same behavior. The following callers are
> modified to pass NULL as the cred pointer instead:
> 1. arch/powerpc/xmon/xmon.c
>      Seems to be some interactive debugging facility. It appears that
>      the lockdown hook is called from interrupt context here, so it
>      should be more appropriate to request a global lockdown decision.
> 2. fs/tracefs/inode.c:tracefs_create_file()
>      Here the call is used to prevent creating new tracefs entries when
>      the kernel is locked down. Assumes that locking down is one-way -
>      i.e. if the hook returns non-zero once, it will never return zero
>      again, thus no point in creating these files. Also, the hook is
>      often called by a module's init function when it is loaded by
>      userspace, where it doesn't make much sense to do a check against
>      the current task's creds, since the task itself doesn't actually
>      use the tracing functionality (i.e. doesn't breach lockdown), just
>      indirectly makes some new tracepoints available to whoever is
>      authorized to use them.
> 3. net/xfrm/xfrm_user.c:copy_to_user_*()
>      Here a cryptographic secret is redacted based on the value returned
>      from the hook. There are two possible actions that may lead here:
>      a) A netlink message XFRM_MSG_GETSA with NLM_F_DUMP set - here the
>         task context is relevant, since the dumped data is sent back to
>         the current task.
>      b) When adding/deleting/updating an SA via XFRM_MSG_xxxSA, the
>         dumped SA is broadcasted to tasks subscribed to XFRM events -
>         here the current task context is not relevant as it doesn't
>         represent the tasks that could potentially see the secret.
>      It doesn't seem worth it to try to keep using the current task's
>      context in the a) case, since the eventual data leak can be
>      circumvented anyway via b), plus there is no way for the task to
>      indicate that it doesn't care about the actual key value, so the
>      check could generate a lot of "false alert" denials with SELinux.
>      Thus, let's pass NULL instead of current_cred() here faute de
>      mieux.
>
> Improvements-suggested-by: Casey Schaufler <casey@schaufler-ca.com>
> Improvements-suggested-by: Paul Moore <paul@paul-moore.com>
> Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
[..]
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 2acc6173da36..c1747b6555c7 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -568,7 +568,7 @@ static bool cxl_mem_raw_command_allowed(u16 opcode)
>         if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))
>                 return false;
>
> -       if (security_locked_down(LOCKDOWN_NONE))
> +       if (security_locked_down(current_cred(), LOCKDOWN_NONE))

Acked-by: Dan Williams <dan.j.williams@intel.com>

...however that usage looks wrong. The expectation is that if kernel
integrity protections are enabled then raw command access should be
disabled. So I think that should be equivalent to LOCKDOWN_PCI_ACCESS
in terms of the command capabilities to filter.

^ permalink raw reply

* Re: [PATCH v4 07/11] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
From: Leonardo Brás @ 2021-06-18 22:26 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Joel Stanley, Christophe Leroy, Nicolin Chen,
	Niklas Schnelle
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <e4984fa0-2afe-a987-4fb8-61b878431b1f@ozlabs.ru>

Hello Alexey, thanks for this feedback!

On Mon, 2021-05-10 at 17:34 +1000, Alexey Kardashevskiy wrote:
> 
> 
> This does not apply anymore as it conflicts with my 4be518d838809e2135.

ok, rebasing on top of torvalds/master

> 
> 
> > ---
> >   arch/powerpc/platforms/pseries/iommu.c | 100 ++++++++++++----------
> > ---
> >   1 file changed, 50 insertions(+), 50 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c
> > b/arch/powerpc/platforms/pseries/iommu.c
> > index 5a70ecd579b8..89cb6e9e9f31 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -53,6 +53,11 @@ enum {
> >         DDW_EXT_QUERY_OUT_SIZE = 2
> >   };
> >   
> > +#ifdef CONFIG_IOMMU_API
> > +static int tce_exchange_pseries(struct iommu_table *tbl, long index,
> > unsigned long *tce,
> > +                               enum dma_data_direction *direction,
> > bool realmode);
> > +#endif
> 
> 
> Instead of declaring this so far from the the code which needs it, may 
> be add
> 
> struct iommu_table_ops iommu_table_lpar_multi_ops;
> 
> right before iommu_table_setparms() (as the sctruct is what you
> actually 
> want there),
>  and you won't need to move iommu_table_pseries_ops as well.

Oh, I was not aware I could declare a variable and then define it
statically. 
I mean, it does make sense, but I never thought of that.

I will change that :)

> 
> 
> > +
> >   static struct iommu_table *iommu_pseries_alloc_table(int node)
> >   {
> >         struct iommu_table *tbl;
> > @@ -501,6 +506,28 @@ static int
> > tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
> >         return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
> >   }
> >   
> > +static inline void _iommu_table_setparms(struct iommu_table *tbl,
> > unsigned long busno,
> 
> 
> The underscore is confusing, may be iommu_table_do_setparms()? 
> iommu_table_setparms_common()? Not sure. I cannot recall a single 
> function with just one leading underscore, I suspect I was pushed back 
> when I tried adding one ages ago :) "inline" seems excessive, the 
> compiler will probably figure it out anyway.
> 
> 

sure, done.


> 
> > +                                        unsigned long liobn,
> > unsigned long win_addr,
> > +                                        unsigned long window_size,
> > unsigned long page_shift,
> > +                                        unsigned long base, struct
> > iommu_table_ops *table_ops)
> 
> 
> Make "base" a pointer. Or, better, just keep setting it directly in 
> iommu_table_setparms() rather than passing 0 around.
> 
> The same comment about "liobn" - set it in iommu_table_setparms_lpar().
> The reviewer will see what field atters in what situation imho.
> 

The idea here was to keep all tbl parameters setting in
_iommu_table_setparms (or iommu_table_setparms_common).

I understand the idea that each one of those is optional in the other
case, but should we keep whatever value is present in the other
variable (not zeroing the other variable), or do someting like:

tbl->it_index = 0;
tbl->it_base = basep;
(in iommu_table_setparms)

tbl->it_index = liobn;
tbl->it_base = 0;
(in iommu_table_setparms_lpar)


> 
> > +{
> > +       tbl->it_busno = busno;
> > +       tbl->it_index = liobn;
> > +       tbl->it_offset = win_addr >> page_shift;
> > +       tbl->it_size = window_size >> page_shift;
> > +       tbl->it_page_shift = page_shift;
> > +       tbl->it_base = base;
> > +       tbl->it_blocksize = 16;
> > +       tbl->it_type = TCE_PCI;
> > +       tbl->it_ops = table_ops;
> > +}
> > +
> > +struct iommu_table_ops iommu_table_pseries_ops = {
> > +       .set = tce_build_pSeries,
> > +       .clear = tce_free_pSeries,
> > +       .get = tce_get_pseries
> > +};
> > +
> >   static void iommu_table_setparms(struct pci_controller *phb,
> >                                  struct device_node *dn,
> >                                  struct iommu_table *tbl)
> > @@ -509,8 +536,13 @@ static void iommu_table_setparms(struct
> > pci_controller *phb,
> >         const unsigned long *basep;
> >         const u32 *sizep;
> >   
> > -       node = phb->dn;
> > +       /* Test if we are going over 2GB of DMA space */
> > +       if (phb->dma_window_base_cur + phb->dma_window_size > SZ_2G)
> > {
> > +               udbg_printf("PCI_DMA: Unexpected number of IOAs under
> > this PHB.\n");
> > +               panic("PCI_DMA: Unexpected number of IOAs under this
> > PHB.\n");
> > +       }
> >   
> > +       node = phb->dn;
> >         basep = of_get_property(node, "linux,tce-base", NULL);
> >         sizep = of_get_property(node, "linux,tce-size", NULL);
> >         if (basep == NULL || sizep == NULL) {
> > @@ -519,33 +551,25 @@ static void iommu_table_setparms(struct
> > pci_controller *phb,
> >                 return;
> >         }
> >   
> > -       tbl->it_base = (unsigned long)__va(*basep);
> > +       _iommu_table_setparms(tbl, phb->bus->number, 0, phb-
> > >dma_window_base_cur,
> > +                             phb->dma_window_size,
> > IOMMU_PAGE_SHIFT_4K,
> > +                             (unsigned long)__va(*basep),
> > &iommu_table_pseries_ops);
> >   
> >         if (!is_kdump_kernel())
> >                 memset((void *)tbl->it_base, 0, *sizep);
> >   
> > -       tbl->it_busno = phb->bus->number;
> > -       tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> > -
> > -       /* Units of tce entries */
> > -       tbl->it_offset = phb->dma_window_base_cur >> tbl-
> > >it_page_shift;
> > -
> > -       /* Test if we are going over 2GB of DMA space */
> > -       if (phb->dma_window_base_cur + phb->dma_window_size >
> > 0x80000000ul) {
> > -               udbg_printf("PCI_DMA: Unexpected number of IOAs under
> > this PHB.\n");
> > -               panic("PCI_DMA: Unexpected number of IOAs under this
> > PHB.\n");
> > -       }
> > -
> >         phb->dma_window_base_cur += phb->dma_window_size;
> > -
> > -       /* Set the tce table size - measured in entries */
> > -       tbl->it_size = phb->dma_window_size >> tbl->it_page_shift;
> > -
> > -       tbl->it_index = 0;
> > -       tbl->it_blocksize = 16;
> > -       tbl->it_type = TCE_PCI;
> >   }
> >   
> > +struct iommu_table_ops iommu_table_lpar_multi_ops = {
> > +       .set = tce_buildmulti_pSeriesLP,
> > +#ifdef CONFIG_IOMMU_API
> > +       .xchg_no_kill = tce_exchange_pseries,
> > +#endif
> > +       .clear = tce_freemulti_pSeriesLP,
> > +       .get = tce_get_pSeriesLP
> > +};
> > +
> >   /*
> >    * iommu_table_setparms_lpar
> >    *
> > @@ -557,28 +581,17 @@ static void iommu_table_setparms_lpar(struct
> > pci_controller *phb,
> >                                       struct iommu_table_group
> > *table_group,
> >                                       const __be32 *dma_window)
> >   {
> > -       unsigned long offset, size;
> > +       unsigned long offset, size, liobn;
> >   
> > -       of_parse_dma_window(dn, dma_window, &tbl->it_index, &offset,
> > &size);
> > +       of_parse_dma_window(dn, dma_window, &liobn, &offset, &size);
> >   
> > -       tbl->it_busno = phb->bus->number;
> > -       tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> > -       tbl->it_base   = 0;
> > -       tbl->it_blocksize  = 16;
> > -       tbl->it_type = TCE_PCI;
> > -       tbl->it_offset = offset >> tbl->it_page_shift;
> > -       tbl->it_size = size >> tbl->it_page_shift;
> > +       _iommu_table_setparms(tbl, phb->bus->number, liobn, offset,
> > size, IOMMU_PAGE_SHIFT_4K, 0,
> > +                             &iommu_table_lpar_multi_ops);
> >   
> >         table_group->tce32_start = offset;
> >         table_group->tce32_size = size;
> >   }
> >   
> > -struct iommu_table_ops iommu_table_pseries_ops = {
> > -       .set = tce_build_pSeries,
> > -       .clear = tce_free_pSeries,
> > -       .get = tce_get_pseries
> > -};
> > -
> >   static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
> >   {
> >         struct device_node *dn;
> > @@ -647,7 +660,6 @@ static void pci_dma_bus_setup_pSeries(struct
> > pci_bus *bus)
> >         tbl = pci->table_group->tables[0];
> >   
> >         iommu_table_setparms(pci->phb, dn, tbl);
> > -       tbl->it_ops = &iommu_table_pseries_ops;
> >         iommu_init_table(tbl, pci->phb->node, 0, 0);
> >   
> >         /* Divide the rest (1.75GB) among the children */
> > @@ -664,7 +676,7 @@ static int tce_exchange_pseries(struct
> > iommu_table *tbl, long index, unsigned
> >                                 bool realmode)
> >   {
> >         long rc;
> > -       unsigned long ioba = (unsigned long) index << tbl-
> > >it_page_shift;
> > +       unsigned long ioba = (unsigned long)index << tbl-
> > >it_page_shift;
> 
> 
> Unrelated change, why, did checkpatch.pl complain?

My bad, this one could pass my git-add unnoticed.
Reverting.

> 
> >         unsigned long flags, oldtce = 0;
> >         u64 proto_tce = iommu_direction_to_tce_perm(*direction);
> >         unsigned long newtce = *tce | proto_tce;
> > @@ -686,15 +698,6 @@ static int tce_exchange_pseries(struct
> > iommu_table *tbl, long index, unsigned
> >   }
> >   #endif
> >   
> > -struct iommu_table_ops iommu_table_lpar_multi_ops = {
> > -       .set = tce_buildmulti_pSeriesLP,
> > -#ifdef CONFIG_IOMMU_API
> > -       .xchg_no_kill = tce_exchange_pseries,
> > -#endif
> > -       .clear = tce_freemulti_pSeriesLP,
> > -       .get = tce_get_pSeriesLP
> > -};
> > -
> >   static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
> >   {
> >         struct iommu_table *tbl;
> > @@ -729,7 +732,6 @@ static void pci_dma_bus_setup_pSeriesLP(struct
> > pci_bus *bus)
> >                 tbl = ppci->table_group->tables[0];
> >                 iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
> >                                 ppci->table_group, dma_window);
> > -               tbl->it_ops = &iommu_table_lpar_multi_ops;
> >                 iommu_init_table(tbl, ppci->phb->node, 0, 0);
> >                 iommu_register_group(ppci->table_group,
> >                                 pci_domain_nr(bus), 0);
> > @@ -758,7 +760,6 @@ static void pci_dma_dev_setup_pSeries(struct
> > pci_dev *dev)
> >                 PCI_DN(dn)->table_group =
> > iommu_pseries_alloc_group(phb->node);
> >                 tbl = PCI_DN(dn)->table_group->tables[0];
> >                 iommu_table_setparms(phb, dn, tbl);
> > -               tbl->it_ops = &iommu_table_pseries_ops;
> >                 iommu_init_table(tbl, phb->node, 0, 0);
> >                 set_iommu_table_base(&dev->dev, tbl);
> >                 return;
> > @@ -1436,7 +1437,6 @@ static void
> > pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
> >                 tbl = pci->table_group->tables[0];
> >                 iommu_table_setparms_lpar(pci->phb, pdn, tbl,
> >                                 pci->table_group, dma_window);
> > -               tbl->it_ops = &iommu_table_lpar_multi_ops;
> >                 iommu_init_table(tbl, pci->phb->node, 0, 0);
> >                 iommu_register_group(pci->table_group,
> >                                 pci_domain_nr(pci->phb->bus), 0);
> > 
> 

Best regards,
Leonardo Bras


^ permalink raw reply

* [Bug 213069] kernel BUG at arch/powerpc/include/asm/book3s/64/hash-4k.h:147! Oops: Exception in kernel mode, sig: 5 [#1]
From: bugzilla-daemon @ 2021-06-18 23:18 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-213069-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=213069

Erhard F. (erhard_f@mailbox.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |PATCH_ALREADY_AVAILABLE

--- Comment #5 from Erhard F. (erhard_f@mailbox.org) ---
Works now with Anshuman's patch applied:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=c0dd65c2e295762fc5ebd11f8da3ac33f1d30788

Thanks!

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox