LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v2 4/6] PCI: Provide wrapper to access a pci_dev's bound driver
From: Andy Shevchenko @ 2021-08-03 14:37 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Mark Rutland, Giovanni Cabiddu, Rafał Miłecki,
	Peter Zijlstra, linux-pci, Alexander Duyck, Sathya Prakash,
	oss-drivers, Paul Mackerras, H. Peter Anvin, Jiri Olsa,
	Boris Ostrovsky, linux-perf-users, Stefano Stabellini, Herbert Xu,
	linux-scsi, Ido Schimmel, x86, qat-linux, Alexander Shishkin,
	Ingo Molnar, linux-wireless, Jakub Kicinski, Mathias Nyman,
	Yisen Zhuang, Fiona Trahe, Andrew Donnellan, Arnd Bergmann,
	Konrad Rzeszutek Wilk, Suganath Prabu Subramani, Simon Horman,
	Arnaldo Carvalho de Melo, Borislav Petkov, Michael Buesch,
	Jiri Pirko, Bjorn Helgaas, Namhyung Kim, Thomas Gleixner,
	Juergen Gross, Salil Mehta, Sreekanth Reddy, xen-devel,
	Vadym Kochan, MPT-FusionLinux.pdl, Greg Kroah-Hartman, linux-usb,
	Wojciech Ziemba, linux-kernel, Taras Chornyi, Zhou Wang,
	linux-crypto, kernel, netdev, Frederic Barrat,
	Oliver O'Halloran, linuxppc-dev, David S. Miller
In-Reply-To: <20210803100150.1543597-5-u.kleine-koenig@pengutronix.de>

On Tue, Aug 03, 2021 at 12:01:48PM +0200, Uwe Kleine-König wrote:
> Which driver a device is bound to is available twice: In struct
> pci_dev::dev->driver and in struct pci_dev::driver. To get rid of the
> duplication introduce a wrapper to access struct pci_dev's driver
> member. Once all users are converted the wrapper can be changed to
> calculate the driver using pci_dev::dev->driver.

...

>  #define	to_pci_driver(drv) container_of(drv, struct pci_driver, driver)
> +#define pci_driver_of_dev(pdev) ((pdev)->driver)

Seems like above is (mis)using TAB instead of space after #define. Not sure if
it's good to have them different.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply

* Re: [PATCH v2 5/6] PCI: Adapt all code locations to not use struct pci_dev::driver directly
From: Andy Shevchenko @ 2021-08-03 14:38 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Mark Rutland, Giovanni Cabiddu, Rafał Miłecki,
	Peter Zijlstra, linux-pci, Alexander Duyck, Sathya Prakash,
	oss-drivers, Paul Mackerras, H. Peter Anvin, Jiri Olsa,
	Boris Ostrovsky, linux-perf-users, Stefano Stabellini, Herbert Xu,
	linux-scsi, Ido Schimmel, x86, qat-linux, Alexander Shishkin,
	Ingo Molnar, linux-wireless, Jakub Kicinski, Mathias Nyman,
	Yisen Zhuang, Fiona Trahe, Andrew Donnellan, Arnd Bergmann,
	Konrad Rzeszutek Wilk, Suganath Prabu Subramani, Simon Horman,
	Arnaldo Carvalho de Melo, Borislav Petkov, Michael Buesch,
	Jiri Pirko, Bjorn Helgaas, Namhyung Kim, Thomas Gleixner,
	Juergen Gross, Salil Mehta, Sreekanth Reddy, xen-devel,
	Vadym Kochan, MPT-FusionLinux.pdl, Greg Kroah-Hartman, linux-usb,
	Wojciech Ziemba, linux-kernel, Taras Chornyi, Zhou Wang,
	linux-crypto, kernel, netdev, Frederic Barrat,
	Oliver O'Halloran, linuxppc-dev, David S. Miller
In-Reply-To: <20210803100150.1543597-6-u.kleine-koenig@pengutronix.de>

On Tue, Aug 03, 2021 at 12:01:49PM +0200, Uwe Kleine-König wrote:
> This prepares removing the driver member of struct pci_dev which holds the
> same information than struct pci_dev::dev->driver.

...

> +	struct pci_driver *pdrv;

Missed blank line here and everywhere else. I don't remember if it's a
checkpatch who complains on this.

> +	return (pdev && (pdrv = pci_driver_of_dev(pdev))) ? pdrv->name : "<null>";

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply

* Re: [PATCH v2 5/6] PCI: Adapt all code locations to not use struct pci_dev::driver directly
From: Ido Schimmel @ 2021-08-03 14:46 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Mark Rutland, Giovanni Cabiddu, Rafał Miłecki,
	Peter Zijlstra, linux-pci, Alexander Duyck, Sathya Prakash,
	oss-drivers, Paul Mackerras, H. Peter Anvin, Jiri Olsa,
	Boris Ostrovsky, linux-perf-users, Stefano Stabellini, Herbert Xu,
	linux-scsi, Ido Schimmel, x86, qat-linux, Alexander Shishkin,
	Ingo Molnar, linux-wireless, Jakub Kicinski, Mathias Nyman,
	Yisen Zhuang, Fiona Trahe, Andrew Donnellan, Arnd Bergmann,
	Konrad Rzeszutek Wilk, Suganath Prabu Subramani, Simon Horman,
	Arnaldo Carvalho de Melo, Borislav Petkov, Michael Buesch,
	Jiri Pirko, Bjorn Helgaas, Namhyung Kim, Thomas Gleixner,
	Andy Shevchenko, Juergen Gross, Salil Mehta, Sreekanth Reddy,
	xen-devel, Vadym Kochan, MPT-FusionLinux.pdl, Greg Kroah-Hartman,
	linux-usb, Wojciech Ziemba, linux-kernel, Taras Chornyi,
	Zhou Wang, linux-crypto, kernel, netdev, Frederic Barrat,
	Oliver O'Halloran, linuxppc-dev, David S. Miller
In-Reply-To: <20210803100150.1543597-6-u.kleine-koenig@pengutronix.de>

On Tue, Aug 03, 2021 at 12:01:49PM +0200, Uwe Kleine-König wrote:
> This prepares removing the driver member of struct pci_dev which holds the
> same information than struct pci_dev::dev->driver.
> 
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
>  arch/powerpc/include/asm/ppc-pci.h            |  3 +-
>  arch/powerpc/kernel/eeh_driver.c              | 12 ++++---
>  arch/x86/events/intel/uncore.c                |  2 +-
>  arch/x86/kernel/probe_roms.c                  |  2 +-
>  drivers/bcma/host_pci.c                       |  6 ++--
>  drivers/crypto/hisilicon/qm.c                 |  2 +-
>  drivers/crypto/qat/qat_common/adf_aer.c       |  2 +-
>  drivers/message/fusion/mptbase.c              |  4 +--
>  drivers/misc/cxl/guest.c                      | 21 +++++------
>  drivers/misc/cxl/pci.c                        | 25 +++++++------
>  .../ethernet/hisilicon/hns3/hns3_ethtool.c    |  2 +-
>  .../ethernet/marvell/prestera/prestera_pci.c  |  2 +-
>  drivers/net/ethernet/mellanox/mlxsw/pci.c     |  2 +-
>  .../ethernet/netronome/nfp/nfp_net_ethtool.c  |  2 +-
>  drivers/pci/iov.c                             | 23 +++++++-----
>  drivers/pci/pci-driver.c                      | 28 ++++++++-------
>  drivers/pci/pci.c                             | 10 +++---
>  drivers/pci/pcie/err.c                        | 35 ++++++++++---------
>  drivers/pci/xen-pcifront.c                    |  3 +-
>  drivers/ssb/pcihost_wrapper.c                 |  7 ++--
>  drivers/usb/host/xhci-pci.c                   |  3 +-
>  21 files changed, 112 insertions(+), 84 deletions(-)

For mlxsw:

Tested-by: Ido Schimmel <idosch@nvidia.com>

^ permalink raw reply

* Re: [PATCH 00/38] Replace deprecated CPU-hotplug
From: Hans de Goede @ 2021-08-03 15:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: Juri Lelli, Ben Segall, Paul Mackerras, Pavel Machek, linux-acpi,
	Mel Gorman, Guenter Roeck, Petr Mladek, Jean Delvare, linux-pm,
	Arnaldo Carvalho de Melo, Pekka Paalanen, Andy Lutomirski, tglx,
	Dietmar Eggemann, Greg Kroah-Hartman, Rafael J. Wysocki,
	Karol Herbst, linux-crypto, Andrew Morton, Mark Rutland,
	linux-doc, nouveau, Dave Hansen, virtualization, Ingo Molnar,
	linux-s390, Davidlohr Bueso, Daniel Lezcano, Julian Wiedmann,
	Daniel Jordan, Gonglei, Len Brown, Mike Travis, coresight,
	kvm-ppc, John Stultz, cgroups, linux-arm-kernel, Mathieu Poirier,
	Daniel Bristot de Oliveira, Alexander Shishkin, Jason Wang,
	Amit Kucheria, Lai Jiangshan, platform-driver-x86, Joel Fernandes,
	Mark Gross, Jonathan Corbet, Michael S. Tsirkin,
	Christian Borntraeger, Arnd Bergmann, Jiri Kosina, Josh Triplett,
	rcu, Mathieu Desnoyers, linux-hwmon, David S. Miller, Len Brown,
	Peter Zijlstra, linux-mm, H. Peter Anvin, live-patching,
	Miroslav Benes, Jiri Olsa, Herbert Xu, x86, Ingo Molnar,
	Jakub Kicinski, Mike Leach, Paul E. McKenney, Heiko Carstens,
	Johannes Weiner, linux-raid, Joe Lawrence, Josh Poimboeuf,
	Namhyung Kim, linux-edac, netdev, linux-mips, Leo Yan,
	Borislav Petkov, linuxppc-dev, Karsten Graul
In-Reply-To: <20210803141621.780504-1-bigeasy@linutronix.de>

Hi Sebastien,

On 8/3/21 4:15 PM, Sebastian Andrzej Siewior wrote:
> This is a tree wide replacement of the deprecated CPU hotplug functions
> which are only wrappers around the actual functions.
> 
> Each patch is independent and can be picked up by the relevant maintainer.

Ok; and I take it that then also is the plan for merging these ?

FWIW I'm fine with the drivers/platform/x86 patch going upstream
through some other tree if its easier to keep the set together ...

Regards,

Hans



> 
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Amit Kucheria <amitk@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: cgroups@vger.kernel.org
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: coresight@lists.linaro.org
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Gonglei <arei.gonglei@huawei.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Hans de Goede <hdegoede@redhat.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Jean Delvare <jdelvare@suse.com>
> Cc: Jiri Kosina <jikos@kernel.org>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Joe Lawrence <joe.lawrence@redhat.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Julian Wiedmann <jwi@linux.ibm.com>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Karol Herbst <karolherbst@gmail.com>
> Cc: Karsten Graul <kgraul@linux.ibm.com>
> Cc: kvm-ppc@vger.kernel.org
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Leo Yan <leo.yan@linaro.org>
> Cc: linux-acpi@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-crypto@vger.kernel.org
> Cc: linux-doc@vger.kernel.org
> Cc: linux-edac@vger.kernel.org
> Cc: linux-hwmon@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mips@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: linux-pm@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-raid@vger.kernel.org
> Cc: linux-s390@vger.kernel.org
> Cc: live-patching@vger.kernel.org
> Cc: Mark Gross <mgross@linux.intel.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Mike Travis <mike.travis@hpe.com>
> Cc: Miroslav Benes <mbenes@suse.cz>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: netdev@vger.kernel.org
> Cc: nouveau@lists.freedesktop.org
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: Pekka Paalanen <ppaalanen@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: platform-driver-x86@vger.kernel.org
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: rcu@vger.kernel.org
> Cc: Robin Holt <robinmholt@gmail.com>
> Cc: Song Liu <song@kernel.org>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: Stephen Boyd <sboyd@kernel.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Steve Wahl <steve.wahl@hpe.com>
> Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: virtualization@lists.linux-foundation.org
> Cc: x86@kernel.org
> Cc: Zefan Li <lizefan.x@bytedance.com>
> Cc: Zhang Rui <rui.zhang@intel.com>
> 
> Sebastian
> 


^ permalink raw reply

* Re: [PATCH 00/38] Replace deprecated CPU-hotplug
From: Sebastian Andrzej Siewior @ 2021-08-03 16:10 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Juri Lelli, Ben Segall, Paul Mackerras, Pavel Machek, linux-acpi,
	Mel Gorman, Guenter Roeck, Petr Mladek, Jean Delvare, linux-pm,
	Arnaldo Carvalho de Melo, Pekka Paalanen, Andy Lutomirski, tglx,
	Dietmar Eggemann, Greg Kroah-Hartman, Rafael J. Wysocki,
	Karol Herbst, linux-kernel, linux-crypto, Andrew Morton,
	Mark Rutland, linux-doc, nouveau, Dave Hansen, virtualization,
	Ingo Molnar, linux-s390, Davidlohr Bueso, Daniel Lezcano,
	Julian Wiedmann, Daniel Jordan, Gonglei, Len Brown, Mike Travis,
	coresight, kvm-ppc, John Stultz, cgroups, linux-arm-kernel,
	Mathieu Poirier, Daniel Bristot de Oliveira, Alexander Shishkin,
	Jason Wang, Amit Kucheria, Lai Jiangshan, platform-driver-x86,
	Joel Fernandes, Mark Gross, Jonathan Corbet, Michael S. Tsirkin,
	Christian Borntraeger, Arnd Bergmann, Jiri Kosina, Josh Triplett,
	rcu, Mathieu Desnoyers, linux-hwmon, David S. Miller, Len Brown,
	Peter Zijlstra, linux-mm, H. Peter Anvin, live-patching,
	Miroslav Benes, Jiri Olsa, Herbert Xu, x86, Ingo Molnar,
	Jakub Kicinski, Mike Leach, Paul E. McKenney, Heiko Carstens,
	Johannes Weiner, linux-raid, Joe Lawrence, Josh Poimboeuf,
	Namhyung Kim, linux-edac, netdev, linux-mips, Leo Yan,
	Borislav Petkov, linuxppc-dev, Karsten Graul
In-Reply-To: <83dc5dfd-1ed0-f428-826f-fb819911fc89@redhat.com>

On 2021-08-03 17:30:40 [+0200], Hans de Goede wrote:
> Hi Sebastien,
Hi Hans,

> On 8/3/21 4:15 PM, Sebastian Andrzej Siewior wrote:
> > This is a tree wide replacement of the deprecated CPU hotplug functions
> > which are only wrappers around the actual functions.
> > 
> > Each patch is independent and can be picked up by the relevant maintainer.
> 
> Ok; and I take it that then also is the plan for merging these ?
> 
> FWIW I'm fine with the drivers/platform/x86 patch going upstream
> through some other tree if its easier to keep the set together ...

There is no need to keep that set together since each patch is
independent. Please merge it through your tree.

> Regards,
> 
> Hans

Sebastian

^ permalink raw reply

* Re: [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE
From: Nicholas Piggin @ 2021-08-04  5:14 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, mpe
In-Reply-To: <20210803143725.615186-1-aneesh.kumar@linux.ibm.com>

Excerpts from Aneesh Kumar K.V's message of August 4, 2021 12:37 am:
> With shared mapping, even though we are unmapping a large range, the kernel
> will force a TLB flush with ptl lock held to avoid the race mentioned in
> commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts")
> This results in the kernel issuing a high number of TLB flushes even for a large
> range. This can be improved by making sure the kernel switch to pid based flush if the
> kernel is unmapping a 2M range.

It would be good to have a bit more description here.

In any patch that changes a heuristic like this, I would like to see 
some justification or reasoning that could be refuted or used as a 
supporting argument if we ever wanted to change the heuristic later.
Ideally with some of the obvious downsides listed as well.

This "improves" things here, but what if it hurt things elsewhere, how 
would we come in later and decide to change it back?

THP flushes for example, I think now they'll do PID flushes (if they 
have to be broadcast, which they will tend to be when khugepaged does
them). So now that might increase jitter for THP and cause it to be a
loss for more workloads.

So where do you notice this? What's the benefit?

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/mm/book3s64/radix_tlb.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
> index aefc100d79a7..21d0f098e43b 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -1106,7 +1106,7 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
>   * invalidating a full PID, so it has a far lower threshold to change from
>   * individual page flushes to full-pid flushes.
>   */
> -static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
> +static unsigned long tlb_single_page_flush_ceiling __read_mostly = 32;
>  static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
>  
>  static inline void __radix__flush_tlb_range(struct mm_struct *mm,
> @@ -1133,7 +1133,7 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>  	if (fullmm)
>  		flush_pid = true;
>  	else if (type == FLUSH_TYPE_GLOBAL)
> -		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
> +		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;

Arguably >= is nicer than > here, but this shouldn't be in the same 
patch as the value change.

>  	else
>  		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;

And it should change everything to be consistent. Although I'm not sure 
it's worth changing even though I highly doubt any administrator would
be tweaking this.

Thanks,
Nick

>  	/*
> @@ -1335,7 +1335,7 @@ static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
>  	if (fullmm)
>  		flush_pid = true;
>  	else if (type == FLUSH_TYPE_GLOBAL)
> -		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
> +		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
>  	else
>  		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
>  
> @@ -1505,7 +1505,7 @@ void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
>  			continue;
>  
>  		nr_pages = (end - start) >> def->shift;
> -		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
> +		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
>  
>  		/*
>  		 * If the number of pages spanning the range is above
> -- 
> 2.31.1
> 
> 

^ permalink raw reply

* Re: [PATCH 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
From: Christophe Leroy @ 2021-08-04  5:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <c2525223-ad29-6126-afa1-d70b8e29d721@csgroup.eu>

Gentle ping Michael ?

Le 25/06/2021 à 16:41, Christophe Leroy a écrit :
> Hi Michael,
> 
> What happened to this series ? It has been flagged 'under review' in Patchwork since mid April but I 
> never saw it in next-test.
> 
> Thanks
> Christophe
> 
> Le 12/04/2021 à 18:26, Christophe Leroy a écrit :
>> powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.
>>
>> For catching simple conditions like a variable having value 0, this
>> is efficient because it does the test and the trap at the same time.
>> But most conditions used with BUG_ON or WARN_ON are more complex and
>> forces GCC to format the condition into a 0 or 1 value in a register.
>> This will usually require 2 to 3 instructions.
>>
>> The most efficient solution would be to use __builtin_trap() because
>> GCC is able to optimise the use of the different trap instructions
>> based on the requested condition, but this is complex if not
>> impossible for the following reasons:
>> - __builtin_trap() is a non-recoverable instruction, so it can't be
>> used for WARN_ON
>> - Knowing which line of code generated the trap would require the
>> analysis of DWARF information. This is not a feature we have today.
>>
>> As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops")
>> the way WARN_ON() is implemented is suboptimal. That commit also
>> mentions an issue with 'long long' condition. It fixed it for
>> WARN_ON() but the same problem still exists today with BUG_ON() on
>> PPC32. It will be fixed by using the generic implementation.
>>
>> By using the generic implementation, gcc will naturally generate a
>> branch to the unconditional trap generated by BUG().
>>
>> As modern powerpc implement zero-cycle branch,
>> that's even more efficient.
>>
>> And for the functions using WARN_ON() and its return, the test
>> on return from WARN_ON() is now also used for the WARN_ON() itself.
>>
>> On PPC64 we don't want it because we want to be able to use CFAR
>> register to track how we entered the code that trapped. The CFAR
>> register would be clobbered by the branch.
>>
>> A simple test function:
>>
>>     unsigned long test9w(unsigned long a, unsigned long b)
>>     {
>>         if (WARN_ON(!b))
>>             return 0;
>>         return a / b;
>>     }
>>
>> Before the patch:
>>
>>     0000046c <test9w>:
>>      46c:    7c 89 00 34     cntlzw  r9,r4
>>      470:    55 29 d9 7e     rlwinm  r9,r9,27,5,31
>>      474:    0f 09 00 00     twnei   r9,0
>>      478:    2c 04 00 00     cmpwi   r4,0
>>      47c:    41 82 00 0c     beq     488 <test9w+0x1c>
>>      480:    7c 63 23 96     divwu   r3,r3,r4
>>      484:    4e 80 00 20     blr
>>
>>      488:    38 60 00 00     li      r3,0
>>      48c:    4e 80 00 20     blr
>>
>> After the patch:
>>
>>     00000468 <test9w>:
>>      468:    2c 04 00 00     cmpwi   r4,0
>>      46c:    41 82 00 0c     beq     478 <test9w+0x10>
>>      470:    7c 63 23 96     divwu   r3,r3,r4
>>      474:    4e 80 00 20     blr
>>
>>      478:    0f e0 00 00     twui    r0,0
>>      47c:    38 60 00 00     li      r3,0
>>      480:    4e 80 00 20     blr
>>
>> So we see before the patch we need 3 instructions on the likely path
>> to handle the WARN_ON(). With the patch the trap goes on the unlikely
>> path.
>>
>> See below the difference at the entry of system_call_exception where
>> we have several BUG_ON(), allthough less impressing.
>>
>> With the patch:
>>
>>     00000000 <system_call_exception>:
>>        0:    81 6a 00 84     lwz     r11,132(r10)
>>        4:    90 6a 00 88     stw     r3,136(r10)
>>        8:    71 60 00 02     andi.   r0,r11,2
>>        c:    41 82 00 70     beq     7c <system_call_exception+0x7c>
>>       10:    71 60 40 00     andi.   r0,r11,16384
>>       14:    41 82 00 6c     beq     80 <system_call_exception+0x80>
>>       18:    71 6b 80 00     andi.   r11,r11,32768
>>       1c:    41 82 00 68     beq     84 <system_call_exception+0x84>
>>       20:    94 21 ff e0     stwu    r1,-32(r1)
>>       24:    93 e1 00 1c     stw     r31,28(r1)
>>       28:    7d 8c 42 e6     mftb    r12
>>     ...
>>       7c:    0f e0 00 00     twui    r0,0
>>       80:    0f e0 00 00     twui    r0,0
>>       84:    0f e0 00 00     twui    r0,0
>>
>> Without the patch:
>>
>>     00000000 <system_call_exception>:
>>        0:    94 21 ff e0     stwu    r1,-32(r1)
>>        4:    93 e1 00 1c     stw     r31,28(r1)
>>        8:    90 6a 00 88     stw     r3,136(r10)
>>        c:    81 6a 00 84     lwz     r11,132(r10)
>>       10:    69 60 00 02     xori    r0,r11,2
>>       14:    54 00 ff fe     rlwinm  r0,r0,31,31,31
>>       18:    0f 00 00 00     twnei   r0,0
>>       1c:    69 60 40 00     xori    r0,r11,16384
>>       20:    54 00 97 fe     rlwinm  r0,r0,18,31,31
>>       24:    0f 00 00 00     twnei   r0,0
>>       28:    69 6b 80 00     xori    r11,r11,32768
>>       2c:    55 6b 8f fe     rlwinm  r11,r11,17,31,31
>>       30:    0f 0b 00 00     twnei   r11,0
>>       34:    7d 8c 42 e6     mftb    r12
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>>   arch/powerpc/include/asm/bug.h | 9 ++++++---
>>   1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>> index d1635ffbb179..101dea4eec8d 100644
>> --- a/arch/powerpc/include/asm/bug.h
>> +++ b/arch/powerpc/include/asm/bug.h
>> @@ -68,7 +68,11 @@
>>       BUG_ENTRY("twi 31, 0, 0", 0);                \
>>       unreachable();                        \
>>   } while (0)
>> +#define HAVE_ARCH_BUG
>> +
>> +#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags))
>> +#ifdef CONFIG_PPC64
>>   #define BUG_ON(x) do {                        \
>>       if (__builtin_constant_p(x)) {                \
>>           if (x)                        \
>> @@ -78,8 +82,6 @@
>>       }                            \
>>   } while (0)
>> -#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags))
>> -
>>   #define WARN_ON(x) ({                        \
>>       int __ret_warn_on = !!(x);                \
>>       if (__builtin_constant_p(__ret_warn_on)) {        \
>> @@ -93,9 +95,10 @@
>>       unlikely(__ret_warn_on);                \
>>   })
>> -#define HAVE_ARCH_BUG
>>   #define HAVE_ARCH_BUG_ON
>>   #define HAVE_ARCH_WARN_ON
>> +#endif
>> +
>>   #endif /* __ASSEMBLY __ */
>>   #else
>>   #ifdef __ASSEMBLY__
>>

^ permalink raw reply

* [PATCH] powerpc: Remove MSR_PR check in interrupt_exit_{user/kernel}_prepare()
From: Christophe Leroy @ 2021-08-04  5:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, npiggin
  Cc: linuxppc-dev, linux-kernel

In those hot functions that are called at every interrupt, any saved
cycle is worth it.

interrupt_exit_user_prepare() and interrupt_exit_kernel_prepare() are
called from three places:
- From entry_32.S
- From interrupt_64.S
- From interrupt_exit_user_restart() and interrupt_exit_kernel_restart()

In entry_32.S, there are inambiguously called based on MSR_PR:

	interrupt_return:
		lwz	r4,_MSR(r1)
		addi	r3,r1,STACK_FRAME_OVERHEAD
		andi.	r0,r4,MSR_PR
		beq	.Lkernel_interrupt_return
		bl	interrupt_exit_user_prepare
	...
	.Lkernel_interrupt_return:
		bl	interrupt_exit_kernel_prepare

In interrupt_64.S, that's similar:

	interrupt_return_\srr\():
		ld	r4,_MSR(r1)
		andi.	r0,r4,MSR_PR
		beq	interrupt_return_\srr\()_kernel
	interrupt_return_\srr\()_user: /* make backtraces match the _kernel variant */
		addi	r3,r1,STACK_FRAME_OVERHEAD
		bl	interrupt_exit_user_prepare
	...
	interrupt_return_\srr\()_kernel:
		addi	r3,r1,STACK_FRAME_OVERHEAD
		bl	interrupt_exit_kernel_prepare

In interrupt_exit_user_restart() and interrupt_exit_kernel_restart(),
MSR_PR is verified respectively by BUG_ON(!user_mode(regs)) and
BUG_ON(user_mode(regs)) prior to calling interrupt_exit_user_prepare()
and interrupt_exit_kernel_prepare().

The verification in interrupt_exit_user_prepare() and
interrupt_exit_kernel_prepare() are therefore useless and can be removed.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/interrupt.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index 21bbd615ca41..f26caf911ab5 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -465,7 +465,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs)
 
 	if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x))
 		BUG_ON(!(regs->msr & MSR_RI));
-	BUG_ON(!(regs->msr & MSR_PR));
 	BUG_ON(arch_irq_disabled_regs(regs));
 	CT_WARN_ON(ct_state() == CONTEXT_USER);
 
@@ -499,7 +498,6 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs)
 	if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x) &&
 	    unlikely(!(regs->msr & MSR_RI)))
 		unrecoverable_exception(regs);
-	BUG_ON(regs->msr & MSR_PR);
 	/*
 	 * CT_WARN_ON comes here via program_check_exception,
 	 * so avoid recursion.
-- 
2.25.0


^ permalink raw reply related

* Re: undefined reference to `.radix__create_section_mapping'
From: Christophe Leroy @ 2021-08-04  5:31 UTC (permalink / raw)
  To: Randy Dunlap, kernel test robot, Jordan Niethe
  Cc: linuxppc-dev@lists.ozlabs.org, kbuild-all, linux-kernel
In-Reply-To: <082eea82-c788-72e0-f6a4-eadfb54d1231@infradead.org>

Hi Randy,

Le 04/08/2021 à 04:40, Randy Dunlap a écrit :
> On 7/31/21 11:22 AM, kernel test robot wrote:
>> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>> head:   c7d102232649226a69dddd58a4942cf13cff4f7c
>> commit: fe3dc333d2ed50c9764d281869d87bae0d795ce5 powerpc/mmu: Don't duplicate radix_enabled()
>> date:   3 months ago
>> config: powerpc64-randconfig-r013-20210731 (attached as .config)
>> compiler: powerpc-linux-gcc (GCC) 10.3.0
>> reproduce (this is a W=1 build):
>>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
>> ~/bin/make.cross
>>          chmod +x ~/bin/make.cross
>>          # 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe3dc333d2ed50c9764d281869d87bae0d795ce5 
>>
>>          git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>          git fetch --no-tags linus master
>>          git checkout fe3dc333d2ed50c9764d281869d87bae0d795ce5
>>          # save the attached .config to linux build tree
>>          mkdir build_dir
>>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross O=build_dir ARCH=powerpc 
>> SHELL=/bin/bash
>>
>> If you fix the issue, kindly add following tag as appropriate
>> Reported-by: kernel test robot <lkp@intel.com>
>>
>> All errors (new ones prefixed by >>):
>>
>>     powerpc-linux-ld: arch/powerpc/mm/book3s64/pgtable.o: in function `.create_section_mapping':
>>>> (.meminit.text+0x3c): undefined reference to `.radix__create_section_mapping'
>>     powerpc-linux-ld: arch/powerpc/mm/book3s64/pgtable.o: in function `.remove_section_mapping':
>>>> (.meminit.text+0x90): undefined reference to `.radix__remove_section_mapping'
> 
> In the randconfig file:
> # CONFIG_PPC_RADIX_MMU is not set
> 
> It is default y, but maybe that is not strong enough?
> I.e., should it be selected by PPC_BOOK3S_64?
> 
> Changing the config to PPC_RADIX_MMU=y fixes the build errors.
> 
> Or should arch/powerpc/mm/book3s64/pgtable.c be modified to handle
> the case of PPC_RADIX_MMU is not set?
> 

Can you test 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210804013724.514468-1-jniethe5@gmail.com/ ?

Thanks
Christophe

^ permalink raw reply

* Re: [PATCH] powerpc: Always inline radix_enabled() to fix build failure
From: Christophe Leroy @ 2021-08-04  5:35 UTC (permalink / raw)
  To: Jordan Niethe, linuxppc-dev; +Cc: erhard_f
In-Reply-To: <20210804013724.514468-1-jniethe5@gmail.com>



Le 04/08/2021 à 03:37, Jordan Niethe a écrit :
> This is the same as commit acdad8fb4a15 ("powerpc: Force inlining of
> mmu_has_feature to fix build failure") but for radix_enabled().  The
> config in the linked bugzilla causes the following build failure:
> 
> LD      .tmp_vmlinux.kallsyms1
> powerpc64-linux-ld: arch/powerpc/mm/pgtable.o: in function `.__ptep_set_access_flags':
> pgtable.c:(.text+0x17c): undefined reference to `.radix__ptep_set_access_flags'
> 
> This is due to radix_enabled() not being inlined. See extract from building with -Winline:
> 
> In file included from arch/powerpc/include/asm/lppaca.h:46,
>                   from arch/powerpc/include/asm/paca.h:17,
>                   from arch/powerpc/include/asm/current.h:13,
>                   from include/linux/thread_info.h:23,
>                   from include/asm-generic/preempt.h:5,
>                   from ./arch/powerpc/include/generated/asm/preempt.h:1,
>                   from include/linux/preempt.h:78,
>                   from include/linux/spinlock.h:51,
>                   from include/linux/mmzone.h:8,
>                   from include/linux/gfp.h:6,
>                   from arch/powerpc/mm/pgtable.c:21:
> arch/powerpc/include/asm/book3s/64/pgtable.h: In function '__ptep_set_access_flags':
> arch/powerpc/include/asm/mmu.h:327:20: error: inlining failed in call to 'radix_enabled': call is unlikely and code size would grow [-Werror=inline]
> 
> The code relies on constant folding of MMU_FTRS_POSSIBLE at buildtime
> and elimination of non possible parts of code at compile time. For this
> to work radix_enabled() must be inlined so make it __always_inline.

Thanks for looking at that. I also got a few notifications of that problem by kernel test robot but 
I didn't look at it yet.

https://lkml.org/lkml/2021/7/31/257
https://lkml.org/lkml/2021/7/25/271

> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=213803
> Reported-by: Erhard F. <erhard_f@mailbox.org>
> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
> ---
>   arch/powerpc/include/asm/mmu.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
> index 27016b98ecb2..8abe8e42e045 100644
> --- a/arch/powerpc/include/asm/mmu.h
> +++ b/arch/powerpc/include/asm/mmu.h
> @@ -324,7 +324,7 @@ static inline void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
>   }
>   #endif /* !CONFIG_DEBUG_VM */
>   
> -static inline bool radix_enabled(void)
> +static __always_inline bool radix_enabled(void)
>   {
>   	return mmu_has_feature(MMU_FTR_TYPE_RADIX);
>   }
> 

^ permalink raw reply

* [Bug 213961] Oops while loading radeon driver
From: bugzilla-daemon @ 2021-08-04  5:44 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-213961-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=213961

Christophe Leroy (christophe.leroy@csgroup.eu) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |christophe.leroy@csgroup.eu

--- Comment #1 from Christophe Leroy (christophe.leroy@csgroup.eu) ---
There is not much that can be done with the log provided.

Could you please rebuild your kernel with CONFIG_DEBUG_INFO so that we get the
name of involved functions in the BUG report.

Could you also provide your .config, and a full dump of 'dmesg'

Thanks

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* Re: undefined reference to `.radix__create_section_mapping'
From: Randy Dunlap @ 2021-08-04  5:46 UTC (permalink / raw)
  To: Christophe Leroy, kernel test robot, Jordan Niethe
  Cc: linuxppc-dev@lists.ozlabs.org, kbuild-all, linux-kernel
In-Reply-To: <c4398495-a50a-a749-6679-7510ed862db2@csgroup.eu>

On 8/3/21 10:31 PM, Christophe Leroy wrote:
> Hi Randy,
> 
> Le 04/08/2021 à 04:40, Randy Dunlap a écrit :
>> On 7/31/21 11:22 AM, kernel test robot wrote:
>>> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>> head:   c7d102232649226a69dddd58a4942cf13cff4f7c
>>> commit: fe3dc333d2ed50c9764d281869d87bae0d795ce5 powerpc/mmu: Don't duplicate radix_enabled()
>>> date:   3 months ago
>>> config: powerpc64-randconfig-r013-20210731 (attached as .config)
>>> compiler: powerpc-linux-gcc (GCC) 10.3.0
>>> reproduce (this is a W=1 build):
>>>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>>>          chmod +x ~/bin/make.cross
>>>          # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe3dc333d2ed50c9764d281869d87bae0d795ce5
>>>          git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>>          git fetch --no-tags linus master
>>>          git checkout fe3dc333d2ed50c9764d281869d87bae0d795ce5
>>>          # save the attached .config to linux build tree
>>>          mkdir build_dir
>>>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross O=build_dir ARCH=powerpc SHELL=/bin/bash
>>>
>>> If you fix the issue, kindly add following tag as appropriate
>>> Reported-by: kernel test robot <lkp@intel.com>
>>>
>>> All errors (new ones prefixed by >>):
>>>
>>>     powerpc-linux-ld: arch/powerpc/mm/book3s64/pgtable.o: in function `.create_section_mapping':
>>>>> (.meminit.text+0x3c): undefined reference to `.radix__create_section_mapping'
>>>     powerpc-linux-ld: arch/powerpc/mm/book3s64/pgtable.o: in function `.remove_section_mapping':
>>>>> (.meminit.text+0x90): undefined reference to `.radix__remove_section_mapping'
>>
>> In the randconfig file:
>> # CONFIG_PPC_RADIX_MMU is not set
>>
>> It is default y, but maybe that is not strong enough?
>> I.e., should it be selected by PPC_BOOK3S_64?
>>
>> Changing the config to PPC_RADIX_MMU=y fixes the build errors.
>>
>> Or should arch/powerpc/mm/book3s64/pgtable.c be modified to handle
>> the case of PPC_RADIX_MMU is not set?
>>
> 
> Can you test https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210804013724.514468-1-jniethe5@gmail.com/ ?

Hi Christophe,

Yes, that builds without a problem. Thanks for the pointer.

Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested

-- 
~Randy

^ permalink raw reply

* Re: [PATCH] powerpc/32: Fix critical and debug interrupts on BOOKE
From: Christophe Leroy @ 2021-08-04  5:52 UTC (permalink / raw)
  To: radu.rendec; +Cc: linuxppc-dev, Paul Mackerras, linux-kernel
In-Reply-To: <028d5483b4851b01ea4334d0751e7f260419092b.1625637264.git.christophe.leroy@csgroup.eu>

Hi Radu,

Le 07/07/2021 à 07:55, Christophe Leroy a écrit :
> 32 bits BOOKE have special interrupts for debug and other
> critical events.

Were you able to test this patch ?

Thanks
Christophe


> 
> When handling those interrupts, dedicated registers are saved
> in the stack frame in addition to the standard registers, leading
> to a shift of the pt_regs struct.
> 
> Since commit db297c3b07af ("powerpc/32: Don't save thread.regs on
> interrupt entry"), the pt_regs struct is expected to be at the
> same place all the time.
> 
> Instead of handling a special struct in addition to pt_regs, just
> add those special registers to struct pt_regs.
> 
> Reported-by: Radu Rendec <radu.rendec@gmail.com>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> Fixes: db297c3b07af ("powerpc/32: Don't save thread.regs on interrupt entry")
> Cc: stable@vger.kernel.org
> ---
>   arch/powerpc/include/asm/ptrace.h | 16 ++++++++++++++++
>   arch/powerpc/kernel/asm-offsets.c | 31 ++++++++++++++-----------------
>   arch/powerpc/kernel/head_booke.h  | 27 +++------------------------
>   3 files changed, 33 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
> index 3e5d470a6155..14422e851494 100644
> --- a/arch/powerpc/include/asm/ptrace.h
> +++ b/arch/powerpc/include/asm/ptrace.h
> @@ -70,6 +70,22 @@ struct pt_regs
>   		unsigned long __pad[4];	/* Maintain 16 byte interrupt stack alignment */
>   	};
>   #endif
> +#if defined(CONFIG_PPC32) && defined(CONFIG_BOOKE)
> +	struct { /* Must be a multiple of 16 bytes */
> +		unsigned long mas0;
> +		unsigned long mas1;
> +		unsigned long mas2;
> +		unsigned long mas3;
> +		unsigned long mas6;
> +		unsigned long mas7;
> +		unsigned long srr0;
> +		unsigned long srr1;
> +		unsigned long csrr0;
> +		unsigned long csrr1;
> +		unsigned long dsrr0;
> +		unsigned long dsrr1;
> +	};
> +#endif
>   };
>   #endif
>   
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index a47eefa09bcb..5bee245d832b 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -309,24 +309,21 @@ int main(void)
>   	STACK_PT_REGS_OFFSET(STACK_REGS_IAMR, iamr);
>   #endif
>   
> -#if defined(CONFIG_PPC32)
> -#if defined(CONFIG_BOOKE) || defined(CONFIG_40x)
> -	DEFINE(EXC_LVL_SIZE, STACK_EXC_LVL_FRAME_SIZE);
> -	DEFINE(MAS0, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas0));
> +#if defined(CONFIG_PPC32) && defined(CONFIG_BOOKE)
> +	STACK_PT_REGS_OFFSET(MAS0, mas0);
>   	/* we overload MMUCR for 44x on MAS0 since they are mutually exclusive */
> -	DEFINE(MMUCR, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas0));
> -	DEFINE(MAS1, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas1));
> -	DEFINE(MAS2, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas2));
> -	DEFINE(MAS3, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas3));
> -	DEFINE(MAS6, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas6));
> -	DEFINE(MAS7, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, mas7));
> -	DEFINE(_SRR0, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, srr0));
> -	DEFINE(_SRR1, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, srr1));
> -	DEFINE(_CSRR0, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, csrr0));
> -	DEFINE(_CSRR1, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, csrr1));
> -	DEFINE(_DSRR0, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, dsrr0));
> -	DEFINE(_DSRR1, STACK_INT_FRAME_SIZE+offsetof(struct exception_regs, dsrr1));
> -#endif
> +	STACK_PT_REGS_OFFSET(MMUCR, mas0);
> +	STACK_PT_REGS_OFFSET(MAS1, mas1);
> +	STACK_PT_REGS_OFFSET(MAS2, mas2);
> +	STACK_PT_REGS_OFFSET(MAS3, mas3);
> +	STACK_PT_REGS_OFFSET(MAS6, mas6);
> +	STACK_PT_REGS_OFFSET(MAS7, mas7);
> +	STACK_PT_REGS_OFFSET(_SRR0, srr0);
> +	STACK_PT_REGS_OFFSET(_SRR1, srr1);
> +	STACK_PT_REGS_OFFSET(_CSRR0, csrr0);
> +	STACK_PT_REGS_OFFSET(_CSRR1, csrr1);
> +	STACK_PT_REGS_OFFSET(_DSRR0, dsrr0);
> +	STACK_PT_REGS_OFFSET(_DSRR1, dsrr1);
>   #endif
>   
>   	/* About the CPU features table */
> diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
> index 87b806e8eded..e5503420b6c6 100644
> --- a/arch/powerpc/kernel/head_booke.h
> +++ b/arch/powerpc/kernel/head_booke.h
> @@ -168,20 +168,18 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
>   /* only on e500mc */
>   #define DBG_STACK_BASE		dbgirq_ctx
>   
> -#define EXC_LVL_FRAME_OVERHEAD	(THREAD_SIZE - INT_FRAME_SIZE - EXC_LVL_SIZE)
> -
>   #ifdef CONFIG_SMP
>   #define BOOKE_LOAD_EXC_LEVEL_STACK(level)		\
>   	mfspr	r8,SPRN_PIR;				\
>   	slwi	r8,r8,2;				\
>   	addis	r8,r8,level##_STACK_BASE@ha;		\
>   	lwz	r8,level##_STACK_BASE@l(r8);		\
> -	addi	r8,r8,EXC_LVL_FRAME_OVERHEAD;
> +	addi	r8,r8,THREAD_SIZE - INT_FRAME_SIZE;
>   #else
>   #define BOOKE_LOAD_EXC_LEVEL_STACK(level)		\
>   	lis	r8,level##_STACK_BASE@ha;		\
>   	lwz	r8,level##_STACK_BASE@l(r8);		\
> -	addi	r8,r8,EXC_LVL_FRAME_OVERHEAD;
> +	addi	r8,r8,THREAD_SIZE - INT_FRAME_SIZE;
>   #endif
>   
>   /*
> @@ -208,7 +206,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
>   	mtmsr	r11;							\
>   	mfspr	r11,SPRN_SPRG_THREAD;	/* if from user, start at top of   */\
>   	lwz	r11, TASK_STACK - THREAD(r11); /* this thread's kernel stack */\
> -	addi	r11,r11,EXC_LVL_FRAME_OVERHEAD;	/* allocate stack frame    */\
> +	addi	r11,r11,THREAD_SIZE - INT_FRAME_SIZE;	/* allocate stack frame    */\
>   	beq	1f;							     \
>   	/* COMING FROM USER MODE */					     \
>   	stw	r9,_CCR(r11);		/* save CR			   */\
> @@ -516,24 +514,5 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
>   	bl	kernel_fp_unavailable_exception;			      \
>   	b	interrupt_return
>   
> -#else /* __ASSEMBLY__ */
> -struct exception_regs {
> -	unsigned long mas0;
> -	unsigned long mas1;
> -	unsigned long mas2;
> -	unsigned long mas3;
> -	unsigned long mas6;
> -	unsigned long mas7;
> -	unsigned long srr0;
> -	unsigned long srr1;
> -	unsigned long csrr0;
> -	unsigned long csrr1;
> -	unsigned long dsrr0;
> -	unsigned long dsrr1;
> -};
> -
> -/* ensure this structure is always sized to a multiple of the stack alignment */
> -#define STACK_EXC_LVL_FRAME_SIZE	ALIGN(sizeof (struct exception_regs), 16)
> -
>   #endif /* __ASSEMBLY__ */
>   #endif /* __HEAD_BOOKE_H__ */
> 

^ permalink raw reply

* Re: [PATCH] powerpc/32s: Fix napping restore in data storage interrupt (DSI)
From: Christophe Leroy @ 2021-08-04  6:07 UTC (permalink / raw)
  To: Finn Thain; +Cc: userm57, linux-kernel, Paul Mackerras, linuxppc-dev
In-Reply-To: <ce20d16c-b0b2-94c-3e22-794d95c376b@linux-m68k.org>



Le 04/08/2021 à 06:04, Finn Thain a écrit :
> On Tue, 3 Aug 2021, Christophe Leroy wrote:
> 
>> When a DSI (Data Storage Interrupt) is taken while in NAP mode, r11
>> doesn't survive the call to power_save_ppc32_restore().
>>
>> So use r1 instead of r11 as they both contain the virtual stack pointer
>> at that point.
>>
>> Reported-by: Finn Thain <fthain@linux-m68k.org>
>> Fixes: 4c0104a83fc3 ("powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE")
> 
> Regarding that 'Fixes' tag, this patch has not fixed the failure below,
> unfortunately. But there appears to be several bugs in play here. Can you
> tell us which failure mode is associated with the bug addressed by this
> patch?


This is unrelated to the failure below. This patch is related to the bisect you did that pointed to 
4c0104a83fc3 ("powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE")

I think maybe the starting point should be to (manually) apply the patch on top of that commit in 
order to check that the bug to leaded to pointing that commit as 'first bad commit' is now gone.

The BUG below is likely something completely different.

And the other bug involving KUAP write is also something else to be investigated separately.

> 
> ------------[ cut here ]------------
> kernel BUG at arch/powerpc/kernel/interrupt.c:49!
> Oops: Exception in kernel mode, sig: 5 [#1]
> BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
> Modules linked in:
> CPU: 0 PID: 1859 Comm: xfce4-session Not tainted 5.13.0-pmac-VMAP #10
> NIP:  c0011474 LR: c0011464 CTR: 00000000
> REGS: e2f75e40 TRAP: 0700   Not tainted  (5.13.0-pmac-VMAP)
> MSR:  00021032 <ME,IR,DR,RI>  CR: 2400446c  XER: 20000000
> 
> GPR00: c001604c e2f75f00 ca284a60 00000000 00000000 a5205eb0 00000008 00000020
> GPR08: ffffffc0 00000001 501200d9 ce030005 ca285010 00c1f778 00000000 00000000
> GPR16: 00945b20 009402f8 00000001 a6b87550 a51fd000 afb73220 a6b22c78 a6a6aecc
> GPR24: 00000000 ffffffc0 00000020 00000008 a5205eb0 00000000 e2f75f40 000000ae
> NIP [c0011474] system_call_exception+0x60/0x164
> LR [c0011464] system_call_exception+0x50/0x164
> Call Trace:
> [e2f75f00] [00009000] 0x9000 (unreliable)
> [e2f75f30] [c001604c] ret_from_syscall+0x0/0x28
> --- interrupt: c00 at 0xa69d6cb0
> NIP:  a69d6cb0 LR: a69d6c3c CTR: 00000000
> REGS: e2f75f40 TRAP: 0c00   Not tainted  (5.13.0-pmac-VMAP)
> MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 2400446c  XER: 20000000
> 
> GPR00: 000000ae a5205de0 a5687ca0 00000000 00000000 a5205eb0 00000008 00000020
> GPR08: ffffffc0 401201ea 401200d9 ffffffff c158f230 00c1f778 00000000 00000000
> GPR16: 00945b20 009402f8 00000001 a6b87550 a51fd000 afb73220 a6b22c78 a6a6aecc
> GPR24: afb72fc8 00000000 00000001 a5205f30 afb733dc 00000000 a6b85ff4 a5205eb0
> NIP [a69d6cb0] 0xa69d6cb0
> LR [a69d6c3c] 0xa69d6c3c
> --- interrupt: c00
> Instruction dump:
> 7cdb3378 93810020 7cbc2b78 93a10024 7c9d2378 93e1002c 7d3f4b78 4800d629
> 817e0084 931e0088 69690002 5529fffe <0f090000> 69694000 552997fe 0f090000
> ---[ end trace c66c6c3c44806276 ]---
> 

^ permalink raw reply

* [powerpc:next-test] BUILD SUCCESS c09f799bf0fdc0eeb44db87df07a7a9632c3420c
From: kernel test robot @ 2021-08-04  6:18 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: c09f799bf0fdc0eeb44db87df07a7a9632c3420c  pseries/drmem: update LMBs after LPM

elapsed time: 1007m

configs tested: 100
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm                                 defconfig
arm64                            allyesconfig
arm64                               defconfig
arm                              allyesconfig
arm                              allmodconfig
i386                 randconfig-c001-20210803
powerpc                     tqm8541_defconfig
nios2                         3c120_defconfig
arm                            pleb_defconfig
powerpc                     pseries_defconfig
arm                        trizeps4_defconfig
arm                          imote2_defconfig
powerpc                   bluestone_defconfig
mips                          rm200_defconfig
powerpc                      ppc64e_defconfig
powerpc                      ppc6xx_defconfig
sh                     sh7710voipgw_defconfig
powerpc                           allnoconfig
powerpc                 mpc8540_ads_defconfig
powerpc                       ebony_defconfig
powerpc                      bamboo_defconfig
powerpc                     kilauea_defconfig
mips                        bcm63xx_defconfig
powerpc                 mpc8315_rdb_defconfig
xtensa                generic_kc705_defconfig
mips                           ip27_defconfig
powerpc64                           defconfig
arm                           sama5_defconfig
sh                          sdk7786_defconfig
arc                    vdk_hs38_smp_defconfig
x86_64                            allnoconfig
ia64                             allmodconfig
ia64                                defconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                                defconfig
m68k                             allyesconfig
nds32                               defconfig
nios2                            allyesconfig
csky                                defconfig
alpha                               defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allyesconfig
arc                                 defconfig
sh                               allmodconfig
parisc                              defconfig
s390                             allyesconfig
s390                             allmodconfig
parisc                           allyesconfig
s390                                defconfig
i386                             allyesconfig
sparc                            allyesconfig
sparc                               defconfig
i386                                defconfig
nios2                               defconfig
arc                              allyesconfig
nds32                             allnoconfig
mips                             allyesconfig
mips                             allmodconfig
powerpc                          allyesconfig
powerpc                          allmodconfig
x86_64               randconfig-a002-20210803
x86_64               randconfig-a004-20210803
x86_64               randconfig-a006-20210803
x86_64               randconfig-a003-20210803
x86_64               randconfig-a001-20210803
x86_64               randconfig-a005-20210803
i386                 randconfig-a004-20210802
i386                 randconfig-a005-20210802
i386                 randconfig-a002-20210802
i386                 randconfig-a006-20210802
i386                 randconfig-a001-20210802
i386                 randconfig-a003-20210802
i386                 randconfig-a012-20210803
i386                 randconfig-a011-20210803
i386                 randconfig-a015-20210803
i386                 randconfig-a013-20210803
i386                 randconfig-a014-20210803
i386                 randconfig-a016-20210803
riscv                    nommu_k210_defconfig
riscv                            allyesconfig
riscv                    nommu_virt_defconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                          rv32_defconfig
riscv                            allmodconfig
um                           x86_64_defconfig
um                             i386_defconfig
x86_64                           allyesconfig
x86_64                    rhel-8.3-kselftests
x86_64                              defconfig
x86_64                               rhel-8.3
x86_64                                  kexec

clang tested configs:
x86_64               randconfig-a012-20210803
x86_64               randconfig-a016-20210803
x86_64               randconfig-a013-20210803
x86_64               randconfig-a011-20210803
x86_64               randconfig-a014-20210803
x86_64               randconfig-a015-20210803

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* Re: [PATCH] powerpc/32s: Fix napping restore in data storage interrupt (DSI)
From: Christophe Leroy @ 2021-08-04  6:21 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Finn Thain, userm57, linux-kernel, Paul Mackerras, linuxppc-dev
In-Reply-To: <b04a90a9-9d62-2192-f896-ea99be911604@csgroup.eu>

Hi Nic,

I think I'll need your help on that one.

Le 04/08/2021 à 08:07, Christophe Leroy a écrit :
> 
> 
> Le 04/08/2021 à 06:04, Finn Thain a écrit :
>> On Tue, 3 Aug 2021, Christophe Leroy wrote:
>>
...
>>
>> ------------[ cut here ]------------
>> kernel BUG at arch/powerpc/kernel/interrupt.c:49!
>> Oops: Exception in kernel mode, sig: 5 [#1]
>> BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
>> Modules linked in:
>> CPU: 0 PID: 1859 Comm: xfce4-session Not tainted 5.13.0-pmac-VMAP #10
>> NIP:  c0011474 LR: c0011464 CTR: 00000000
>> REGS: e2f75e40 TRAP: 0700   Not tainted  (5.13.0-pmac-VMAP)
>> MSR:  00021032 <ME,IR,DR,RI>  CR: 2400446c  XER: 20000000
>>
>> GPR00: c001604c e2f75f00 ca284a60 00000000 00000000 a5205eb0 00000008 00000020
>> GPR08: ffffffc0 00000001 501200d9 ce030005 ca285010 00c1f778 00000000 00000000
>> GPR16: 00945b20 009402f8 00000001 a6b87550 a51fd000 afb73220 a6b22c78 a6a6aecc
>> GPR24: 00000000 ffffffc0 00000020 00000008 a5205eb0 00000000 e2f75f40 000000ae
>> NIP [c0011474] system_call_exception+0x60/0x164
>> LR [c0011464] system_call_exception+0x50/0x164
>> Call Trace:
>> [e2f75f00] [00009000] 0x9000 (unreliable)
>> [e2f75f30] [c001604c] ret_from_syscall+0x0/0x28
>> --- interrupt: c00 at 0xa69d6cb0
>> NIP:  a69d6cb0 LR: a69d6c3c CTR: 00000000
>> REGS: e2f75f40 TRAP: 0c00   Not tainted  (5.13.0-pmac-VMAP)
>> MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 2400446c  XER: 20000000
>>
>> GPR00: 000000ae a5205de0 a5687ca0 00000000 00000000 a5205eb0 00000008 00000020
>> GPR08: ffffffc0 401201ea 401200d9 ffffffff c158f230 00c1f778 00000000 00000000
>> GPR16: 00945b20 009402f8 00000001 a6b87550 a51fd000 afb73220 a6b22c78 a6a6aecc
>> GPR24: afb72fc8 00000000 00000001 a5205f30 afb733dc 00000000 a6b85ff4 a5205eb0
>> NIP [a69d6cb0] 0xa69d6cb0
>> LR [a69d6c3c] 0xa69d6c3c
>> --- interrupt: c00
>> Instruction dump:
>> 7cdb3378 93810020 7cbc2b78 93a10024 7c9d2378 93e1002c 7d3f4b78 4800d629
>> 817e0084 931e0088 69690002 5529fffe <0f090000> 69694000 552997fe 0f090000
>> ---[ end trace c66c6c3c44806276 ]---
>>

Getting a BUG at arch/powerpc/kernel/interrupt.c:49 meaning MSR_RI is not set, but the c00 interrupt 
frame shows MSR_RI properly set, so what ?

Thanks
Christophe

^ permalink raw reply

* Re: Debian SID kernel doesn't boot on PowerBook 3400c
From: Christophe Leroy @ 2021-08-04  6:28 UTC (permalink / raw)
  To: Finn Thain; +Cc: Debian PowerPC, linuxppc-dev, Stan Johnson
In-Reply-To: <c535cc2b-3f45-2415-1e81-32ea24b4ec@linux-m68k.org>



Le 04/08/2021 à 02:34, Finn Thain a écrit :
> 
> On Tue, 3 Aug 2021, Christophe Leroy wrote:
> 
>>
>> Looks like the memory errors are linked to KUAP (Kernel Userspace Access
>> Protection). Based on the places the problems happen, I don't think
>> there are any invalid access, so there must be something wrong in the
>> KUAP logic, probably linked to some interrupts happenning in kernel mode
>> while the KUAP window is opened. And because is not selected by default
>> on book3s/32 until 5.14, probably nobody ever tested it in a real
>> environment before you.
>>
>> I think the issue may be linked to commit
>> https://github.com/linuxppc/linux/commit/c16728835 which happened
>> between 5.12 and 5.13.
> 
> The messages, "Kernel attempted to write user page (c6207c) - exploit
> attempt? (uid: 0)", appear in the console logs generated by v5.13. Those
> logs come from the Powerbook G3 discussion in the other thread. Could that
> be the same bug?
> 

Yes, most likely.

So you confirm this appears with 5.13 and not 5.12 ?

Can you check if they happen at commit c16728835
Can you check if they DO NOT happen at preceding commit c16728835~

Could you test without CONFIG_PPC_KUAP
Could you test with CONFIG_PPC_KUAP and CONFIG_PPC_KUAP_DEBUG

Thanks
Christophe

^ permalink raw reply

* Re: [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE
From: Nicholas Piggin @ 2021-08-04  6:39 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, mpe
  Cc: Peter Zijlstra, linux-arch, Will Deacon, linux-mm
In-Reply-To: <1628053302.0qclx0xcj9.astroid@bobo.none>

Excerpts from Nicholas Piggin's message of August 4, 2021 3:14 pm:
> Excerpts from Aneesh Kumar K.V's message of August 4, 2021 12:37 am:
>> With shared mapping, even though we are unmapping a large range, the kernel
>> will force a TLB flush with ptl lock held to avoid the race mentioned in
>> commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts")
>> This results in the kernel issuing a high number of TLB flushes even for a large
>> range. This can be improved by making sure the kernel switch to pid based flush if the
>> kernel is unmapping a 2M range.
> 
> It would be good to have a bit more description here.
> 
> In any patch that changes a heuristic like this, I would like to see 
> some justification or reasoning that could be refuted or used as a 
> supporting argument if we ever wanted to change the heuristic later.
> Ideally with some of the obvious downsides listed as well.
> 
> This "improves" things here, but what if it hurt things elsewhere, how 
> would we come in later and decide to change it back?
> 
> THP flushes for example, I think now they'll do PID flushes (if they 
> have to be broadcast, which they will tend to be when khugepaged does
> them). So now that might increase jitter for THP and cause it to be a
> loss for more workloads.
> 
> So where do you notice this? What's the benefit?

For that matter, I wonder if we shouldn't do something like this 
(untested) so the low level batch flush has visibility to the high
level flush range.

x86 could use this too AFAIKS, just needs to pass the range a bit
further down, but in practice I'm not sure it would ever really
matter for them because the 2MB level exceeds the single page flush
ceiling for 4k pages unlike powerpc with 64k pages. But in corner
cases where the unmap crossed a bunch of small vmas or the ceiling
was increased then in theory it could be of use.

Subject: [PATCH v1] mm/mmu_gather: provide archs with the entire range that is
 to be flushed, not just the particular gather

This allows archs to optimise flushing heuristics better, in the face of
flush operations forcing smaller flush batches. For example, an
architecture may choose a more costly per-page invalidation for small
ranges of pages with the assumption that the full TLB flush cost would
be more expensive in terms of refills. However if a very large range is
forced into flushing small ranges, the faster full-process flush may
have been the better choice.

---
 arch/powerpc/mm/book3s64/radix_tlb.c | 33 ++++++++++++++++------------
 fs/exec.c                            |  3 ++-
 include/asm-generic/tlb.h            |  9 ++++++++
 include/linux/mm_types.h             |  3 ++-
 mm/hugetlb.c                         |  2 +-
 mm/madvise.c                         |  6 ++---
 mm/memory.c                          |  4 ++--
 mm/mmap.c                            |  2 +-
 mm/mmu_gather.c                      | 10 ++++++---
 mm/oom_kill.c                        |  2 +-
 10 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index aefc100d79a7..e1072d85d72e 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1110,12 +1110,13 @@ static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
 static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
 
 static inline void __radix__flush_tlb_range(struct mm_struct *mm,
-					    unsigned long start, unsigned long end)
+					    unsigned long start, unsigned long end,
+					    unsigned long entire_range)
 {
 	unsigned long pid;
 	unsigned int page_shift = mmu_psize_defs[mmu_virtual_psize].shift;
 	unsigned long page_size = 1UL << page_shift;
-	unsigned long nr_pages = (end - start) >> page_shift;
+	unsigned long entire_nr_pages = entire_range >> page_shift;
 	bool fullmm = (end == TLB_FLUSH_ALL);
 	bool flush_pid, flush_pwc = false;
 	enum tlb_flush_type type;
@@ -1133,9 +1134,9 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
 	if (fullmm)
 		flush_pid = true;
 	else if (type == FLUSH_TYPE_GLOBAL)
-		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
+		flush_pid = entire_nr_pages > tlb_single_page_flush_ceiling;
 	else
-		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
+		flush_pid = entire_nr_pages > tlb_local_single_page_flush_ceiling;
 	/*
 	 * full pid flush already does the PWC flush. if it is not full pid
 	 * flush check the range is more than PMD and force a pwc flush
@@ -1220,7 +1221,7 @@ void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 		return radix__flush_hugetlb_tlb_range(vma, start, end);
 #endif
 
-	__radix__flush_tlb_range(vma->vm_mm, start, end);
+	__radix__flush_tlb_range(vma->vm_mm, start, end, end - start);
 }
 EXPORT_SYMBOL(radix__flush_tlb_range);
 
@@ -1278,6 +1279,11 @@ void radix__flush_all_lpid_guest(unsigned int lpid)
 	_tlbie_lpid_guest(lpid, RIC_FLUSH_ALL);
 }
 
+static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
+				unsigned long start, unsigned long end,
+				unsigned long entire_range,
+				int psize, bool also_pwc);
+
 void radix__tlb_flush(struct mmu_gather *tlb)
 {
 	int psize = 0;
@@ -1285,6 +1291,7 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 	int page_size = tlb->page_size;
 	unsigned long start = tlb->start;
 	unsigned long end = tlb->end;
+	unsigned long entire_range = tlb->entire_end - tlb->entire_start;
 
 	/*
 	 * if page size is not something we understand, do a full mm flush
@@ -1301,21 +1308,19 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 		else
 			radix__flush_all_mm(mm);
 	} else {
-		if (!tlb->freed_tables)
-			radix__flush_tlb_range_psize(mm, start, end, psize);
-		else
-			radix__flush_tlb_pwc_range_psize(mm, start, end, psize);
+		__radix__flush_tlb_range_psize(mm, start, end, entire_range, psize, tlb->freed_tables);
 	}
 }
 
 static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
 				unsigned long start, unsigned long end,
+				unsigned long entire_range,
 				int psize, bool also_pwc)
 {
 	unsigned long pid;
 	unsigned int page_shift = mmu_psize_defs[psize].shift;
 	unsigned long page_size = 1UL << page_shift;
-	unsigned long nr_pages = (end - start) >> page_shift;
+	unsigned long entire_nr_pages = entire_range >> page_shift;
 	bool fullmm = (end == TLB_FLUSH_ALL);
 	bool flush_pid;
 	enum tlb_flush_type type;
@@ -1335,9 +1340,9 @@ static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
 	if (fullmm)
 		flush_pid = true;
 	else if (type == FLUSH_TYPE_GLOBAL)
-		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
+		flush_pid = entire_nr_pages > tlb_single_page_flush_ceiling;
 	else
-		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
+		flush_pid = entire_nr_pages > tlb_local_single_page_flush_ceiling;
 
 	if (!mmu_has_feature(MMU_FTR_GTSE) && type == FLUSH_TYPE_GLOBAL) {
 		unsigned long tgt = H_RPTI_TARGET_CMMU;
@@ -1381,13 +1386,13 @@ static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
 void radix__flush_tlb_range_psize(struct mm_struct *mm, unsigned long start,
 				  unsigned long end, int psize)
 {
-	return __radix__flush_tlb_range_psize(mm, start, end, psize, false);
+	return __radix__flush_tlb_range_psize(mm, start, end, end - start, psize, false);
 }
 
 void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
 				      unsigned long end, int psize)
 {
-	__radix__flush_tlb_range_psize(mm, start, end, psize, true);
+	__radix__flush_tlb_range_psize(mm, start, end, end - start, psize, true);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/fs/exec.c b/fs/exec.c
index 38f63451b928..c769c12bdf56 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -705,11 +705,11 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 		return -ENOMEM;
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm);
 	if (new_end > old_start) {
 		/*
 		 * when the old and new regions overlap clear from new_end.
 		 */
+		tlb_gather_mmu(&tlb, mm, new_end, old_end);
 		free_pgd_range(&tlb, new_end, old_end, new_end,
 			vma->vm_next ? vma->vm_next->vm_start : USER_PGTABLES_CEILING);
 	} else {
@@ -719,6 +719,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 		 * have constraints on va-space that make this illegal (IA64) -
 		 * for the others its just a little faster.
 		 */
+		tlb_gather_mmu(&tlb, mm, old_start, old_end);
 		free_pgd_range(&tlb, old_start, old_end, new_end,
 			vma->vm_next ? vma->vm_next->vm_start : USER_PGTABLES_CEILING);
 	}
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 2c68a545ffa7..857fd83af695 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -256,6 +256,15 @@ struct mmu_gather {
 	struct mmu_table_batch	*batch;
 #endif
 
+	/*
+	 * This is the range of the "entire" logical flush
+	 * operation being performed. It does not relate to
+	 * the current batch to flush, but it can inform
+	 * heuristics that choose the best flushing strategy.
+	 */
+	unsigned long		entire_start;
+	unsigned long		entire_end;
+
 	unsigned long		start;
 	unsigned long		end;
 	/*
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 52bbd2b7cb46..9d2ff06b574c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -599,7 +599,8 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
 }
 
 struct mmu_gather;
-extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm);
+extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
+				unsigned long start, unsigned long end);
 extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm);
 extern void tlb_finish_mmu(struct mmu_gather *tlb);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dfc940d5221d..e41106eb4df7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4458,7 +4458,7 @@ void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 {
 	struct mmu_gather tlb;
 
-	tlb_gather_mmu(&tlb, vma->vm_mm);
+	tlb_gather_mmu(&tlb, vma->vm_mm, start, end);
 	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
 	tlb_finish_mmu(&tlb);
 }
diff --git a/mm/madvise.c b/mm/madvise.c
index 6d3d348b17f4..b3634672aeb9 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -508,7 +508,7 @@ static long madvise_cold(struct vm_area_struct *vma,
 		return -EINVAL;
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm);
+	tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
 	madvise_cold_page_range(&tlb, vma, start_addr, end_addr);
 	tlb_finish_mmu(&tlb);
 
@@ -561,7 +561,7 @@ static long madvise_pageout(struct vm_area_struct *vma,
 		return 0;
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm);
+	tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
 	madvise_pageout_page_range(&tlb, vma, start_addr, end_addr);
 	tlb_finish_mmu(&tlb);
 
@@ -726,7 +726,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
 				range.start, range.end);
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm);
+	tlb_gather_mmu(&tlb, mm, range.start, range.end);
 	update_hiwater_rss(mm);
 
 	mmu_notifier_invalidate_range_start(&range);
diff --git a/mm/memory.c b/mm/memory.c
index 25fc46e87214..61c303e84baf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1647,7 +1647,7 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
 	lru_add_drain();
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
 				start, start + size);
-	tlb_gather_mmu(&tlb, vma->vm_mm);
+	tlb_gather_mmu(&tlb, vma->vm_mm, range.start, range.end);
 	update_hiwater_rss(vma->vm_mm);
 	mmu_notifier_invalidate_range_start(&range);
 	for ( ; vma && vma->vm_start < range.end; vma = vma->vm_next)
@@ -1674,7 +1674,7 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
 	lru_add_drain();
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
 				address, address + size);
-	tlb_gather_mmu(&tlb, vma->vm_mm);
+	tlb_gather_mmu(&tlb, vma->vm_mm, address, range.end);
 	update_hiwater_rss(vma->vm_mm);
 	mmu_notifier_invalidate_range_start(&range);
 	unmap_single_vma(&tlb, vma, address, range.end, details);
diff --git a/mm/mmap.c b/mm/mmap.c
index ca54d36d203a..f2808febde40 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2672,7 +2672,7 @@ static void unmap_region(struct mm_struct *mm,
 	struct mmu_gather tlb;
 
 	lru_add_drain();
-	tlb_gather_mmu(&tlb, mm);
+	tlb_gather_mmu(&tlb, mm, start, end);
 	update_hiwater_rss(mm);
 	unmap_vmas(&tlb, vma, start, end);
 	free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 1b9837419bf9..863a5bd7e650 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -250,9 +250,12 @@ void tlb_flush_mmu(struct mmu_gather *tlb)
 }
 
 static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
+			     unsigned long start, unsigned long end,
 			     bool fullmm)
 {
 	tlb->mm = mm;
+	tlb->entire_start = start;
+	tlb->entire_end = end;
 	tlb->fullmm = fullmm;
 
 #ifndef CONFIG_MMU_GATHER_NO_GATHER
@@ -281,9 +284,10 @@ static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
  * Called to initialize an (on-stack) mmu_gather structure for page-table
  * tear-down from @mm.
  */
-void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm)
+void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
+		    unsigned long start, unsigned long end)
 {
-	__tlb_gather_mmu(tlb, mm, false);
+	__tlb_gather_mmu(tlb, mm, start, end, false);
 }
 
 /**
@@ -299,7 +303,7 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm)
  */
 void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm)
 {
-	__tlb_gather_mmu(tlb, mm, true);
+	__tlb_gather_mmu(tlb, mm, 0, ~0UL, true);
 }
 
 /**
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index c729a4c4a1ac..bfcc9cbdfb20 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -545,7 +545,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
 			mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0,
 						vma, mm, vma->vm_start,
 						vma->vm_end);
-			tlb_gather_mmu(&tlb, mm);
+			tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
 			if (mmu_notifier_invalidate_range_start_nonblock(&range)) {
 				tlb_finish_mmu(&tlb);
 				ret = false;
-- 
2.23.0


^ permalink raw reply related

* Re: [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE
From: Michael Ellerman @ 2021-08-04  6:59 UTC (permalink / raw)
  To: Nicholas Piggin, Aneesh Kumar K.V, linuxppc-dev
In-Reply-To: <1628053302.0qclx0xcj9.astroid@bobo.none>

Nicholas Piggin <npiggin@gmail.com> writes:
> Excerpts from Aneesh Kumar K.V's message of August 4, 2021 12:37 am:
>> With shared mapping, even though we are unmapping a large range, the kernel
>> will force a TLB flush with ptl lock held to avoid the race mentioned in
>> commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts")
>> This results in the kernel issuing a high number of TLB flushes even for a large
>> range. This can be improved by making sure the kernel switch to pid based flush if the
>> kernel is unmapping a 2M range.
>
> It would be good to have a bit more description here.
>
> In any patch that changes a heuristic like this, I would like to see 
> some justification or reasoning that could be refuted or used as a 
> supporting argument if we ever wanted to change the heuristic later.
> Ideally with some of the obvious downsides listed as well.
>
> This "improves" things here, but what if it hurt things elsewhere, how 
> would we come in later and decide to change it back?
>
> THP flushes for example, I think now they'll do PID flushes (if they 
> have to be broadcast, which they will tend to be when khugepaged does
> them). So now that might increase jitter for THP and cause it to be a
> loss for more workloads.
>
> So where do you notice this? What's the benefit?

Ack. Needs some numbers and supporting evidence.

>> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
>> index aefc100d79a7..21d0f098e43b 100644
>> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
>> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
>> @@ -1106,7 +1106,7 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
>>   * invalidating a full PID, so it has a far lower threshold to change from
>>   * individual page flushes to full-pid flushes.
>>   */
>> -static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
>> +static unsigned long tlb_single_page_flush_ceiling __read_mostly = 32;
>>  static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
>>  
>>  static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>> @@ -1133,7 +1133,7 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>>  	if (fullmm)
>>  		flush_pid = true;
>>  	else if (type == FLUSH_TYPE_GLOBAL)
>> -		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
>> +		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
>
> Arguably >= is nicer than > here, but this shouldn't be in the same 
> patch as the value change.
>
>>  	else
>>  		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
>
> And it should change everything to be consistent. Although I'm not sure 
> it's worth changing even though I highly doubt any administrator would
> be tweaking this.

This made me look at how an administrator tweaks these thresholds, and
AFAICS the answer is "recompile the kernel"?

It looks like x86 have a debugfs file for tlb_single_page_flush_ceiling,
but we don't. I guess we meant to copy that but never did?

So at the moment both thresholds could just be #defines.

Making them tweakable at runtime would be nice, it would give us an
escape hatch if we ever hit a workload in production that wants a
different value.

cheers

^ permalink raw reply

* Re: [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE
From: Peter Zijlstra @ 2021-08-04  7:34 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Will Deacon, Aneesh Kumar K.V, linux-mm, linuxppc-dev
In-Reply-To: <1628058455.rphrivzjzb.astroid@bobo.none>

On Wed, Aug 04, 2021 at 04:39:44PM +1000, Nicholas Piggin wrote:
> For that matter, I wonder if we shouldn't do something like this 
> (untested) so the low level batch flush has visibility to the high
> level flush range.
> 
> x86 could use this too AFAIKS, just needs to pass the range a bit
> further down, but in practice I'm not sure it would ever really
> matter for them because the 2MB level exceeds the single page flush
> ceiling for 4k pages unlike powerpc with 64k pages. But in corner
> cases where the unmap crossed a bunch of small vmas or the ceiling
> was increased then in theory it could be of use.

x86 can do single invalidates for 2M pages if that's the only type
encountered in the range, at which point we can do 33*2M = 66M, which is
well below the 1G range for the next level of huge.

> Subject: [PATCH v1] mm/mmu_gather: provide archs with the entire range that is
>  to be flushed, not just the particular gather
> 
> This allows archs to optimise flushing heuristics better, in the face of
> flush operations forcing smaller flush batches. For example, an
> architecture may choose a more costly per-page invalidation for small
> ranges of pages with the assumption that the full TLB flush cost would
> be more expensive in terms of refills. However if a very large range is
> forced into flushing small ranges, the faster full-process flush may
> have been the better choice.

What is splitting up the flushes for you? That page_size power-special?

^ permalink raw reply

* Re: [PATCH] powerpc: Remove MSR_PR check in interrupt_exit_{user/kernel}_prepare()
From: Nicholas Piggin @ 2021-08-04  8:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <b36623df00ef3d2296f928487b6e23f93a217afa.1628054802.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of August 4, 2021 3:27 pm:
> In those hot functions that are called at every interrupt, any saved
> cycle is worth it.
> 
> interrupt_exit_user_prepare() and interrupt_exit_kernel_prepare() are
> called from three places:
> - From entry_32.S
> - From interrupt_64.S
> - From interrupt_exit_user_restart() and interrupt_exit_kernel_restart()
> 
> In entry_32.S, there are inambiguously called based on MSR_PR:
> 
> 	interrupt_return:
> 		lwz	r4,_MSR(r1)
> 		addi	r3,r1,STACK_FRAME_OVERHEAD
> 		andi.	r0,r4,MSR_PR
> 		beq	.Lkernel_interrupt_return
> 		bl	interrupt_exit_user_prepare
> 	...
> 	.Lkernel_interrupt_return:
> 		bl	interrupt_exit_kernel_prepare
> 
> In interrupt_64.S, that's similar:
> 
> 	interrupt_return_\srr\():
> 		ld	r4,_MSR(r1)
> 		andi.	r0,r4,MSR_PR
> 		beq	interrupt_return_\srr\()_kernel
> 	interrupt_return_\srr\()_user: /* make backtraces match the _kernel variant */
> 		addi	r3,r1,STACK_FRAME_OVERHEAD
> 		bl	interrupt_exit_user_prepare
> 	...
> 	interrupt_return_\srr\()_kernel:
> 		addi	r3,r1,STACK_FRAME_OVERHEAD
> 		bl	interrupt_exit_kernel_prepare
> 
> In interrupt_exit_user_restart() and interrupt_exit_kernel_restart(),
> MSR_PR is verified respectively by BUG_ON(!user_mode(regs)) and
> BUG_ON(user_mode(regs)) prior to calling interrupt_exit_user_prepare()
> and interrupt_exit_kernel_prepare().
> 
> The verification in interrupt_exit_user_prepare() and
> interrupt_exit_kernel_prepare() are therefore useless and can be removed.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

Probably okay to do now things are ironing out.

Unless we want to make a new define for interrupt handler debug and put 
a bunch of these asserts under it. There's quite a lot more here, and
in asm/interrupt.h, etc.

Thanks,
Nick

> ---
>  arch/powerpc/kernel/interrupt.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index 21bbd615ca41..f26caf911ab5 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -465,7 +465,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs)
>  
>  	if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x))
>  		BUG_ON(!(regs->msr & MSR_RI));
> -	BUG_ON(!(regs->msr & MSR_PR));
>  	BUG_ON(arch_irq_disabled_regs(regs));
>  	CT_WARN_ON(ct_state() == CONTEXT_USER);
>  
> @@ -499,7 +498,6 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs)
>  	if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x) &&
>  	    unlikely(!(regs->msr & MSR_RI)))
>  		unrecoverable_exception(regs);
> -	BUG_ON(regs->msr & MSR_PR);
>  	/*
>  	 * CT_WARN_ON comes here via program_check_exception,
>  	 * so avoid recursion.
> -- 
> 2.25.0
> 
> 

^ permalink raw reply

* Re: [PATCH] powerpc: Remove MSR_PR check in interrupt_exit_{user/kernel}_prepare()
From: Christophe Leroy @ 2021-08-04  8:37 UTC (permalink / raw)
  To: Nicholas Piggin, Benjamin Herrenschmidt, Michael Ellerman,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1628064412.48kzr1eula.astroid@bobo.none>



Le 04/08/2021 à 10:08, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of August 4, 2021 3:27 pm:
>> In those hot functions that are called at every interrupt, any saved
>> cycle is worth it.
>>
>> interrupt_exit_user_prepare() and interrupt_exit_kernel_prepare() are
>> called from three places:
>> - From entry_32.S
>> - From interrupt_64.S
>> - From interrupt_exit_user_restart() and interrupt_exit_kernel_restart()
>>
>> In entry_32.S, there are inambiguously called based on MSR_PR:
>>
>> 	interrupt_return:
>> 		lwz	r4,_MSR(r1)
>> 		addi	r3,r1,STACK_FRAME_OVERHEAD
>> 		andi.	r0,r4,MSR_PR
>> 		beq	.Lkernel_interrupt_return
>> 		bl	interrupt_exit_user_prepare
>> 	...
>> 	.Lkernel_interrupt_return:
>> 		bl	interrupt_exit_kernel_prepare
>>
>> In interrupt_64.S, that's similar:
>>
>> 	interrupt_return_\srr\():
>> 		ld	r4,_MSR(r1)
>> 		andi.	r0,r4,MSR_PR
>> 		beq	interrupt_return_\srr\()_kernel
>> 	interrupt_return_\srr\()_user: /* make backtraces match the _kernel variant */
>> 		addi	r3,r1,STACK_FRAME_OVERHEAD
>> 		bl	interrupt_exit_user_prepare
>> 	...
>> 	interrupt_return_\srr\()_kernel:
>> 		addi	r3,r1,STACK_FRAME_OVERHEAD
>> 		bl	interrupt_exit_kernel_prepare
>>
>> In interrupt_exit_user_restart() and interrupt_exit_kernel_restart(),
>> MSR_PR is verified respectively by BUG_ON(!user_mode(regs)) and
>> BUG_ON(user_mode(regs)) prior to calling interrupt_exit_user_prepare()
>> and interrupt_exit_kernel_prepare().
>>
>> The verification in interrupt_exit_user_prepare() and
>> interrupt_exit_kernel_prepare() are therefore useless and can be removed.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> 
> Probably okay to do now things are ironing out.
> 
> Unless we want to make a new define for interrupt handler debug and put
> a bunch of these asserts under it. There's quite a lot more here, and
> in asm/interrupt.h, etc.

But that one is so trivial that I'm not sure there is any point in keeping it even as a kind of 
additional DEBUG level, unless you want those BUG_ONs because you don't trust the compiler.

Christophe

> 
> Thanks,
> Nick
> 
>> ---
>>   arch/powerpc/kernel/interrupt.c | 2 --
>>   1 file changed, 2 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
>> index 21bbd615ca41..f26caf911ab5 100644
>> --- a/arch/powerpc/kernel/interrupt.c
>> +++ b/arch/powerpc/kernel/interrupt.c
>> @@ -465,7 +465,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs)
>>   
>>   	if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x))
>>   		BUG_ON(!(regs->msr & MSR_RI));
>> -	BUG_ON(!(regs->msr & MSR_PR));
>>   	BUG_ON(arch_irq_disabled_regs(regs));
>>   	CT_WARN_ON(ct_state() == CONTEXT_USER);
>>   
>> @@ -499,7 +498,6 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs)
>>   	if (!IS_ENABLED(CONFIG_BOOKE) && !IS_ENABLED(CONFIG_40x) &&
>>   	    unlikely(!(regs->msr & MSR_RI)))
>>   		unrecoverable_exception(regs);
>> -	BUG_ON(regs->msr & MSR_PR);
>>   	/*
>>   	 * CT_WARN_ON comes here via program_check_exception,
>>   	 * so avoid recursion.
>> -- 
>> 2.25.0
>>
>>

^ permalink raw reply

* Re: [PATCH] powerpc: Remove MSR_PR check in interrupt_exit_{user/kernel}_prepare()
From: Nicholas Piggin @ 2021-08-04  8:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Christophe Leroy, Michael Ellerman,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <cd5f54fd-fbf4-e471-9971-1e8c86755754@csgroup.eu>

Excerpts from Christophe Leroy's message of August 4, 2021 6:37 pm:
> 
> 
> Le 04/08/2021 à 10:08, Nicholas Piggin a écrit :
>> Excerpts from Christophe Leroy's message of August 4, 2021 3:27 pm:
>>> In those hot functions that are called at every interrupt, any saved
>>> cycle is worth it.
>>>
>>> interrupt_exit_user_prepare() and interrupt_exit_kernel_prepare() are
>>> called from three places:
>>> - From entry_32.S
>>> - From interrupt_64.S
>>> - From interrupt_exit_user_restart() and interrupt_exit_kernel_restart()
>>>
>>> In entry_32.S, there are inambiguously called based on MSR_PR:
>>>
>>> 	interrupt_return:
>>> 		lwz	r4,_MSR(r1)
>>> 		addi	r3,r1,STACK_FRAME_OVERHEAD
>>> 		andi.	r0,r4,MSR_PR
>>> 		beq	.Lkernel_interrupt_return
>>> 		bl	interrupt_exit_user_prepare
>>> 	...
>>> 	.Lkernel_interrupt_return:
>>> 		bl	interrupt_exit_kernel_prepare
>>>
>>> In interrupt_64.S, that's similar:
>>>
>>> 	interrupt_return_\srr\():
>>> 		ld	r4,_MSR(r1)
>>> 		andi.	r0,r4,MSR_PR
>>> 		beq	interrupt_return_\srr\()_kernel
>>> 	interrupt_return_\srr\()_user: /* make backtraces match the _kernel variant */
>>> 		addi	r3,r1,STACK_FRAME_OVERHEAD
>>> 		bl	interrupt_exit_user_prepare
>>> 	...
>>> 	interrupt_return_\srr\()_kernel:
>>> 		addi	r3,r1,STACK_FRAME_OVERHEAD
>>> 		bl	interrupt_exit_kernel_prepare
>>>
>>> In interrupt_exit_user_restart() and interrupt_exit_kernel_restart(),
>>> MSR_PR is verified respectively by BUG_ON(!user_mode(regs)) and
>>> BUG_ON(user_mode(regs)) prior to calling interrupt_exit_user_prepare()
>>> and interrupt_exit_kernel_prepare().
>>>
>>> The verification in interrupt_exit_user_prepare() and
>>> interrupt_exit_kernel_prepare() are therefore useless and can be removed.
>>>
>>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> 
>> Probably okay to do now things are ironing out.
>> 
>> Unless we want to make a new define for interrupt handler debug and put
>> a bunch of these asserts under it. There's quite a lot more here, and
>> in asm/interrupt.h, etc.
> 
> But that one is so trivial that I'm not sure there is any point in keeping it even as a kind of 
> additional DEBUG level, unless you want those BUG_ONs because you don't trust the compiler.

Fair point.

Acked-by: Nicholas Piggin <npiggin@gmail.com>

^ permalink raw reply

* Re: [PATCH v2 1/2] sched/topology: Skip updating masks for non-online nodes
From: Srikar Dronamraju @ 2021-08-04 10:01 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Nathan Lynch, Gautham R Shenoy, Vincent Guittot, Rik van Riel,
	Peter Zijlstra, linuxppc-dev, Geetika Moolchandani, LKML,
	Dietmar Eggemann, Thomas Gleixner, Laurent Dufour, Mel Gorman,
	Ingo Molnar
In-Reply-To: <20210723143914.GI3836887@linux.vnet.ibm.com>

* Srikar Dronamraju <srikar@linux.vnet.ibm.com> [2021-07-23 20:09:14]:

> * Valentin Schneider <valentin.schneider@arm.com> [2021-07-13 17:32:14]:
> 
> > On 12/07/21 18:18, Srikar Dronamraju wrote:
> > > Hi Valentin,
> > >
> > >> On 01/07/21 09:45, Srikar Dronamraju wrote:
> > >> > @@ -1891,12 +1894,30 @@ void sched_init_numa(void)
> > >> >  void sched_domains_numa_masks_set(unsigned int cpu)
> > >> >  {
> > >

Hey Valentin / Peter

Did you get a chance to look at this?

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply

* Re: [PATCH v2 1/2] sched/topology: Skip updating masks for non-online nodes
From: Valentin Schneider @ 2021-08-04 10:20 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Vincent Guittot, Rik van Riel,
	Peter Zijlstra, linuxppc-dev, Geetika Moolchandani, LKML,
	Dietmar Eggemann, Thomas Gleixner, Laurent Dufour, Mel Gorman,
	Ingo Molnar
In-Reply-To: <20210804100155.GE4072958@linux.vnet.ibm.com>

On 04/08/21 15:31, Srikar Dronamraju wrote:
> * Srikar Dronamraju <srikar@linux.vnet.ibm.com> [2021-07-23 20:09:14]:
>
>> * Valentin Schneider <valentin.schneider@arm.com> [2021-07-13 17:32:14]:
>>
>> > On 12/07/21 18:18, Srikar Dronamraju wrote:
>> > > Hi Valentin,
>> > >
>> > >> On 01/07/21 09:45, Srikar Dronamraju wrote:
>> > >> > @@ -1891,12 +1894,30 @@ void sched_init_numa(void)
>> > >> >  void sched_domains_numa_masks_set(unsigned int cpu)
>> > >> >  {
>> > >
>
> Hey Valentin / Peter
>
> Did you get a chance to look at this?
>

Barely, I wanted to set some time aside to stare at this and have been
failing miserably. Let me bump it up my todolist, I'll get to it before the
end of the week.

> --
> Thanks and Regards
> Srikar Dronamraju

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox