LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v2 net-next 3/3] crypto: caam: Replace in_irq() usage.
From: Horia Geantă @ 2020-11-02 21:13 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, netdev@vger.kernel.org
  Cc: Aymen Sghaier, Herbert Xu, Madalin Bucur, Leo Li,
	linux-crypto@vger.kernel.org, Jakub Kicinski, Thomas Gleixner,
	linuxppc-dev@lists.ozlabs.org, David S. Miller,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20201101232257.3028508-4-bigeasy@linutronix.de>

On 11/2/2020 1:23 AM, Sebastian Andrzej Siewior wrote:
> The driver uses in_irq() + in_serving_softirq() magic to decide if NAPI
> scheduling is required or packet processing.
> 
> The usage of in_*() in drivers is phased out and Linus clearly requested
> that code which changes behaviour depending on context should either be
> separated or the context be conveyed in an argument passed by the caller,
> which usually knows the context.
> 
> Use the `sched_napi' argument passed by the callback. It is set true if
> called from the interrupt handler and NAPI should be scheduled.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: "Horia Geantă" <horia.geanta@nxp.com>
> Cc: Aymen Sghaier <aymen.sghaier@nxp.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Madalin Bucur <madalin.bucur@nxp.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Li Yang <leoyang.li@nxp.com>
> Cc: linux-crypto@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-arm-kernel@lists.infradead.org
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>

Thanks,
Horia

^ permalink raw reply

* Re: [PATCH v2 net-next 1/3] soc/fsl/qbman: Add an argument to signal if NAPI processing is required.
From: Horia Geantă @ 2020-11-02 21:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, netdev@vger.kernel.org, Priyanka Jain,
	Vakul Garg
  Cc: Aymen Sghaier, Herbert Xu, Madalin Bucur, Leo Li,
	linux-crypto@vger.kernel.org, Jakub Kicinski, Thomas Gleixner,
	linuxppc-dev@lists.ozlabs.org, David S. Miller,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20201101232257.3028508-2-bigeasy@linutronix.de>

On 11/2/2020 1:23 AM, Sebastian Andrzej Siewior wrote:
> dpaa_eth_napi_schedule() and caam_qi_napi_schedule() schedule NAPI if
> invoked from:
> 
>  - Hard interrupt context
>  - Any context which is not serving soft interrupts
> 
> Any context which is not serving soft interrupts includes hard interrupts
> so the in_irq() check is redundant. caam_qi_napi_schedule() has a comment
> about this:
> 
>         /*
>          * In case of threaded ISR, for RT kernels in_irq() does not return
>          * appropriate value, so use in_serving_softirq to distinguish between
>          * softirq and irq contexts.
>          */
>          if (in_irq() || !in_serving_softirq())
> 
> This has nothing to do with RT. Even on a non RT kernel force threaded
> interrupts run obviously in thread context and therefore in_irq() returns
> false when invoked from the handler.
> 
> The extension of the in_irq() check with !in_serving_softirq() was there
> when the drivers were added, but in the out of tree FSL BSP the original
> condition was in_irq() which got extended due to failures on RT.
> 
Looks like the initial FSL BSP commit adding this check is:
edca0b7a448a ("dpaa_eth: Fix Rx-stall issue in threaded ISR")
https://source.codeaurora.org/external/qoriq/qoriq-yocto-sdk/linux/commit/?h=fsl-sdk-v1.2&id=edca0b7a448ac18ef0a9b1238209b7595d511e19

This was done for dpaa_eth and the same logic was reused in caam.
In the process of upstreaming the development history got lost and
the comment in dpaa_eth was removed.

This was back in 2012 on a v3.0.34 kernel.
Not sure if/how things changed in the meantime, i.e. whether in_irq()
behaviour when called from softirq changed on -rt kernels (assuming this was
the problem Priyanka tried solving).

> The usage of in_xxx() in drivers is phased out and Linus clearly requested
> that code which changes behaviour depending on context should either be
> separated or the context be conveyed in an argument passed by the caller,
> which usually knows the context. Right he is, the above construct is
> clearly showing why.
> 
> The following callchains have been analyzed to end up in
> dpaa_eth_napi_schedule():
> 
> qman_p_poll_dqrr()
>   __poll_portal_fast()
>     fq->cb.dqrr()
>        dpaa_eth_napi_schedule()
> 
> portal_isr()
>   __poll_portal_fast()
>     fq->cb.dqrr()
>        dpaa_eth_napi_schedule()
> 
> Both need to schedule NAPI.
Only the call from interrupt context.

> The crypto part has another code path leading up to this:
>   kill_fq()
>      empty_retired_fq()
>        qman_p_poll_dqrr()
>          __poll_portal_fast()
>             fq->cb.dqrr()
>                dpaa_eth_napi_schedule()
> 
> kill_fq() is called from task context and ends up scheduling NAPI, but
> that's pointless and an unintended side effect of the !in_serving_softirq()
> check.
> 
Correct.

> The code path:
>   caam_qi_poll() -> qman_p_poll_dqrr()
> 
> is invoked from NAPI and I *assume* from crypto's NAPI device and not
> from qbman's NAPI device. I *guess* it is okay to skip scheduling NAPI
> (because this is what happens now) but could be changed if it is wrong
> due to `budget' handling.
> 
Looks good to me.

> Add an argument to __poll_portal_fast() which is true if NAPI needs to be
> scheduled. This requires propagating the value to the caller including
> `qman_cb_dqrr' typedef which is used by the dpaa and the crypto driver.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: "Horia Geantă" <horia.geanta@nxp.com>
> Cc: Aymen Sghaier <aymen.sghaier@nxp.com>
> Cc: Herbert XS <herbert@gondor.apana.org.au>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Madalin Bucur <madalin.bucur@nxp.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Li Yang <leoyang.li@nxp.com>
> Cc: linux-crypto@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-arm-kernel@lists.infradead.org
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>

Thanks,
Horia

^ permalink raw reply

* Kernel panic from malloc() on SUSE 15.1?
From: Carl Jacobsen @ 2020-11-02 20:14 UTC (permalink / raw)
  To: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1915 bytes --]

I've got a SUSE 15.1 install (on ppc64le) that kernel panics on a very
simple
test program, built in a slightly unusual way.

I'm compiling on SUSE 12, using gcc 4.8.3. I'm linking to a static
copy of libcrypto.a (from openssl-1.1.1g), built without threads.
I have a 10 line C test program that compiles and runs fine on the
SUSE 12 system. If I compile the same program on SUSE 15.1 (with
gcc 7.4.1), it runs fine on SUSE 15.1.

But, if I run the version that I compiled on SUSE 12, on the SUSE 15.1
system, the call to RAND_status() gets to a malloc() and then panics.
(And, of course, if I just compile a call to malloc(), that runs fine
on both systems.) Here's the test program, it's really just a call to
RAND_status():

    #include <stdio.h>
    #include <openssl/rand.h>

    int main(int argc, char **argv)
    {
        int has_enough_data = RAND_status();
        printf("The PRNG %s been seeded with enough data\n",
               has_enough_data ? "HAS" : "has NOT");
        return 0;
    }

openssl is configured/built with:
    ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
    make

and the test program is compiled with:
    gcc -ggdb3 -o rand_test rand_test.c libcrypto.a

The kernel on SUSE 12 is: 3.12.28-4-default
And glibc is: 2.19

The kernel on SUSE 15.1 is: 4.12.14-197.18-default
And glibc is: 2.26

In a previous iteration it was panicking in pthread_once(), so
I compiled openssl without pthreads support, and now it panics
calling malloc().

If I link to the system-supplied libcrypto.so, it works fine, and
running the same tests on x86_64 works fine, it's only ppc64le
that panics, and only running code from the old system on the
new one.

I'm trying to dig further down into this to come up with a standalone
test case, but I'm wondering if anything here stands out as a known
problem, or if someone can point me in the right direction.

Thanks,
Carl Jacobsen

[-- Attachment #2: Type: text/html, Size: 2231 bytes --]

^ permalink raw reply

* Re: [PATCH 20/33] docs: ABI: testing: make the files compatible with ReST output
From: Gautham R Shenoy @ 2020-11-02 15:06 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Gautham R. Shenoy, Jason A. Donenfeld, Heikki Krogerus,
	Peter Meerwald-Stadler, Petr Mladek, Linux Doc Mailing List,
	Alexander Shishkin, Nayna Jain, Alexandre Belloni, Mimi Zohar,
	Sebastian Reichel, Guenter Roeck, Bruno Meneguele, Vishal Verma,
	Pavel Machek, Hanjun Guo, Mauro Carvalho Chehab, netdev,
	Oleh Kravchenko, Dan Williams, Andrew Donnellan,
	Javier González, Fabrice Gasnier, Stefano Stabellini,
	linux-acpi, Jonathan Corbet, Chunyan Zhang, Mario Limonciello,
	linux-stm32, Lakshmi Ramasubramanian, Ludovic Desroches,
	Pawan Gupta, linux-arm-kernel, Frederic Barrat, Niklas Cassel,
	Len Brown, Juergen Gross, Mika Westerberg, Alexandre Torgue,
	linux-pm, linux-kernel, Richard Cochran, linuxppc-dev,
	Baolin Wang, Lars-Peter Clausen, Dan Murphy, Orson Zhai,
	Philippe Bergheaud, xen-devel, Boris Ostrovsky, Andy Shevchenko,
	Benson Leung, Konstantin Khlebnikov, Jens Axboe, Felipe Balbi,
	Kranthi Kuntala, Martin K. Petersen, linux-mm, Greg Kroah-Hartman,
	linux-usb, Rafael J. Wysocki, Nicolas Ferre, linux-iio,
	Thinh Nguyen, Sergey Senozhatsky, Thomas Gleixner,
	Leonid Maksymchuk, Maxime Coquelin, Johannes Thumshirn,
	Enric Balletbo i Serra, Vineela Tummalapalli, Peter Rosin,
	Jonathan Cameron, Mike Kravetz
In-Reply-To: <4ebaaa0320101479e392ce2db4b62e24fdf15ef1.1603893146.git.mchehab+huawei@kernel.org>

On Wed, Oct 28, 2020 at 03:23:18PM +0100, Mauro Carvalho Chehab wrote:
> From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
> 
> Some files over there won't parse well by Sphinx.
> 

[..snip..]



> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index b555df825447..274c337ec6a9 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -151,23 +151,28 @@ Description:
>  		The processor idle states which are available for use have the
>  		following attributes:
> 
> -		name: (RO) Name of the idle state (string).
> +		======== ==== =================================================
> +		name:	 (RO) Name of the idle state (string).
> 
>  		latency: (RO) The latency to exit out of this idle state (in
> -		microseconds).
> +			      microseconds).
> 
> -		power: (RO) The power consumed while in this idle state (in
> -		milliwatts).
> +		power:   (RO) The power consumed while in this idle state (in
> +			      milliwatts).
> 
> -		time: (RO) The total time spent in this idle state (in microseconds).
> +		time:    (RO) The total time spent in this idle state
> +			      (in microseconds).
> 
> -		usage: (RO) Number of times this state was entered (a count).
> +		usage:	 (RO) Number of times this state was entered (a count).
> 
> -		above: (RO) Number of times this state was entered, but the
> -		       observed CPU idle duration was too short for it (a count).
> +		above:	 (RO) Number of times this state was entered, but the
> +			      observed CPU idle duration was too short for it
> +			      (a count).
> 
> -		below: (RO) Number of times this state was entered, but the
> -		       observed CPU idle duration was too long for it (a count).
> +		below: 	 (RO) Number of times this state was entered, but the
> +			      observed CPU idle duration was too long for it
> +			      (a count).
> +		======== ==== =================================================
> 
>  What:		/sys/devices/system/cpu/cpuX/cpuidle/stateN/desc
>  Date:		February 2008
> @@ -290,6 +295,7 @@ Description:	Processor frequency boosting control
>  		This switch controls the boost setting for the whole system.
>  		Boosting allows the CPU and the firmware to run at a frequency
>  		beyound it's nominal limit.
> +
>  		More details can be found in
>  		Documentation/admin-guide/pm/cpufreq.rst
> 

The changes to cpuidle states look good to me.


[..snip..]

> @@ -414,30 +434,30 @@ Description:	POWERNV CPUFreq driver's frequency throttle stats directory and
>  		throttle attributes exported in the 'throttle_stats' directory:
> 
>  		- turbo_stat : This file gives the total number of times the max
> -		frequency is throttled to lower frequency in turbo (at and above
> -		nominal frequency) range of frequencies.
> +		  frequency is throttled to lower frequency in turbo (at and above
> +		  nominal frequency) range of frequencies.
> 
>  		- sub_turbo_stat : This file gives the total number of times the
> -		max frequency is throttled to lower frequency in sub-turbo(below
> -		nominal frequency) range of frequencies.
> +		  max frequency is throttled to lower frequency in sub-turbo(below
> +		  nominal frequency) range of frequencies.
> 
>  		- unthrottle : This file gives the total number of times the max
> -		frequency is unthrottled after being throttled.
> +		  frequency is unthrottled after being throttled.
> 
>  		- powercap : This file gives the total number of times the max
> -		frequency is throttled due to 'Power Capping'.
> +		  frequency is throttled due to 'Power Capping'.
> 
>  		- overtemp : This file gives the total number of times the max
> -		frequency is throttled due to 'CPU Over Temperature'.
> +		  frequency is throttled due to 'CPU Over Temperature'.
> 
>  		- supply_fault : This file gives the total number of times the
> -		max frequency is throttled due to 'Power Supply Failure'.
> +		  max frequency is throttled due to 'Power Supply Failure'.
> 
>  		- overcurrent : This file gives the total number of times the
> -		max frequency is throttled due to 'Overcurrent'.
> +		  max frequency is throttled due to 'Overcurrent'.
> 
>  		- occ_reset : This file gives the total number of times the max
> -		frequency is throttled due to 'OCC Reset'.
> +		  frequency is throttled due to 'OCC Reset'.
> 
>  		The sysfs attributes representing different throttle reasons like
>  		powercap, overtemp, supply_fault, overcurrent and occ_reset map to


This hunk for the powernv cpufreq driver looks good to me.
For these two hunks,

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>



^ permalink raw reply

* Re: [PATCH v2 20/39] docs: ABI: testing: make the files compatible with ReST output
From: Mauro Carvalho Chehab @ 2020-11-02 14:42 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Gautham R. Shenoy, Jason A. Donenfeld, Heikki Krogerus,
	Peter Meerwald-Stadler, Petr Mladek, Linux Doc Mailing List,
	Alexander Shishkin, Nayna Jain, Jonathan Cameron,
	Alexandre Belloni, Mimi Zohar, Sebastian Reichel, linux-mm,
	Bruno Meneguele, Vishal Verma, Pavel Machek, Hanjun Guo,
	Guenter Roeck, netdev, Oleh Kravchenko, Dan Williams,
	Andrew Donnellan, Javier González, Fabrice Gasnier,
	Mark Gross, linux-acpi, Jonathan Corbet, Chunyan Zhang,
	Mario Limonciello, linux-stm32, Lakshmi Ramasubramanian,
	Ludovic Desroches, Pawan Gupta, linux-arm-kernel, Tom Rix,
	Frederic Barrat, Niklas Cassel, Len Brown, Juergen Gross,
	linuxppc-dev, Mika Westerberg, Alexandre Torgue, linux-pm,
	linux-kernel, Richard Cochran, Oded Gabbay, Baolin Wang,
	Lars-Peter Clausen, Dan Murphy, Orson Zhai, Philippe Bergheaud,
	xen-devel, Boris Ostrovsky, Andy Shevchenko, Benson Leung,
	Konstantin Khlebnikov, Jens Axboe, Felipe Balbi, Kranthi Kuntala,
	Martin K. Petersen, Johannes Thumshirn, linux-usb,
	Rafael J. Wysocki, Nicolas Ferre, linux-iio, Thinh Nguyen,
	Sergey Senozhatsky, Stefano Stabellini, Thomas Gleixner,
	Leonid Maksymchuk, Maxime Coquelin, Enric Balletbo i Serra,
	Vaibhav Jain, Vineela Tummalapalli, Peter Rosin, Jonathan Cameron,
	Mike Kravetz
In-Reply-To: <20201102124641.GA881895@kroah.com>

Em Mon, 2 Nov 2020 13:46:41 +0100
Greg Kroah-Hartman <gregkh@linuxfoundation.org> escreveu:

> On Mon, Nov 02, 2020 at 12:04:36PM +0100, Fabrice Gasnier wrote:
> > On 10/30/20 11:09 AM, Mauro Carvalho Chehab wrote:  
> > > Em Fri, 30 Oct 2020 10:19:12 +0100
> > > Fabrice Gasnier <fabrice.gasnier@st.com> escreveu:
> > >   
> > >> Hi Mauro,
> > >>
> > >> [...]
> > >>  
> > >>>  
> > >>> +What:		/sys/bus/iio/devices/iio:deviceX/in_count_quadrature_mode_available
> > >>> +KernelVersion:	4.12
> > >>> +Contact:	benjamin.gaignard@st.com
> > >>> +Description:
> > >>> +		Reading returns the list possible quadrature modes.
> > >>> +
> > >>> +What:		/sys/bus/iio/devices/iio:deviceX/in_count0_quadrature_mode
> > >>> +KernelVersion:	4.12
> > >>> +Contact:	benjamin.gaignard@st.com
> > >>> +Description:
> > >>> +		Configure the device counter quadrature modes:
> > >>> +
> > >>> +		channel_A:
> > >>> +			Encoder A input servers as the count input and B as
> > >>> +			the UP/DOWN direction control input.
> > >>> +
> > >>> +		channel_B:
> > >>> +			Encoder B input serves as the count input and A as
> > >>> +			the UP/DOWN direction control input.
> > >>> +
> > >>> +		quadrature:
> > >>> +			Encoder A and B inputs are mixed to get direction
> > >>> +			and count with a scale of 0.25.
> > >>> +    
> > >>  
> > > 
> > > Hi Fabrice,
> > >   
> > >> I just noticed that since Jonathan question in v1.
> > >>
> > >> Above ABI has been moved in the past as discussed in [1]. You can take a
> > >> look at:
> > >> b299d00 IIO: stm32: Remove quadrature related functions from trigger driver
> > >>
> > >> Could you please remove the above chunk ?
> > >>
> > >> With that, for the stm32 part:
> > >> Acked-by: Fabrice Gasnier <fabrice.gasnier@st.com>  
> > > 
> > > 
> > > Hmm... probably those were re-introduced due to a rebase. This
> > > series were originally written about 1,5 years ago.
> > > 
> > > I'll drop those hunks.  
> > 
> > Hi Mauro, Greg,
> > 
> > I just figured out this patch has been applied with above hunk.
> > 
> > This should be dropped: is there a fix on its way already ?
> > (I may have missed it)  
> 
> Can you send a fix for just this hunk?

Hmm...

	$ git grep /sys/bus/iio/devices/iio:deviceX/in_count_quadrature_mode_available
	Documentation/ABI/testing/sysfs-bus-iio-counter-104-quad-8:What:                /sys/bus/iio/devices/iio:deviceX/in_count_quadrature_mode_available
	Documentation/ABI/testing/sysfs-bus-iio-lptimer-stm32:What:             /sys/bus/iio/devices/iio:deviceX/in_count_quadrature_mode_available
	Documentation/ABI/testing/sysfs-bus-iio-timer-stm32:What:               /sys/bus/iio/devices/iio:deviceX/in_count_quadrature_mode_available

Even re-doing the changes from 
changeset b299d00420e2 ("IIO: stm32: Remove quadrature related functions from trigger driver")
at Documentation/ABI/testing/sysfs-bus-iio-timer-stm32, there's still
a third duplicate of some of those, as reported by the script:

	$ ./scripts/get_abi.pl validate 2>&1|grep quadra
	Warning: /sys/bus/iio/devices/iio:deviceX/in_count0_quadrature_mode is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-timer-stm32:117  Documentation/ABI/testing/sysfs-bus-iio-lptimer-stm32:14
	Warning: /sys/bus/iio/devices/iio:deviceX/in_count_quadrature_mode_available is defined 3 times:  Documentation/ABI/testing/sysfs-bus-iio-counter-104-quad-8:2  Documentation/ABI/testing/sysfs-bus-iio-timer-stm32:111  Documentation/ABI/testing/sysfs-bus-iio-lptimer-stm32:8

As in_count_quadrature_mode_available is also defined at:
	Documentation/ABI/testing/sysfs-bus-iio-counter-104-quad-8:2

The best here seems to have a patch that will also drop the other
duplication of this, probably moving in_count_quadrature_mode_available
to a generic node probably placing it inside 
Documentation/ABI/testing/sysfs-bus-iio.

Comments?

Thanks,
Mauro

PS.: the IIO subsystem is the one that currently has more duplicated
ABI entries:

$ ./scripts/get_abi.pl validate 2>&1|grep iio
Warning: /sys/bus/iio/devices/iio:deviceX/in_accel_x_calibbias is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-icm42600:0  Documentation/ABI/testing/sysfs-bus-iio:394
Warning: /sys/bus/iio/devices/iio:deviceX/in_accel_y_calibbias is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-icm42600:1  Documentation/ABI/testing/sysfs-bus-iio:395
Warning: /sys/bus/iio/devices/iio:deviceX/in_accel_z_calibbias is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-icm42600:2  Documentation/ABI/testing/sysfs-bus-iio:396
Warning: /sys/bus/iio/devices/iio:deviceX/in_anglvel_x_calibbias is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-icm42600:3  Documentation/ABI/testing/sysfs-bus-iio:397
Warning: /sys/bus/iio/devices/iio:deviceX/in_anglvel_y_calibbias is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-icm42600:4  Documentation/ABI/testing/sysfs-bus-iio:398
Warning: /sys/bus/iio/devices/iio:deviceX/in_anglvel_z_calibbias is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-icm42600:5  Documentation/ABI/testing/sysfs-bus-iio:399
Warning: /sys/bus/iio/devices/iio:deviceX/in_count0_preset is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-timer-stm32:100  Documentation/ABI/testing/sysfs-bus-iio-lptimer-stm32:0
Warning: /sys/bus/iio/devices/iio:deviceX/in_count0_quadrature_mode is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-timer-stm32:117  Documentation/ABI/testing/sysfs-bus-iio-lptimer-stm32:14
Warning: /sys/bus/iio/devices/iio:deviceX/in_count_quadrature_mode_available is defined 3 times:  Documentation/ABI/testing/sysfs-bus-iio-counter-104-quad-8:2  Documentation/ABI/testing/sysfs-bus-iio-timer-stm32:111  Documentation/ABI/testing/sysfs-bus-iio-lptimer-stm32:8
Warning: /sys/bus/iio/devices/iio:deviceX/out_altvoltageY_frequency is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-frequency-adf4371:0  Documentation/ABI/testing/sysfs-bus-iio:599
Warning: /sys/bus/iio/devices/iio:deviceX/out_altvoltageY_powerdown is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-frequency-adf4371:36  Documentation/ABI/testing/sysfs-bus-iio:588
Warning: /sys/bus/iio/devices/iio:deviceX/out_currentY_raw is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-light-lm3533-als:43  Documentation/ABI/testing/sysfs-bus-iio-health-afe440x:38
Warning: /sys/bus/iio/devices/iio:deviceX/out_current_heater_raw is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-humidity-hdc2010:0  Documentation/ABI/testing/sysfs-bus-iio-humidity-hdc100x:0
Warning: /sys/bus/iio/devices/iio:deviceX/out_current_heater_raw_available is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-humidity-hdc2010:1  Documentation/ABI/testing/sysfs-bus-iio-humidity-hdc100x:1
Warning: /sys/bus/iio/devices/iio:deviceX/sensor_sensitivity is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-distance-srf08:0  Documentation/ABI/testing/sysfs-bus-iio-proximity-as3935:8
Warning: /sys/bus/iio/devices/triggerX/sampling_frequency is defined 2 times:  Documentation/ABI/testing/sysfs-bus-iio-timer-stm32:92  Documentation/ABI/testing/sysfs-bus-iio:45

^ permalink raw reply

* Re: [PATCH v2 20/39] docs: ABI: testing: make the files compatible with ReST output
From: Greg Kroah-Hartman @ 2020-11-02 12:46 UTC (permalink / raw)
  To: Fabrice Gasnier
  Cc: Gautham R. Shenoy, Jason A. Donenfeld, Heikki Krogerus,
	Peter Meerwald-Stadler, Petr Mladek, Linux Doc Mailing List,
	Alexander Shishkin, Nayna Jain, Jonathan Cameron,
	Alexandre Belloni, Mimi Zohar, Sebastian Reichel, linux-mm,
	Bruno Meneguele, Vishal Verma, Pavel Machek, Hanjun Guo,
	Guenter Roeck, netdev, Oleh Kravchenko, Dan Williams,
	Andrew Donnellan, Javier González, Lars-Peter Clausen,
	Mark Gross, linux-acpi, Jonathan Corbet, Mauro Carvalho Chehab,
	Chunyan Zhang, Mario Limonciello, linux-stm32,
	Lakshmi Ramasubramanian, Ludovic Desroches, Pawan Gupta,
	linux-arm-kernel, Tom Rix, Frederic Barrat, Niklas Cassel,
	Len Brown, Juergen Gross, linuxppc-dev, Mika Westerberg,
	Alexandre Torgue, linux-pm, linux-kernel, Richard Cochran,
	Oded Gabbay, Baolin Wang, Stefano Stabellini, Dan Murphy,
	Orson Zhai, Philippe Bergheaud, xen-devel, Boris Ostrovsky,
	Andy Shevchenko, Benson Leung, Konstantin Khlebnikov, Jens Axboe,
	Felipe Balbi, Kranthi Kuntala, Martin K. Petersen,
	Johannes Thumshirn, linux-usb, Rafael J. Wysocki, Nicolas Ferre,
	linux-iio, Thinh Nguyen, Sergey Senozhatsky, Thomas Gleixner,
	Leonid Maksymchuk, Maxime Coquelin, Enric Balletbo i Serra,
	Vaibhav Jain, Vineela Tummalapalli, Peter Rosin, Jonathan Cameron,
	Mike Kravetz
In-Reply-To: <cb586ea3-b6e6-4e48-2344-2bd641e5323f@st.com>

On Mon, Nov 02, 2020 at 12:04:36PM +0100, Fabrice Gasnier wrote:
> On 10/30/20 11:09 AM, Mauro Carvalho Chehab wrote:
> > Em Fri, 30 Oct 2020 10:19:12 +0100
> > Fabrice Gasnier <fabrice.gasnier@st.com> escreveu:
> > 
> >> Hi Mauro,
> >>
> >> [...]
> >>
> >>>  
> >>> +What:		/sys/bus/iio/devices/iio:deviceX/in_count_quadrature_mode_available
> >>> +KernelVersion:	4.12
> >>> +Contact:	benjamin.gaignard@st.com
> >>> +Description:
> >>> +		Reading returns the list possible quadrature modes.
> >>> +
> >>> +What:		/sys/bus/iio/devices/iio:deviceX/in_count0_quadrature_mode
> >>> +KernelVersion:	4.12
> >>> +Contact:	benjamin.gaignard@st.com
> >>> +Description:
> >>> +		Configure the device counter quadrature modes:
> >>> +
> >>> +		channel_A:
> >>> +			Encoder A input servers as the count input and B as
> >>> +			the UP/DOWN direction control input.
> >>> +
> >>> +		channel_B:
> >>> +			Encoder B input serves as the count input and A as
> >>> +			the UP/DOWN direction control input.
> >>> +
> >>> +		quadrature:
> >>> +			Encoder A and B inputs are mixed to get direction
> >>> +			and count with a scale of 0.25.
> >>> +  
> >>
> > 
> > Hi Fabrice,
> > 
> >> I just noticed that since Jonathan question in v1.
> >>
> >> Above ABI has been moved in the past as discussed in [1]. You can take a
> >> look at:
> >> b299d00 IIO: stm32: Remove quadrature related functions from trigger driver
> >>
> >> Could you please remove the above chunk ?
> >>
> >> With that, for the stm32 part:
> >> Acked-by: Fabrice Gasnier <fabrice.gasnier@st.com>
> > 
> > 
> > Hmm... probably those were re-introduced due to a rebase. This
> > series were originally written about 1,5 years ago.
> > 
> > I'll drop those hunks.
> 
> Hi Mauro, Greg,
> 
> I just figured out this patch has been applied with above hunk.
> 
> This should be dropped: is there a fix on its way already ?
> (I may have missed it)

Can you send a fix for just this hunk?

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v2 20/39] docs: ABI: testing: make the files compatible with ReST output
From: Fabrice Gasnier @ 2020-11-02 11:04 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Greg Kroah-Hartman
  Cc: Gautham R. Shenoy, Jason A. Donenfeld, Heikki Krogerus,
	Peter Meerwald-Stadler, Petr Mladek, Linux Doc Mailing List,
	Alexander Shishkin, Nayna Jain, Jonathan Cameron,
	Alexandre Belloni, Mimi Zohar, Sebastian Reichel, linux-mm,
	Bruno Meneguele, Vishal Verma, Pavel Machek, Hanjun Guo,
	Guenter Roeck, netdev, Oleh Kravchenko, Dan Williams,
	Andrew Donnellan, Javier González, Lars-Peter Clausen,
	Mark Gross, linux-acpi, Jonathan Corbet, Chunyan Zhang,
	Mario Limonciello, linux-stm32, Lakshmi Ramasubramanian,
	Ludovic Desroches, Pawan Gupta, linux-arm-kernel, Tom Rix,
	Frederic Barrat, Niklas Cassel, Len Brown, Juergen Gross,
	linuxppc-dev, Mika Westerberg, Alexandre Torgue, linux-pm,
	linux-kernel, Richard Cochran, Oded Gabbay, Baolin Wang,
	Stefano Stabellini, Dan Murphy, Orson Zhai, Philippe Bergheaud,
	xen-devel, Boris Ostrovsky, Andy Shevchenko, Benson Leung,
	Konstantin Khlebnikov, Jens Axboe, Felipe Balbi, Kranthi Kuntala,
	Martin K. Petersen, Johannes Thumshirn, linux-usb,
	Rafael J. Wysocki, Nicolas Ferre, linux-iio, Thinh Nguyen,
	Sergey Senozhatsky, Thomas Gleixner, Leonid Maksymchuk,
	Maxime Coquelin, Enric Balletbo i Serra, Vaibhav Jain,
	Vineela Tummalapalli, Peter Rosin, Jonathan Cameron, Mike Kravetz
In-Reply-To: <20201030110925.3e09d59e@coco.lan>

On 10/30/20 11:09 AM, Mauro Carvalho Chehab wrote:
> Em Fri, 30 Oct 2020 10:19:12 +0100
> Fabrice Gasnier <fabrice.gasnier@st.com> escreveu:
> 
>> Hi Mauro,
>>
>> [...]
>>
>>>  
>>> +What:		/sys/bus/iio/devices/iio:deviceX/in_count_quadrature_mode_available
>>> +KernelVersion:	4.12
>>> +Contact:	benjamin.gaignard@st.com
>>> +Description:
>>> +		Reading returns the list possible quadrature modes.
>>> +
>>> +What:		/sys/bus/iio/devices/iio:deviceX/in_count0_quadrature_mode
>>> +KernelVersion:	4.12
>>> +Contact:	benjamin.gaignard@st.com
>>> +Description:
>>> +		Configure the device counter quadrature modes:
>>> +
>>> +		channel_A:
>>> +			Encoder A input servers as the count input and B as
>>> +			the UP/DOWN direction control input.
>>> +
>>> +		channel_B:
>>> +			Encoder B input serves as the count input and A as
>>> +			the UP/DOWN direction control input.
>>> +
>>> +		quadrature:
>>> +			Encoder A and B inputs are mixed to get direction
>>> +			and count with a scale of 0.25.
>>> +  
>>
> 
> Hi Fabrice,
> 
>> I just noticed that since Jonathan question in v1.
>>
>> Above ABI has been moved in the past as discussed in [1]. You can take a
>> look at:
>> b299d00 IIO: stm32: Remove quadrature related functions from trigger driver
>>
>> Could you please remove the above chunk ?
>>
>> With that, for the stm32 part:
>> Acked-by: Fabrice Gasnier <fabrice.gasnier@st.com>
> 
> 
> Hmm... probably those were re-introduced due to a rebase. This
> series were originally written about 1,5 years ago.
> 
> I'll drop those hunks.

Hi Mauro, Greg,

I just figured out this patch has been applied with above hunk.

This should be dropped: is there a fix on its way already ?
(I may have missed it)

Please advise,
Fabrice
> 
> Thanks!
> Mauro
> 

^ permalink raw reply

* [PATCH 11/11 v2.2] ftrace: Add recording of functions that caused recursion
From: Steven Rostedt @ 2020-11-02 19:23 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Anton Vorontsov, linux-doc, Peter Zijlstra,
	Sebastian Andrzej Siewior, Kamalesh Babulal, James E.J. Bottomley,
	Guo Ren, H. Peter Anvin, live-patching, Miroslav Benes,
	Ingo Molnar, linux-s390, Joe Lawrence, Jonathan Corbet,
	Mauro Carvalho Chehab, Helge Deller, x86, linux-csky,
	Christian Borntraeger, Kees Cook, Vasily Gorbik, Heiko Carstens,
	Jiri Kosina, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
	Tony Luck, linux-parisc, linux-kernel, Masami Hiramatsu,
	Colin Cross, Paul Mackerras, Andrew Morton, linuxppc-dev
In-Reply-To: <20201102124606.72bd89c5@gandalf.local.home>

From c532ff6b048dd4a12943b05c7b8ce30666c587c8 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Date: Thu, 29 Oct 2020 15:27:06 -0400
Subject: [PATCH] ftrace: Add recording of functions that caused recursion

This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file
"recursed_functions" all the functions that caused recursion while a
callback to the function tracer was running.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Guo Ren <guoren@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-csky@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: live-patching@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---

Changes since v2.1:
  Added EXPORT_SYMBOL_GPL() to ftrace_record_recursion() function

 Documentation/trace/ftrace-uses.rst   |   6 +-
 arch/csky/kernel/probes/ftrace.c      |   2 +-
 arch/parisc/kernel/ftrace.c           |   2 +-
 arch/powerpc/kernel/kprobes-ftrace.c  |   2 +-
 arch/s390/kernel/ftrace.c             |   2 +-
 arch/x86/kernel/kprobes/ftrace.c      |   2 +-
 fs/pstore/ftrace.c                    |   2 +-
 include/linux/trace_recursion.h       |  32 +++-
 kernel/livepatch/patch.c              |   2 +-
 kernel/trace/Kconfig                  |  25 +++
 kernel/trace/Makefile                 |   1 +
 kernel/trace/ftrace.c                 |   4 +-
 kernel/trace/trace_event_perf.c       |   2 +-
 kernel/trace/trace_functions.c        |   2 +-
 kernel/trace/trace_output.c           |   6 +-
 kernel/trace/trace_output.h           |   1 +
 kernel/trace/trace_recursion_record.c | 236 ++++++++++++++++++++++++++
 17 files changed, 309 insertions(+), 20 deletions(-)
 create mode 100644 kernel/trace/trace_recursion_record.c

diff --git a/Documentation/trace/ftrace-uses.rst b/Documentation/trace/ftrace-uses.rst
index 86cd14b8e126..5981d5691745 100644
--- a/Documentation/trace/ftrace-uses.rst
+++ b/Documentation/trace/ftrace-uses.rst
@@ -118,7 +118,7 @@ can help in this regard. If you start your code with:
 
 	int bit;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
@@ -130,7 +130,9 @@ The code in between will be safe to use, even if it ends up calling a
 function that the callback is tracing. Note, on success,
 ftrace_test_recursion_trylock() will disable preemption, and the
 ftrace_test_recursion_unlock() will enable it again (if it was previously
-enabled).
+enabled). The instruction pointer (ip) and its parent (parent_ip) is passed to
+ftrace_test_recursion_trylock() to record where the recursion happened
+(if CONFIG_FTRACE_RECORD_RECURSION is set).
 
 Alternatively, if the FTRACE_OPS_FL_RECURSION flag is set on the ftrace_ops
 (as explained below), then a helper trampoline will be used to test
diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index 5eb2604fdf71..f30b179924ef 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -18,7 +18,7 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	struct kprobe_ctlblk *kcb;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 4b1fdf15662c..8b0ed7c5a4ab 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -210,7 +210,7 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p = get_kprobe((kprobe_opcode_t *)ip);
 	int bit;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 5df8d50c65ae..fdfee39938ea 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -20,7 +20,7 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 	struct kprobe_ctlblk *kcb;
 	int bit;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(nip, parent_nip);
 	if (bit < 0)
 		return;
 
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index 88466d7fb6b2..a1556333d481 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -204,7 +204,7 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p = get_kprobe((kprobe_opcode_t *)ip);
 	int bit;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index a40a6cdfcca3..954d930a7127 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -20,7 +20,7 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe_ctlblk *kcb;
 	int bit;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c
index 816210fc5d3a..adb0935eb062 100644
--- a/fs/pstore/ftrace.c
+++ b/fs/pstore/ftrace.c
@@ -41,7 +41,7 @@ static void notrace pstore_ftrace_call(unsigned long ip,
 	if (unlikely(oops_in_progress))
 		return;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index ac3d73484cb2..1cba5fe8777a 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -142,7 +142,28 @@ static __always_inline int trace_get_context_bit(void)
 			pc & HARDIRQ_MASK ? TRACE_CTX_IRQ : TRACE_CTX_SOFTIRQ;
 }
 
-static __always_inline int trace_test_and_set_recursion(int start, int max)
+#ifdef CONFIG_FTRACE_RECORD_RECURSION
+extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
+/*
+* The paranoid_test check can cause dropped reports (unlikely), but
+* if the recursion is common, it will likely still be recorded later.
+* But the paranoid_test is needed to make sure we don't crash.
+*/
+# define do_ftrace_record_recursion(ip, pip)				\
+	do {								\
+		static atomic_t paranoid_test;				\
+		if (!atomic_read(&paranoid_test)) {			\
+			atomic_inc(&paranoid_test);			\
+			ftrace_record_recursion(ip, pip);		\
+			atomic_dec(&paranoid_test);			\
+		}							\
+	} while (0)
+#else
+# define do_ftrace_record_recursion(ip, pip)	do { } while (0)
+#endif
+
+static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
+							int start, int max)
 {
 	unsigned int val = current->trace_recursion;
 	int bit;
@@ -158,8 +179,10 @@ static __always_inline int trace_test_and_set_recursion(int start, int max)
 		 * a switch between contexts. Allow for a single recursion.
 		 */
 		bit = TRACE_TRANSITION_BIT;
-		if (trace_recursion_test(bit))
+		if (trace_recursion_test(bit)) {
+			do_ftrace_record_recursion(ip, pip);
 			return -1;
+		}
 		trace_recursion_set(bit);
 		barrier();
 		return bit + 1;
@@ -199,9 +222,10 @@ static __always_inline void trace_clear_recursion(int bit)
  * Returns: -1 if a recursion happened.
  *           >= 0 if no recursion
  */
-static __always_inline int ftrace_test_recursion_trylock(void)
+static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
+							 unsigned long parent_ip)
 {
-	return trace_test_and_set_recursion(TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
 }
 
 /**
diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
index 15480bf3ce88..875c5dbbdd33 100644
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -49,7 +49,7 @@ static void notrace klp_ftrace_handler(unsigned long ip,
 
 	ops = container_of(fops, struct klp_ops, fops);
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (WARN_ON_ONCE(bit < 0))
 		return;
 	/*
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index a4020c0b4508..9b11c096d139 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -727,6 +727,31 @@ config TRACE_EVAL_MAP_FILE
 
 	If unsure, say N.
 
+config FTRACE_RECORD_RECURSION
+	bool "Record functions that recurse in function tracing"
+	depends on FUNCTION_TRACER
+	help
+	  All callbacks that attach to the function tracing have some sort
+	  of protection against recursion. Even though the protection exists,
+	  it adds overhead. This option will create a file in the tracefs
+	  file system called "recursed_functions" that will list the functions
+	  that triggered a recursion.
+
+	  This will add more overhead to cases that have recursion.
+
+	  If unsure, say N
+
+config FTRACE_RECORD_RECURSION_SIZE
+	int "Max number of recursed functions to record"
+	default	128
+	depends on FTRACE_RECORD_RECURSION
+	help
+	  This defines the limit of number of functions that can be
+	  listed in the "recursed_functions" file, that lists all
+	  the functions that caused a recursion to happen.
+	  This file can be reset, but the limit can not change in
+	  size at runtime.
+
 config GCOV_PROFILE_FTRACE
 	bool "Enable GCOV profiling on ftrace subsystem"
 	depends on GCOV_KERNEL
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index e153be351548..7e44cea89fdc 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -92,6 +92,7 @@ obj-$(CONFIG_DYNAMIC_EVENTS) += trace_dynevent.o
 obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o
 obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
 obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
+obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
 
 obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 39f2bba89b76..03aad2b5cd5e 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6918,7 +6918,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
 	struct ftrace_ops *op;
 	int bit;
 
-	bit = trace_test_and_set_recursion(TRACE_LIST_START, TRACE_LIST_MAX);
+	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START, TRACE_LIST_MAX);
 	if (bit < 0)
 		return;
 
@@ -6993,7 +6993,7 @@ static void ftrace_ops_assist_func(unsigned long ip, unsigned long parent_ip,
 {
 	int bit;
 
-	bit = trace_test_and_set_recursion(TRACE_LIST_START, TRACE_LIST_MAX);
+	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START, TRACE_LIST_MAX);
 	if (bit < 0)
 		return;
 
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index a2b9fddb8148..1b202e28dfaa 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -447,7 +447,7 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
 	if ((unsigned long)ops->private != smp_processor_id())
 		return;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 89c414ce1388..646eda6c44a5 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -141,7 +141,7 @@ function_trace_call(unsigned long ip, unsigned long parent_ip,
 	if (unlikely(!tr->function_enabled))
 		return;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 000e9dc224c6..92b1575ae0ca 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -353,8 +353,8 @@ static inline const char *kretprobed(const char *name)
 }
 #endif /* CONFIG_KRETPROBES */
 
-static void
-seq_print_sym(struct trace_seq *s, unsigned long address, bool offset)
+void
+trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset)
 {
 #ifdef CONFIG_KALLSYMS
 	char str[KSYM_SYMBOL_LEN];
@@ -420,7 +420,7 @@ seq_print_ip_sym(struct trace_seq *s, unsigned long ip, unsigned long sym_flags)
 		goto out;
 	}
 
-	seq_print_sym(s, ip, sym_flags & TRACE_ITER_SYM_OFFSET);
+	trace_seq_print_sym(s, ip, sym_flags & TRACE_ITER_SYM_OFFSET);
 
 	if (sym_flags & TRACE_ITER_SYM_ADDR)
 		trace_seq_printf(s, " <" IP_FMT ">", ip);
diff --git a/kernel/trace/trace_output.h b/kernel/trace/trace_output.h
index 2f742b74e7e6..4c954636caf0 100644
--- a/kernel/trace/trace_output.h
+++ b/kernel/trace/trace_output.h
@@ -16,6 +16,7 @@ extern int
 seq_print_ip_sym(struct trace_seq *s, unsigned long ip,
 		unsigned long sym_flags);
 
+extern void trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset);
 extern int trace_print_context(struct trace_iterator *iter);
 extern int trace_print_lat_context(struct trace_iterator *iter);
 
diff --git a/kernel/trace/trace_recursion_record.c b/kernel/trace/trace_recursion_record.c
new file mode 100644
index 000000000000..b2edac1fe156
--- /dev/null
+++ b/kernel/trace/trace_recursion_record.c
@@ -0,0 +1,236 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/seq_file.h>
+#include <linux/kallsyms.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+#include <linux/fs.h>
+
+#include "trace_output.h"
+
+struct recursed_functions {
+	unsigned long		ip;
+	unsigned long		parent_ip;
+};
+
+static struct recursed_functions recursed_functions[CONFIG_FTRACE_RECORD_RECURSION_SIZE];
+static atomic_t nr_records;
+
+/*
+ * Cache the last found function. Yes, updates to this is racey, but
+ * so is memory cache ;-)
+ */
+static unsigned long cached_function;
+
+void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
+{
+	int index = 0;
+	int i;
+	unsigned long old;
+
+ again:
+	/* First check the last one recorded */
+	if (ip == cached_function)
+		return;
+
+	i = atomic_read(&nr_records);
+	/* nr_records is -1 when clearing records */
+	smp_mb__after_atomic();
+	if (i < 0)
+		return;
+
+	/*
+	 * If there's two writers and this writer comes in second,
+	 * the cmpxchg() below to update the ip will fail. Then this
+	 * writer will try again. It is possible that index will now
+	 * be greater than nr_records. This is because the writer
+	 * that succeeded has not updated the nr_records yet.
+	 * This writer could keep trying again until the other writer
+	 * updates nr_records. But if the other writer takes an
+	 * interrupt, and that interrupt locks up that CPU, we do
+	 * not want this CPU to lock up due to the recursion protection,
+	 * and have a bug report showing this CPU as the cause of
+	 * locking up the computer. To not lose this record, this
+	 * writer will simply use the next position to update the
+	 * recursed_functions, and it will update the nr_records
+	 * accordingly.
+	 */
+	if (index < i)
+		index = i;
+	if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
+		return;
+
+	for (i = index - 1; i >= 0; i--) {
+		if (recursed_functions[i].ip == ip) {
+			cached_function = ip;
+			return;
+		}
+	}
+
+	cached_function = ip;
+
+	/*
+	 * We only want to add a function if it hasn't been added before.
+	 * Add to the current location before incrementing the count.
+	 * If it fails to add, then increment the index (save in i)
+	 * and try again.
+	 */
+	old = cmpxchg(&recursed_functions[index].ip, 0, ip);
+	if (old != 0) {
+		/* Did something else already added this for us? */
+		if (old == ip)
+			return;
+		/* Try the next location (use i for the next index) */
+		index++;
+		goto again;
+	}
+
+	recursed_functions[index].parent_ip = parent_ip;
+
+	/*
+	 * It's still possible that we could race with the clearing
+	 *    CPU0                                    CPU1
+	 *    ----                                    ----
+	 *                                       ip = func
+	 *  nr_records = -1;
+	 *  recursed_functions[0] = 0;
+	 *                                       i = -1
+	 *                                       if (i < 0)
+	 *  nr_records = 0;
+	 *  (new recursion detected)
+	 *      recursed_functions[0] = func
+	 *                                            cmpxchg(recursed_functions[0],
+	 *                                                    func, 0)
+	 *
+	 * But the worse that could happen is that we get a zero in
+	 * the recursed_functions array, and it's likely that "func" will
+	 * be recorded again.
+	 */
+	i = atomic_read(&nr_records);
+	smp_mb__after_atomic();
+	if (i < 0)
+		cmpxchg(&recursed_functions[index].ip, ip, 0);
+	else if (i <= index)
+		atomic_cmpxchg(&nr_records, i, index + 1);
+}
+EXPORT_SYMBOL_GPL(ftrace_record_recursion);
+
+static DEFINE_MUTEX(recursed_function_lock);
+static struct trace_seq *tseq;
+
+static void *recursed_function_seq_start(struct seq_file *m, loff_t *pos)
+{
+	void *ret = NULL;
+	int index;
+
+	mutex_lock(&recursed_function_lock);
+	index = atomic_read(&nr_records);
+	if (*pos < index) {
+		ret = &recursed_functions[*pos];
+	}
+
+	tseq = kzalloc(sizeof(*tseq), GFP_KERNEL);
+	if (!tseq)
+		return ERR_PTR(-ENOMEM);
+
+	trace_seq_init(tseq);
+
+	return ret;
+}
+
+static void *recursed_function_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	int index;
+	int p;
+
+	index = atomic_read(&nr_records);
+	p = ++(*pos);
+
+	return p < index ? &recursed_functions[p] : NULL;
+}
+
+static void recursed_function_seq_stop(struct seq_file *m, void *v)
+{
+	kfree(tseq);
+	mutex_unlock(&recursed_function_lock);
+}
+
+static int recursed_function_seq_show(struct seq_file *m, void *v)
+{
+	struct recursed_functions *record = v;
+	int ret = 0;
+
+	if (record) {
+		trace_seq_print_sym(tseq, record->parent_ip, true);
+		trace_seq_puts(tseq, ":\t");
+		trace_seq_print_sym(tseq, record->ip, true);
+		trace_seq_putc(tseq, '\n');
+		ret = trace_print_seq(m, tseq);
+	}
+
+	return ret;
+}
+
+static const struct seq_operations recursed_function_seq_ops = {
+	.start  = recursed_function_seq_start,
+	.next   = recursed_function_seq_next,
+	.stop   = recursed_function_seq_stop,
+	.show   = recursed_function_seq_show
+};
+
+static int recursed_function_open(struct inode *inode, struct file *file)
+{
+	int ret = 0;
+
+	mutex_lock(&recursed_function_lock);
+	/* If this file was opened for write, then erase contents */
+	if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
+		/* disable updating records */
+		atomic_set(&nr_records, -1);
+		smp_mb__after_atomic();
+		memset(recursed_functions, 0, sizeof(recursed_functions));
+		smp_wmb();
+		/* enable them again */
+		atomic_set(&nr_records, 0);
+	}
+	if (file->f_mode & FMODE_READ)
+		ret = seq_open(file, &recursed_function_seq_ops);
+	mutex_unlock(&recursed_function_lock);
+
+	return ret;
+}
+
+static ssize_t recursed_function_write(struct file *file,
+				       const char __user *buffer,
+				       size_t count, loff_t *ppos)
+{
+	return count;
+}
+
+static int recursed_function_release(struct inode *inode, struct file *file)
+{
+	if (file->f_mode & FMODE_READ)
+		seq_release(inode, file);
+	return 0;
+}
+
+static const struct file_operations recursed_functions_fops = {
+	.open           = recursed_function_open,
+	.write		= recursed_function_write,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = recursed_function_release,
+};
+
+__init static int create_recursed_functions(void)
+{
+	struct dentry *dentry;
+
+	dentry = trace_create_file("recursed_functions", 0644, NULL, NULL,
+				   &recursed_functions_fops);
+	if (!dentry)
+		pr_warn("WARNING: Failed to create recursed_functions\n");
+	return 0;
+}
+
+fs_initcall(create_recursed_functions);
-- 
2.25.4


^ permalink raw reply related

* [PATCH 11/11 v2.1] ftrace: Add recording of functions that caused recursion
From: Steven Rostedt @ 2020-11-02 19:14 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Anton Vorontsov, linux-doc, Peter Zijlstra,
	Sebastian Andrzej Siewior, Kamalesh Babulal, James E.J. Bottomley,
	Guo Ren, H. Peter Anvin, live-patching, Miroslav Benes,
	Ingo Molnar, linux-s390, Joe Lawrence, Jonathan Corbet,
	Mauro Carvalho Chehab, Helge Deller, x86, linux-csky,
	Christian Borntraeger, Kees Cook, Vasily Gorbik, Heiko Carstens,
	Jiri Kosina, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
	Tony Luck, linux-parisc, linux-kernel, Masami Hiramatsu,
	Colin Cross, Paul Mackerras, Andrew Morton, linuxppc-dev
In-Reply-To: <20201102123721.4fcce2cb@gandalf.local.home>

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file
"recursed_functions" all the functions that caused recursion while a
callback to the function tracer was running.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Guo Ren <guoren@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-csky@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: live-patching@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/ftrace-uses.rst   |   6 +-
 arch/csky/kernel/probes/ftrace.c      |   2 +-
 arch/parisc/kernel/ftrace.c           |   2 +-
 arch/powerpc/kernel/kprobes-ftrace.c  |   2 +-
 arch/s390/kernel/ftrace.c             |   2 +-
 arch/x86/kernel/kprobes/ftrace.c      |   2 +-
 fs/pstore/ftrace.c                    |   2 +-
 include/linux/trace_recursion.h       |  32 +++-
 kernel/livepatch/patch.c              |   2 +-
 kernel/trace/Kconfig                  |  25 +++
 kernel/trace/Makefile                 |   1 +
 kernel/trace/ftrace.c                 |   4 +-
 kernel/trace/trace_event_perf.c       |   2 +-
 kernel/trace/trace_functions.c        |   2 +-
 kernel/trace/trace_output.c           |   6 +-
 kernel/trace/trace_output.h           |   1 +
 kernel/trace/trace_recursion_record.c | 235 ++++++++++++++++++++++++++
 17 files changed, 308 insertions(+), 20 deletions(-)
 create mode 100644 kernel/trace/trace_recursion_record.c

diff --git a/Documentation/trace/ftrace-uses.rst b/Documentation/trace/ftrace-uses.rst
index 86cd14b8e126..5981d5691745 100644
--- a/Documentation/trace/ftrace-uses.rst
+++ b/Documentation/trace/ftrace-uses.rst
@@ -118,7 +118,7 @@ can help in this regard. If you start your code with:
 
 	int bit;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
@@ -130,7 +130,9 @@ The code in between will be safe to use, even if it ends up calling a
 function that the callback is tracing. Note, on success,
 ftrace_test_recursion_trylock() will disable preemption, and the
 ftrace_test_recursion_unlock() will enable it again (if it was previously
-enabled).
+enabled). The instruction pointer (ip) and its parent (parent_ip) is passed to
+ftrace_test_recursion_trylock() to record where the recursion happened
+(if CONFIG_FTRACE_RECORD_RECURSION is set).
 
 Alternatively, if the FTRACE_OPS_FL_RECURSION flag is set on the ftrace_ops
 (as explained below), then a helper trampoline will be used to test
diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index 5eb2604fdf71..f30b179924ef 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -18,7 +18,7 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	struct kprobe_ctlblk *kcb;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 4b1fdf15662c..8b0ed7c5a4ab 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -210,7 +210,7 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p = get_kprobe((kprobe_opcode_t *)ip);
 	int bit;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 5df8d50c65ae..fdfee39938ea 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -20,7 +20,7 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 	struct kprobe_ctlblk *kcb;
 	int bit;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(nip, parent_nip);
 	if (bit < 0)
 		return;
 
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index 88466d7fb6b2..a1556333d481 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -204,7 +204,7 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p = get_kprobe((kprobe_opcode_t *)ip);
 	int bit;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index a40a6cdfcca3..954d930a7127 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -20,7 +20,7 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe_ctlblk *kcb;
 	int bit;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c
index 816210fc5d3a..adb0935eb062 100644
--- a/fs/pstore/ftrace.c
+++ b/fs/pstore/ftrace.c
@@ -41,7 +41,7 @@ static void notrace pstore_ftrace_call(unsigned long ip,
 	if (unlikely(oops_in_progress))
 		return;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index ac3d73484cb2..1cba5fe8777a 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -142,7 +142,28 @@ static __always_inline int trace_get_context_bit(void)
 			pc & HARDIRQ_MASK ? TRACE_CTX_IRQ : TRACE_CTX_SOFTIRQ;
 }
 
-static __always_inline int trace_test_and_set_recursion(int start, int max)
+#ifdef CONFIG_FTRACE_RECORD_RECURSION
+extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
+/*
+* The paranoid_test check can cause dropped reports (unlikely), but
+* if the recursion is common, it will likely still be recorded later.
+* But the paranoid_test is needed to make sure we don't crash.
+*/
+# define do_ftrace_record_recursion(ip, pip)				\
+	do {								\
+		static atomic_t paranoid_test;				\
+		if (!atomic_read(&paranoid_test)) {			\
+			atomic_inc(&paranoid_test);			\
+			ftrace_record_recursion(ip, pip);		\
+			atomic_dec(&paranoid_test);			\
+		}							\
+	} while (0)
+#else
+# define do_ftrace_record_recursion(ip, pip)	do { } while (0)
+#endif
+
+static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
+							int start, int max)
 {
 	unsigned int val = current->trace_recursion;
 	int bit;
@@ -158,8 +179,10 @@ static __always_inline int trace_test_and_set_recursion(int start, int max)
 		 * a switch between contexts. Allow for a single recursion.
 		 */
 		bit = TRACE_TRANSITION_BIT;
-		if (trace_recursion_test(bit))
+		if (trace_recursion_test(bit)) {
+			do_ftrace_record_recursion(ip, pip);
 			return -1;
+		}
 		trace_recursion_set(bit);
 		barrier();
 		return bit + 1;
@@ -199,9 +222,10 @@ static __always_inline void trace_clear_recursion(int bit)
  * Returns: -1 if a recursion happened.
  *           >= 0 if no recursion
  */
-static __always_inline int ftrace_test_recursion_trylock(void)
+static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
+							 unsigned long parent_ip)
 {
-	return trace_test_and_set_recursion(TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
 }
 
 /**
diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
index 15480bf3ce88..875c5dbbdd33 100644
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -49,7 +49,7 @@ static void notrace klp_ftrace_handler(unsigned long ip,
 
 	ops = container_of(fops, struct klp_ops, fops);
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (WARN_ON_ONCE(bit < 0))
 		return;
 	/*
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index a4020c0b4508..9b11c096d139 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -727,6 +727,31 @@ config TRACE_EVAL_MAP_FILE
 
 	If unsure, say N.
 
+config FTRACE_RECORD_RECURSION
+	bool "Record functions that recurse in function tracing"
+	depends on FUNCTION_TRACER
+	help
+	  All callbacks that attach to the function tracing have some sort
+	  of protection against recursion. Even though the protection exists,
+	  it adds overhead. This option will create a file in the tracefs
+	  file system called "recursed_functions" that will list the functions
+	  that triggered a recursion.
+
+	  This will add more overhead to cases that have recursion.
+
+	  If unsure, say N
+
+config FTRACE_RECORD_RECURSION_SIZE
+	int "Max number of recursed functions to record"
+	default	128
+	depends on FTRACE_RECORD_RECURSION
+	help
+	  This defines the limit of number of functions that can be
+	  listed in the "recursed_functions" file, that lists all
+	  the functions that caused a recursion to happen.
+	  This file can be reset, but the limit can not change in
+	  size at runtime.
+
 config GCOV_PROFILE_FTRACE
 	bool "Enable GCOV profiling on ftrace subsystem"
 	depends on GCOV_KERNEL
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index e153be351548..7e44cea89fdc 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -92,6 +92,7 @@ obj-$(CONFIG_DYNAMIC_EVENTS) += trace_dynevent.o
 obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o
 obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
 obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
+obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
 
 obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 39f2bba89b76..03aad2b5cd5e 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6918,7 +6918,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
 	struct ftrace_ops *op;
 	int bit;
 
-	bit = trace_test_and_set_recursion(TRACE_LIST_START, TRACE_LIST_MAX);
+	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START, TRACE_LIST_MAX);
 	if (bit < 0)
 		return;
 
@@ -6993,7 +6993,7 @@ static void ftrace_ops_assist_func(unsigned long ip, unsigned long parent_ip,
 {
 	int bit;
 
-	bit = trace_test_and_set_recursion(TRACE_LIST_START, TRACE_LIST_MAX);
+	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START, TRACE_LIST_MAX);
 	if (bit < 0)
 		return;
 
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index a2b9fddb8148..1b202e28dfaa 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -447,7 +447,7 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
 	if ((unsigned long)ops->private != smp_processor_id())
 		return;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 89c414ce1388..646eda6c44a5 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -141,7 +141,7 @@ function_trace_call(unsigned long ip, unsigned long parent_ip,
 	if (unlikely(!tr->function_enabled))
 		return;
 
-	bit = ftrace_test_recursion_trylock();
+	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 000e9dc224c6..92b1575ae0ca 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -353,8 +353,8 @@ static inline const char *kretprobed(const char *name)
 }
 #endif /* CONFIG_KRETPROBES */
 
-static void
-seq_print_sym(struct trace_seq *s, unsigned long address, bool offset)
+void
+trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset)
 {
 #ifdef CONFIG_KALLSYMS
 	char str[KSYM_SYMBOL_LEN];
@@ -420,7 +420,7 @@ seq_print_ip_sym(struct trace_seq *s, unsigned long ip, unsigned long sym_flags)
 		goto out;
 	}
 
-	seq_print_sym(s, ip, sym_flags & TRACE_ITER_SYM_OFFSET);
+	trace_seq_print_sym(s, ip, sym_flags & TRACE_ITER_SYM_OFFSET);
 
 	if (sym_flags & TRACE_ITER_SYM_ADDR)
 		trace_seq_printf(s, " <" IP_FMT ">", ip);
diff --git a/kernel/trace/trace_output.h b/kernel/trace/trace_output.h
index 2f742b74e7e6..4c954636caf0 100644
--- a/kernel/trace/trace_output.h
+++ b/kernel/trace/trace_output.h
@@ -16,6 +16,7 @@ extern int
 seq_print_ip_sym(struct trace_seq *s, unsigned long ip,
 		unsigned long sym_flags);
 
+extern void trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset);
 extern int trace_print_context(struct trace_iterator *iter);
 extern int trace_print_lat_context(struct trace_iterator *iter);
 
diff --git a/kernel/trace/trace_recursion_record.c b/kernel/trace/trace_recursion_record.c
new file mode 100644
index 000000000000..a1859843781b
--- /dev/null
+++ b/kernel/trace/trace_recursion_record.c
@@ -0,0 +1,235 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/seq_file.h>
+#include <linux/kallsyms.h>
+#include <linux/module.h>
+#include <linux/ftrace.h>
+#include <linux/fs.h>
+
+#include "trace_output.h"
+
+struct recursed_functions {
+	unsigned long		ip;
+	unsigned long		parent_ip;
+};
+
+static struct recursed_functions recursed_functions[CONFIG_FTRACE_RECORD_RECURSION_SIZE];
+static atomic_t nr_records;
+
+/*
+ * Cache the last found function. Yes, updates to this is racey, but
+ * so is memory cache ;-)
+ */
+static unsigned long cached_function;
+
+void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
+{
+	int index = 0;
+	int i;
+	unsigned long old;
+
+ again:
+	/* First check the last one recorded */
+	if (ip == cached_function)
+		return;
+
+	i = atomic_read(&nr_records);
+	/* nr_records is -1 when clearing records */
+	smp_mb__after_atomic();
+	if (i < 0)
+		return;
+
+	/*
+	 * If there's two writers and this writer comes in second,
+	 * the cmpxchg() below to update the ip will fail. Then this
+	 * writer will try again. It is possible that index will now
+	 * be greater than nr_records. This is because the writer
+	 * that succeeded has not updated the nr_records yet.
+	 * This writer could keep trying again until the other writer
+	 * updates nr_records. But if the other writer takes an
+	 * interrupt, and that interrupt locks up that CPU, we do
+	 * not want this CPU to lock up due to the recursion protection,
+	 * and have a bug report showing this CPU as the cause of
+	 * locking up the computer. To not lose this record, this
+	 * writer will simply use the next position to update the
+	 * recursed_functions, and it will update the nr_records
+	 * accordingly.
+	 */
+	if (index < i)
+		index = i;
+	if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
+		return;
+
+	for (i = index - 1; i >= 0; i--) {
+		if (recursed_functions[i].ip == ip) {
+			cached_function = ip;
+			return;
+		}
+	}
+
+	cached_function = ip;
+
+	/*
+	 * We only want to add a function if it hasn't been added before.
+	 * Add to the current location before incrementing the count.
+	 * If it fails to add, then increment the index (save in i)
+	 * and try again.
+	 */
+	old = cmpxchg(&recursed_functions[index].ip, 0, ip);
+	if (old != 0) {
+		/* Did something else already added this for us? */
+		if (old == ip)
+			return;
+		/* Try the next location (use i for the next index) */
+		index++;
+		goto again;
+	}
+
+	recursed_functions[index].parent_ip = parent_ip;
+
+	/*
+	 * It's still possible that we could race with the clearing
+	 *    CPU0                                    CPU1
+	 *    ----                                    ----
+	 *                                       ip = func
+	 *  nr_records = -1;
+	 *  recursed_functions[0] = 0;
+	 *                                       i = -1
+	 *                                       if (i < 0)
+	 *  nr_records = 0;
+	 *  (new recursion detected)
+	 *      recursed_functions[0] = func
+	 *                                            cmpxchg(recursed_functions[0],
+	 *                                                    func, 0)
+	 *
+	 * But the worse that could happen is that we get a zero in
+	 * the recursed_functions array, and it's likely that "func" will
+	 * be recorded again.
+	 */
+	i = atomic_read(&nr_records);
+	smp_mb__after_atomic();
+	if (i < 0)
+		cmpxchg(&recursed_functions[index].ip, ip, 0);
+	else if (i <= index)
+		atomic_cmpxchg(&nr_records, i, index + 1);
+}
+
+static DEFINE_MUTEX(recursed_function_lock);
+static struct trace_seq *tseq;
+
+static void *recursed_function_seq_start(struct seq_file *m, loff_t *pos)
+{
+	void *ret = NULL;
+	int index;
+
+	mutex_lock(&recursed_function_lock);
+	index = atomic_read(&nr_records);
+	if (*pos < index) {
+		ret = &recursed_functions[*pos];
+	}
+
+	tseq = kzalloc(sizeof(*tseq), GFP_KERNEL);
+	if (!tseq)
+		return ERR_PTR(-ENOMEM);
+
+	trace_seq_init(tseq);
+
+	return ret;
+}
+
+static void *recursed_function_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	int index;
+	int p;
+
+	index = atomic_read(&nr_records);
+	p = ++(*pos);
+
+	return p < index ? &recursed_functions[p] : NULL;
+}
+
+static void recursed_function_seq_stop(struct seq_file *m, void *v)
+{
+	kfree(tseq);
+	mutex_unlock(&recursed_function_lock);
+}
+
+static int recursed_function_seq_show(struct seq_file *m, void *v)
+{
+	struct recursed_functions *record = v;
+	int ret = 0;
+
+	if (record) {
+		trace_seq_print_sym(tseq, record->parent_ip, true);
+		trace_seq_puts(tseq, ":\t");
+		trace_seq_print_sym(tseq, record->ip, true);
+		trace_seq_putc(tseq, '\n');
+		ret = trace_print_seq(m, tseq);
+	}
+
+	return ret;
+}
+
+static const struct seq_operations recursed_function_seq_ops = {
+	.start  = recursed_function_seq_start,
+	.next   = recursed_function_seq_next,
+	.stop   = recursed_function_seq_stop,
+	.show   = recursed_function_seq_show
+};
+
+static int recursed_function_open(struct inode *inode, struct file *file)
+{
+	int ret = 0;
+
+	mutex_lock(&recursed_function_lock);
+	/* If this file was opened for write, then erase contents */
+	if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
+		/* disable updating records */
+		atomic_set(&nr_records, -1);
+		smp_mb__after_atomic();
+		memset(recursed_functions, 0, sizeof(recursed_functions));
+		smp_wmb();
+		/* enable them again */
+		atomic_set(&nr_records, 0);
+	}
+	if (file->f_mode & FMODE_READ)
+		ret = seq_open(file, &recursed_function_seq_ops);
+	mutex_unlock(&recursed_function_lock);
+
+	return ret;
+}
+
+static ssize_t recursed_function_write(struct file *file,
+				       const char __user *buffer,
+				       size_t count, loff_t *ppos)
+{
+	return count;
+}
+
+static int recursed_function_release(struct inode *inode, struct file *file)
+{
+	if (file->f_mode & FMODE_READ)
+		seq_release(inode, file);
+	return 0;
+}
+
+static const struct file_operations recursed_functions_fops = {
+	.open           = recursed_function_open,
+	.write		= recursed_function_write,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = recursed_function_release,
+};
+
+__init static int create_recursed_functions(void)
+{
+	struct dentry *dentry;
+
+	dentry = trace_create_file("recursed_functions", 0644, NULL, NULL,
+				   &recursed_functions_fops);
+	if (!dentry)
+		pr_warn("WARNING: Failed to create recursed_functions\n");
+	return 0;
+}
+
+fs_initcall(create_recursed_functions);
-- 
2.25.4


^ permalink raw reply related

* RE: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
From: David Laight @ 2020-11-02 18:23 UTC (permalink / raw)
  To: 'Greg KH'
  Cc: linux-aio@kvack.org, 'David Hildenbrand',
	linux-mips@vger.kernel.org, David Howells, linux-mm@kvack.org,
	keyrings@vger.kernel.org, sparclinux@vger.kernel.org,
	Christoph Hellwig, linux-arch@vger.kernel.org,
	linux-s390@vger.kernel.org, linux-scsi@vger.kernel.org,
	kernel-team@android.com, Arnd Bergmann,
	linux-block@vger.kernel.org, Al Viro, io-uring@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, Jens Axboe,
	linux-parisc@vger.kernel.org, netdev@vger.kernel.org,
	Nick Desaulniers, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, Andrew Morton,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20201102135202.GA1016272@kroah.com>

From: 'Greg KH'
> Sent: 02 November 2020 13:52
> 
> On Mon, Nov 02, 2020 at 09:06:38AM +0000, David Laight wrote:
> > From: 'Greg KH'
> > > Sent: 23 October 2020 15:47
> > >
> > > On Fri, Oct 23, 2020 at 02:39:24PM +0000, David Laight wrote:
> > > > From: David Hildenbrand
> > > > > Sent: 23 October 2020 15:33
> > > > ...
> > > > > I just checked against upstream code generated by clang 10 and it
> > > > > properly discards the upper 32bit via a mov w23 w2.
> > > > >
> > > > > So at least clang 10 indeed properly assumes we could have garbage and
> > > > > masks it off.
> > > > >
> > > > > Maybe the issue is somewhere else, unrelated to nr_pages ... or clang 11
> > > > > behaves differently.
> > > >
> > > > We'll need the disassembly from a failing kernel image.
> > > > It isn't that big to hand annotate.
> > >
> > > I've worked around the merge at the moment in the android tree, but it
> > > is still quite reproducable, and will try to get a .o file to
> > > disassemble on Monday or so...
> >
> > Did this get properly resolved?
> 
> For some reason, 5.10-rc2 fixed all of this up.  I backed out all of the
> patches I had to revert to get 5.10-rc1 to work properly, and then did
> the merge and all is well.
> 
> It must have been something to do with the compat changes in this same
> area that went in after 5.10-rc1, and something got reorganized in the
> files somehow.  I really do not know, and at the moment, don't have the
> time to track it down anymore.  So for now, I'd say it's all good, sorry
> for the noise.

Hopefully it won't appear again.

Saved me spending a day off reading arm64 assembler.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply

* Re: [PATCH 11/11 v2] ftrace: Add recording of functions that caused recursion
From: Steven Rostedt @ 2020-11-02 17:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Anton Vorontsov, linux-doc, Peter Zijlstra,
	Sebastian Andrzej Siewior, Kamalesh Babulal, James E.J. Bottomley,
	Guo Ren, H. Peter Anvin, live-patching, Miroslav Benes,
	Ingo Molnar, linux-s390, Joe Lawrence, Jonathan Corbet,
	Mauro Carvalho Chehab, Helge Deller, x86, linux-csky,
	Christian Borntraeger, Kees Cook, Vasily Gorbik, Heiko Carstens,
	Jiri Kosina, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
	Tony Luck, linux-parisc, linux-kernel, Masami Hiramatsu,
	Colin Cross, Paul Mackerras, Andrew Morton, linuxppc-dev
In-Reply-To: <20201102123721.4fcce2cb@gandalf.local.home>

On Mon, 2 Nov 2020 12:37:21 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:


> The only race that I see that can happen, is the one in the comment I
> showed. And that is after enabling the recursed functions again after
> clearing, one CPU could add a function while another CPU that just added
> that same function could be just exiting this routine, notice that a
> clearing of the array happened, and remove its function (which was the same
> as the one just happened). So we get a "zero" in the array. If this
> happens, it is likely that that function will recurse again and will be
> added later.
> 

Updated version of this function:

-- Steve


void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
{
	int index = 0;
	int i;
	unsigned long old;

 again:
	/* First check the last one recorded */
	if (ip == cached_function)
		return;

	i = atomic_read(&nr_records);
	/* nr_records is -1 when clearing records */
	smp_mb__after_atomic();
	if (i < 0)
		return;

	/*
	 * If there's two writers and this writer comes in second,
	 * the cmpxchg() below to update the ip will fail. Then this
	 * writer will try again. It is possible that index will now
	 * be greater than nr_records. This is because the writer
	 * that succeeded has not updated the nr_records yet.
	 * This writer could keep trying again until the other writer
	 * updates nr_records. But if the other writer takes an
	 * interrupt, and that interrupt locks up that CPU, we do
	 * not want this CPU to lock up due to the recursion protection,
	 * and have a bug report showing this CPU as the cause of
	 * locking up the computer. To not lose this record, this
	 * writer will simply use the next position to update the
	 * recursed_functions, and it will update the nr_records
	 * accordingly.
	 */
	if (index < i)
		index = i;
	if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
		return;

	for (i = index - 1; i >= 0; i--) {
		if (recursed_functions[i].ip == ip) {
			cached_function = ip;
			return;
		}
	}

	cached_function = ip;

	/*
	 * We only want to add a function if it hasn't been added before.
	 * Add to the current location before incrementing the count.
	 * If it fails to add, then increment the index (save in i)
	 * and try again.
	 */
	old = cmpxchg(&recursed_functions[index].ip, 0, ip);
	if (old != 0) {
		/* Did something else already added this for us? */
		if (old == ip)
			return;
		/* Try the next location (use i for the next index) */
		index++;
		goto again;
	}

	recursed_functions[index].parent_ip = parent_ip;

	/*
	 * It's still possible that we could race with the clearing
	 *    CPU0                                    CPU1
	 *    ----                                    ----
	 *                                       ip = func
	 *  nr_records = -1;
	 *  recursed_functions[0] = 0;
	 *                                       i = -1
	 *                                       if (i < 0)
	 *  nr_records = 0;
	 *  (new recursion detected)
	 *      recursed_functions[0] = func
	 *                                            cmpxchg(recursed_functions[0],
	 *                                                    func, 0)
	 *
	 * But the worse that could happen is that we get a zero in
	 * the recursed_functions array, and it's likely that "func" will
	 * be recorded again.
	 */
	i = atomic_read(&nr_records);
	smp_mb__after_atomic();
	if (i < 0)
		cmpxchg(&recursed_functions[index].ip, ip, 0);
	else if (i <= index)
		atomic_cmpxchg(&nr_records, i, index + 1);
}

^ permalink raw reply

* Re: [PATCH 11/11 v2] ftrace: Add recording of functions that caused recursion
From: Steven Rostedt @ 2020-11-02 17:37 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Anton Vorontsov, linux-doc, Peter Zijlstra,
	Sebastian Andrzej Siewior, Kamalesh Babulal, James E.J. Bottomley,
	Guo Ren, H. Peter Anvin, live-patching, Miroslav Benes,
	Ingo Molnar, linux-s390, Joe Lawrence, Jonathan Corbet,
	Mauro Carvalho Chehab, Helge Deller, x86, linux-csky,
	Christian Borntraeger, Kees Cook, Vasily Gorbik, Heiko Carstens,
	Jiri Kosina, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
	Tony Luck, linux-parisc, linux-kernel, Masami Hiramatsu,
	Colin Cross, Paul Mackerras, Andrew Morton, linuxppc-dev
In-Reply-To: <20201102164147.GJ20201@alley>

On Mon, 2 Nov 2020 17:41:47 +0100
Petr Mladek <pmladek@suse.com> wrote:

> > +	i = atomic_read(&nr_records);
> > +	smp_mb__after_atomic();
> > +	if (i < 0)
> > +		cmpxchg(&recursed_functions[index].ip, ip, 0);
> > +	else if (i <= index)
> > +		atomic_cmpxchg(&nr_records, i, index + 1);  
> 
> This looks weird. It would shift nr_records past the record added
> in this call. It might skip many slots that were zeroed when clearing.
> Also we do not know if our entry was not zeroed as well.

nr_records always holds the next position to write to.

	index = nr_records;
	recursed_functions[index].ip = ip;
	nr_records++;

Before clearing, we have:

	nr_records = -1;
	smp_mb();
	memset(recursed_functions, 0);
	smp_wmb();
	nr_records = 0;

When we enter this function:

	i = nr_records;
	smp_mb();
	if (i < 0)
		return;


Thus, we just stopped all new updates while clearing the records.

But what about if something is currently updating?

	i = nr_records;
	smp_mb();
	if (i < 0)
		cmpxchg(recursed_functions, ip, 0);

The above shows that if the current updating process notices that the
clearing happens, it will clear the function it added.

	else if (i <= index)
		cmpxchg(nr_records, i, index + 1);

This makes sure that nr_records only grows if it is greater or equal to
zero.

The only race that I see that can happen, is the one in the comment I
showed. And that is after enabling the recursed functions again after
clearing, one CPU could add a function while another CPU that just added
that same function could be just exiting this routine, notice that a
clearing of the array happened, and remove its function (which was the same
as the one just happened). So we get a "zero" in the array. If this
happens, it is likely that that function will recurse again and will be
added later.

-- Steve

^ permalink raw reply

* Re: [PATCH 11/11 v2] ftrace: Add recording of functions that caused recursion
From: Steven Rostedt @ 2020-11-02 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Anton Vorontsov, linux-doc, Peter Zijlstra,
	Sebastian Andrzej Siewior, Kamalesh Babulal, James E.J. Bottomley,
	Guo Ren, H. Peter Anvin, live-patching, Miroslav Benes,
	Ingo Molnar, linux-s390, Joe Lawrence, Jonathan Corbet,
	Mauro Carvalho Chehab, Helge Deller, x86, linux-csky,
	Christian Borntraeger, Kees Cook, Vasily Gorbik, Heiko Carstens,
	Jiri Kosina, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
	Tony Luck, linux-parisc, linux-kernel, Masami Hiramatsu,
	Colin Cross, Paul Mackerras, Andrew Morton, linuxppc-dev
In-Reply-To: <20201102120907.457ad2f7@gandalf.local.home>

On Mon, 2 Nov 2020 12:09:07 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> > > +void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
> > > +{
> > > +	int index;
> > > +	int i = 0;
> > > +	unsigned long old;
> > > +
> > > + again:
> > > +	/* First check the last one recorded */
> > > +	if (ip == cached_function)
> > > +		return;
> > > +
> > > +	index = atomic_read(&nr_records);
> > > +	/* nr_records is -1 when clearing records */
> > > +	smp_mb__after_atomic();
> > > +	if (index < 0)
> > > +		return;
> > > +
> > > +	/* See below */
> > > +	if (i > index)
> > > +		index = i;    
> > 
> > This looks like a complicated way to do index++ via "i" variable.
> > I guess that it was needed only in some older variant of the code.
> > See below.  
> 
> Because we reread the index above, and index could be bigger than i (more
> than index + 1).
> 
> >   
> > > +	if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
> > > +		return;
> > > +
> > > +	for (i = index - 1; i >= 0; i--) {
> > > +		if (recursed_functions[i].ip == ip) {
> > > +			cached_function = ip;
> > > +			return;
> > > +		}
> > > +	}
> > > +
> > > +	cached_function = ip;
> > > +
> > > +	/*
> > > +	 * We only want to add a function if it hasn't been added before.
> > > +	 * Add to the current location before incrementing the count.
> > > +	 * If it fails to add, then increment the index (save in i)
> > > +	 * and try again.
> > > +	 */
> > > +	old = cmpxchg(&recursed_functions[index].ip, 0, ip);
> > > +	if (old != 0) {
> > > +		/* Did something else already added this for us? */
> > > +		if (old == ip)
> > > +			return;
> > > +		/* Try the next location (use i for the next index) */
> > > +		i = index + 1;    
> > 
> > What about
> > 
> > 		index++;
> > 
> > We basically want to run the code again with index + 1 limit.  
> 
> But something else could update nr_records, and we want to use that if
> nr_records is greater than i.
> 
> Now, we could swap the use case, and have
> 
> 	int index = 0;
> 
> 	[..]
> 	i = atomic_read(&nr_records);
> 	if (i > index)
> 		index = i;
> 
> 	[..]
> 
> 		index++;
> 		goto again;
> 
> 
> > 
> > Maybe, it even does not make sense to check the array again
> > and we should just try to store the value into the next slot.  
> 
> We do this dance to prevent duplicates.
> 
> But you are correct, that this went through a few iterations. And the first
> ones didn't have the cmpxchg on the ip itself, and that could make it so
> that we don't need this index = i dance.

Playing with this more, I remember why I did this song and dance.

If we have two or more writers, and one beats the other in updating the ip
(with a different function). This one will go and try again. The reason to
look at one passed nr_records, is because of the race between the multiple
writers. This one may loop before the other can update nr_records, and it
will fail to apply it again.

You could just say, "hey we'll just keep looping until the other writer
eventually updates nr_records". But this is where my paranoia gets in. What
happens if that other writer takes an interrupt (interrupts are not
disabled), and then deadlocks, or does something bad? This CPU will not get
locked up spinning.

Unlikely scenario, and it would require a bug someplace else. But I don't
want a bug report stating that it found this recursion locking locking up
the CPU and hide the real culprit.

I'll add a comment to explain this in the code. And also swap the i and
index around to make a little more sense.

-- Steve

^ permalink raw reply

* Re: [PATCH 11/11 v2] ftrace: Add recording of functions that caused recursion
From: Steven Rostedt @ 2020-11-02 17:09 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Anton Vorontsov, linux-doc, Peter Zijlstra,
	Sebastian Andrzej Siewior, Kamalesh Babulal, James E.J. Bottomley,
	Guo Ren, H. Peter Anvin, live-patching, Miroslav Benes,
	Ingo Molnar, linux-s390, Joe Lawrence, Jonathan Corbet,
	Mauro Carvalho Chehab, Helge Deller, x86, linux-csky,
	Christian Borntraeger, Kees Cook, Vasily Gorbik, Heiko Carstens,
	Jiri Kosina, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
	Tony Luck, linux-parisc, linux-kernel, Masami Hiramatsu,
	Colin Cross, Paul Mackerras, Andrew Morton, linuxppc-dev
In-Reply-To: <20201102164147.GJ20201@alley>

On Mon, 2 Nov 2020 17:41:47 +0100
Petr Mladek <pmladek@suse.com> wrote:

> On Fri 2020-10-30 17:31:53, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file
> > "recursed_functions" all the functions that caused recursion while a
> > callback to the function tracer was running.
> >   
> 
> > --- /dev/null
> > +++ b/kernel/trace/trace_recursion_record.c
> > @@ -0,0 +1,220 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/seq_file.h>
> > +#include <linux/kallsyms.h>
> > +#include <linux/module.h>
> > +#include <linux/ftrace.h>
> > +#include <linux/fs.h>
> > +
> > +#include "trace_output.h"
> > +
> > +struct recursed_functions {
> > +	unsigned long		ip;
> > +	unsigned long		parent_ip;
> > +};
> > +
> > +static struct recursed_functions recursed_functions[CONFIG_FTRACE_RECORD_RECURSION_SIZE];  
> 
> The code tries to be lockless safe as much as possible. It would make
> sense to allign the array.

Hmm, is there an arch where the compiler would put an array of structures
with two unsigned long, misaligned?

> 
> 
> > +static atomic_t nr_records;
> > +
> > +/*
> > + * Cache the last found function. Yes, updates to this is racey, but
> > + * so is memory cache ;-)
> > + */
> > +static unsigned long cached_function;
> > +
> > +void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
> > +{
> > +	int index;
> > +	int i = 0;
> > +	unsigned long old;
> > +
> > + again:
> > +	/* First check the last one recorded */
> > +	if (ip == cached_function)
> > +		return;
> > +
> > +	index = atomic_read(&nr_records);
> > +	/* nr_records is -1 when clearing records */
> > +	smp_mb__after_atomic();
> > +	if (index < 0)
> > +		return;
> > +
> > +	/* See below */
> > +	if (i > index)
> > +		index = i;  
> 
> This looks like a complicated way to do index++ via "i" variable.
> I guess that it was needed only in some older variant of the code.
> See below.

Because we reread the index above, and index could be bigger than i (more
than index + 1).

> 
> > +	if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
> > +		return;
> > +
> > +	for (i = index - 1; i >= 0; i--) {
> > +		if (recursed_functions[i].ip == ip) {
> > +			cached_function = ip;
> > +			return;
> > +		}
> > +	}
> > +
> > +	cached_function = ip;
> > +
> > +	/*
> > +	 * We only want to add a function if it hasn't been added before.
> > +	 * Add to the current location before incrementing the count.
> > +	 * If it fails to add, then increment the index (save in i)
> > +	 * and try again.
> > +	 */
> > +	old = cmpxchg(&recursed_functions[index].ip, 0, ip);
> > +	if (old != 0) {
> > +		/* Did something else already added this for us? */
> > +		if (old == ip)
> > +			return;
> > +		/* Try the next location (use i for the next index) */
> > +		i = index + 1;  
> 
> What about
> 
> 		index++;
> 
> We basically want to run the code again with index + 1 limit.

But something else could update nr_records, and we want to use that if
nr_records is greater than i.

Now, we could swap the use case, and have

	int index = 0;

	[..]
	i = atomic_read(&nr_records);
	if (i > index)
		index = i;

	[..]

		index++;
		goto again;


> 
> Maybe, it even does not make sense to check the array again
> and we should just try to store the value into the next slot.

We do this dance to prevent duplicates.

But you are correct, that this went through a few iterations. And the first
ones didn't have the cmpxchg on the ip itself, and that could make it so
that we don't need this index = i dance.

> 
> > +		goto again;
> > +	}
> > +
> > +	recursed_functions[index].parent_ip = parent_ip;  
> 
> WRITE_ONCE() ?

Does it really matter?

> 
> > +
> > +	/*
> > +	 * It's still possible that we could race with the clearing
> > +	 *    CPU0                                    CPU1
> > +	 *    ----                                    ----
> > +	 *                                       ip = func
> > +	 *  nr_records = -1;
> > +	 *  recursed_functions[0] = 0;
> > +	 *                                       i = -1
> > +	 *                                       if (i < 0)
> > +	 *  nr_records = 0;
> > +	 *  (new recursion detected)
> > +	 *      recursed_functions[0] = func
> > +	 *                                            cmpxchg(recursed_functions[0],
> > +	 *                                                    func, 0)
> > +	 *
> > +	 * But the worse that could happen is that we get a zero in
> > +	 * the recursed_functions array, and it's likely that "func" will
> > +	 * be recorded again.
> > +	 */
> > +	i = atomic_read(&nr_records);
> > +	smp_mb__after_atomic();
> > +	if (i < 0)
> > +		cmpxchg(&recursed_functions[index].ip, ip, 0);
> > +	else if (i <= index)
> > +		atomic_cmpxchg(&nr_records, i, index + 1);  
> 
> This looks weird. It would shift nr_records past the record added
> in this call. It might skip many slots that were zeroed when clearing.
> Also we do not know if our entry was not zeroed as well.
> 
> I would suggest to do it some other way (not even compile tested):
> 
> void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
> {
> 	int index, old_index;
> 	int i = 0;
> 	unsigned long old_ip;
> 
>  again:
> 	/* First check the last one recorded. */
> 	if (ip == READ_ONCE(cached_function))
> 		return;
> 
> 	index = atomic_read(&nr_records);
> 	/* nr_records is -1 when clearing records. */
> 	smp_mb__after_atomic();
> 	if (index < 0)
> 		return;
> 
> 	/* Already cached? */
> 	for (i = index - 1; i >= 0; i--) {
> 		if (recursed_functions[i].ip == ip) {
> 			WRITE_ONCE(cached_function, ip);
> 			return;
> 		}
> 	}
> 
> 	if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
> 		return;
> 
> 	/*
> 	 * Try to reserve the slot. It might be already taken
> 	 * or the entire cache cleared.
> 	 */
> 	old_index = atomic_cmpxchg(&nr_records, index, index + 1);
> 	if (old_index != index)
> 		goto again;
> 
> 	/*
> 	 * Be careful. The entire cache might have been cleared and reused in
> 	 * the meantime. Replace only empty slot.
> 	 */
> 	old_ip = cmpxchg(&recursed_functions[index].ip, 0, ip);
> 	if (old_ip != 0)
> 		goto again;
> 
> 	old_ip = cmpxchg(&recursed_functions[index].parent_ip, 0, parrent_ip);
> 	if (old_ip != 0)
> 		goto again;
> 
> 	/*
> 	 * No ip is better than non-consistent one. The race with
> 	 * clearing should be rare and not worth a perfect solution.
> 	 */
> 	if (READ_ONCE(recursed_functions[index].ip) != ip) {
> 		cmpxchg(&recursed_functions[index].ip, ip, 0UL)
> 		goto again;
> 	}
> }

Let me go and rewrite it, this time considering the cmpxchg in the ip
update code. I may end up with what you have above ;-)


> 
> The last check probably is not needed. Inconsistent entries
> should be prevented by the way how this func is called:
> 
> 		static atomic_t paranoid_test;				\
> 		if (!atomic_read(&paranoid_test)) {			\
> 			atomic_inc(&paranoid_test);			\
> 			ftrace_record_recursion(ip, pip);		\
> 			atomic_dec(&paranoid_test);			\
> 		}							\
> 
> 
> 
> 
> The rest of the patchset looks fine. I do not feel comfortable to give
> it Reviewed-by because I did not review it in depth.
> 
> I spent more time with the above lockless code. I took it is a
> training. I need to improve this skill to feel more comfortable with
> the lockless printk ring buffer ;-)

Yeah, everything becomes exponentially complex when you make it lockless
with multiple concurrent writers.

-- Steve

^ permalink raw reply

* Re: [PATCH 11/11 v2] ftrace: Add recording of functions that caused recursion
From: Petr Mladek @ 2020-11-02 16:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Anton Vorontsov, linux-doc, Peter Zijlstra,
	Sebastian Andrzej Siewior, Kamalesh Babulal, James E.J. Bottomley,
	Guo Ren, H. Peter Anvin, live-patching, Miroslav Benes,
	Ingo Molnar, linux-s390, Joe Lawrence, Jonathan Corbet,
	Mauro Carvalho Chehab, Helge Deller, x86, linux-csky,
	Christian Borntraeger, Kees Cook, Vasily Gorbik, Heiko Carstens,
	Jiri Kosina, Borislav Petkov, Josh Poimboeuf, Thomas Gleixner,
	Tony Luck, linux-parisc, linux-kernel, Masami Hiramatsu,
	Colin Cross, Paul Mackerras, Andrew Morton, linuxppc-dev
In-Reply-To: <20201030214014.801706340@goodmis.org>

On Fri 2020-10-30 17:31:53, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file
> "recursed_functions" all the functions that caused recursion while a
> callback to the function tracer was running.
> 

> --- /dev/null
> +++ b/kernel/trace/trace_recursion_record.c
> @@ -0,0 +1,220 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/seq_file.h>
> +#include <linux/kallsyms.h>
> +#include <linux/module.h>
> +#include <linux/ftrace.h>
> +#include <linux/fs.h>
> +
> +#include "trace_output.h"
> +
> +struct recursed_functions {
> +	unsigned long		ip;
> +	unsigned long		parent_ip;
> +};
> +
> +static struct recursed_functions recursed_functions[CONFIG_FTRACE_RECORD_RECURSION_SIZE];

The code tries to be lockless safe as much as possible. It would make
sense to allign the array.


> +static atomic_t nr_records;
> +
> +/*
> + * Cache the last found function. Yes, updates to this is racey, but
> + * so is memory cache ;-)
> + */
> +static unsigned long cached_function;
> +
> +void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
> +{
> +	int index;
> +	int i = 0;
> +	unsigned long old;
> +
> + again:
> +	/* First check the last one recorded */
> +	if (ip == cached_function)
> +		return;
> +
> +	index = atomic_read(&nr_records);
> +	/* nr_records is -1 when clearing records */
> +	smp_mb__after_atomic();
> +	if (index < 0)
> +		return;
> +
> +	/* See below */
> +	if (i > index)
> +		index = i;

This looks like a complicated way to do index++ via "i" variable.
I guess that it was needed only in some older variant of the code.
See below.

> +	if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
> +		return;
> +
> +	for (i = index - 1; i >= 0; i--) {
> +		if (recursed_functions[i].ip == ip) {
> +			cached_function = ip;
> +			return;
> +		}
> +	}
> +
> +	cached_function = ip;
> +
> +	/*
> +	 * We only want to add a function if it hasn't been added before.
> +	 * Add to the current location before incrementing the count.
> +	 * If it fails to add, then increment the index (save in i)
> +	 * and try again.
> +	 */
> +	old = cmpxchg(&recursed_functions[index].ip, 0, ip);
> +	if (old != 0) {
> +		/* Did something else already added this for us? */
> +		if (old == ip)
> +			return;
> +		/* Try the next location (use i for the next index) */
> +		i = index + 1;

What about

		index++;

We basically want to run the code again with index + 1 limit.

Maybe, it even does not make sense to check the array again
and we should just try to store the value into the next slot.

> +		goto again;
> +	}
> +
> +	recursed_functions[index].parent_ip = parent_ip;

WRITE_ONCE() ?

> +
> +	/*
> +	 * It's still possible that we could race with the clearing
> +	 *    CPU0                                    CPU1
> +	 *    ----                                    ----
> +	 *                                       ip = func
> +	 *  nr_records = -1;
> +	 *  recursed_functions[0] = 0;
> +	 *                                       i = -1
> +	 *                                       if (i < 0)
> +	 *  nr_records = 0;
> +	 *  (new recursion detected)
> +	 *      recursed_functions[0] = func
> +	 *                                            cmpxchg(recursed_functions[0],
> +	 *                                                    func, 0)
> +	 *
> +	 * But the worse that could happen is that we get a zero in
> +	 * the recursed_functions array, and it's likely that "func" will
> +	 * be recorded again.
> +	 */
> +	i = atomic_read(&nr_records);
> +	smp_mb__after_atomic();
> +	if (i < 0)
> +		cmpxchg(&recursed_functions[index].ip, ip, 0);
> +	else if (i <= index)
> +		atomic_cmpxchg(&nr_records, i, index + 1);

This looks weird. It would shift nr_records past the record added
in this call. It might skip many slots that were zeroed when clearing.
Also we do not know if our entry was not zeroed as well.

I would suggest to do it some other way (not even compile tested):

void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
{
	int index, old_index;
	int i = 0;
	unsigned long old_ip;

 again:
	/* First check the last one recorded. */
	if (ip == READ_ONCE(cached_function))
		return;

	index = atomic_read(&nr_records);
	/* nr_records is -1 when clearing records. */
	smp_mb__after_atomic();
	if (index < 0)
		return;

	/* Already cached? */
	for (i = index - 1; i >= 0; i--) {
		if (recursed_functions[i].ip == ip) {
			WRITE_ONCE(cached_function, ip);
			return;
		}
	}

	if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
		return;

	/*
	 * Try to reserve the slot. It might be already taken
	 * or the entire cache cleared.
	 */
	old_index = atomic_cmpxchg(&nr_records, index, index + 1);
	if (old_index != index)
		goto again;

	/*
	 * Be careful. The entire cache might have been cleared and reused in
	 * the meantime. Replace only empty slot.
	 */
	old_ip = cmpxchg(&recursed_functions[index].ip, 0, ip);
	if (old_ip != 0)
		goto again;

	old_ip = cmpxchg(&recursed_functions[index].parent_ip, 0, parrent_ip);
	if (old_ip != 0)
		goto again;

	/*
	 * No ip is better than non-consistent one. The race with
	 * clearing should be rare and not worth a perfect solution.
	 */
	if (READ_ONCE(recursed_functions[index].ip) != ip) {
		cmpxchg(&recursed_functions[index].ip, ip, 0UL)
		goto again;
	}
}

The last check probably is not needed. Inconsistent entries
should be prevented by the way how this func is called:

		static atomic_t paranoid_test;				\
		if (!atomic_read(&paranoid_test)) {			\
			atomic_inc(&paranoid_test);			\
			ftrace_record_recursion(ip, pip);		\
			atomic_dec(&paranoid_test);			\
		}							\




The rest of the patchset looks fine. I do not feel comfortable to give
it Reviewed-by because I did not review it in depth.

I spent more time with the above lockless code. I took it is a
training. I need to improve this skill to feel more comfortable with
the lockless printk ring buffer ;-)

Best Regards,
Petr

^ permalink raw reply

* [PATCH] ASoC: fsl_xcvr: fix break condition
From: Viorel Suman (OSS) @ 2020-11-02 16:18 UTC (permalink / raw)
  To: Timur Tabi, Nicolin Chen, Xiubo Li, Fabio Estevam, Shengjiu Wang,
	Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	alsa-devel, linuxppc-dev, linux-kernel
  Cc: Viorel Suman

From: Viorel Suman <viorel.suman@nxp.com>

The break condition copied by mistake as same
as loop condition in the previous version, but must
be the opposite. So fix it.

Signed-off-by: Viorel Suman <viorel.suman@nxp.com>
---
 sound/soc/fsl/fsl_xcvr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c
index c055179e6d11..2a28810d0e29 100644
--- a/sound/soc/fsl/fsl_xcvr.c
+++ b/sound/soc/fsl/fsl_xcvr.c
@@ -247,7 +247,7 @@ static int fsl_xcvr_ai_write(struct fsl_xcvr *xcvr, u8 reg, u32 data, bool phy)
 	regmap_write(xcvr->regmap, FSL_XCVR_PHY_AI_CTRL_TOG, idx);
 
 	ret = regmap_read_poll_timeout(xcvr->regmap, FSL_XCVR_PHY_AI_CTRL, val,
-				       (val & idx) != ((val & tidx) >> 1),
+				       (val & idx) == ((val & tidx) >> 1),
 				       10, 10000);
 	if (ret)
 		dev_err(dev, "AI timeout: failed to set %s reg 0x%02x=0x%08x\n",
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH v3 4/4] arch, mm: make kernel_page_present() always available
From: Mike Rapoport @ 2020-11-02 15:18 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Peter Zijlstra, Dave Hansen, linux-mm, Paul Mackerras,
	Pavel Machek, H. Peter Anvin, sparclinux, Christoph Lameter,
	Will Deacon, linux-riscv, linux-s390, x86, Mike Rapoport,
	Christian Borntraeger, Ingo Molnar, Catalin Marinas, Len Brown,
	Albert Ou, Vasily Gorbik, linux-pm, Heiko Carstens,
	David Rientjes, Borislav Petkov, Andy Lutomirski, Paul Walmsley,
	Kirill A. Shutemov, Thomas Gleixner, Joonsoo Kim,
	linux-arm-kernel, Rafael J. Wysocki, linux-kernel, Pekka Enberg,
	Palmer Dabbelt, Andrew Morton, Edgecombe, Rick P, linuxppc-dev,
	David S. Miller
In-Reply-To: <08db307a-b093-d7aa-7364-045f328ab147@redhat.com>

On Mon, Nov 02, 2020 at 10:28:14AM +0100, David Hildenbrand wrote:
> On 01.11.20 18:08, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > For architectures that enable ARCH_HAS_SET_MEMORY having the ability to
> > verify that a page is mapped in the kernel direct map can be useful
> > regardless of hibernation.
> > 
> > Add RISC-V implementation of kernel_page_present(), update its forward
> > declarations and stubs to be a part of set_memory API and remove ugly
> > ifdefery in inlcude/linux/mm.h around current declarations of
> > kernel_page_present().
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >   arch/arm64/include/asm/cacheflush.h |  1 +
> >   arch/arm64/mm/pageattr.c            |  4 +---
> >   arch/riscv/include/asm/set_memory.h |  1 +
> >   arch/riscv/mm/pageattr.c            | 29 +++++++++++++++++++++++++++++
> >   arch/x86/include/asm/set_memory.h   |  1 +
> >   arch/x86/mm/pat/set_memory.c        |  4 +---
> >   include/linux/mm.h                  |  7 -------
> >   include/linux/set_memory.h          |  5 +++++
> >   8 files changed, 39 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> > index 9384fd8fc13c..45217f21f1fe 100644
> > --- a/arch/arm64/include/asm/cacheflush.h
> > +++ b/arch/arm64/include/asm/cacheflush.h
> > @@ -140,6 +140,7 @@ int set_memory_valid(unsigned long addr, int numpages, int enable);
> >   int set_direct_map_invalid_noflush(struct page *page);
> >   int set_direct_map_default_noflush(struct page *page);
> > +bool kernel_page_present(struct page *page);
> >   #include <asm-generic/cacheflush.h>
> > diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> > index 439325532be1..92eccaf595c8 100644
> > --- a/arch/arm64/mm/pageattr.c
> > +++ b/arch/arm64/mm/pageattr.c
> > @@ -186,8 +186,8 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
> >   	set_memory_valid((unsigned long)page_address(page), numpages, enable);
> >   }
> > +#endif /* CONFIG_DEBUG_PAGEALLOC */
> > -#ifdef CONFIG_HIBERNATION
> >   /*
> >    * This function is used to determine if a linear map page has been marked as
> >    * not-valid. Walk the page table and check the PTE_VALID bit. This is based
> > @@ -234,5 +234,3 @@ bool kernel_page_present(struct page *page)
> >   	ptep = pte_offset_kernel(pmdp, addr);
> >   	return pte_valid(READ_ONCE(*ptep));
> >   }
> > -#endif /* CONFIG_HIBERNATION */
> > -#endif /* CONFIG_DEBUG_PAGEALLOC */
> > diff --git a/arch/riscv/include/asm/set_memory.h b/arch/riscv/include/asm/set_memory.h
> > index 4c5bae7ca01c..d690b08dff2a 100644
> > --- a/arch/riscv/include/asm/set_memory.h
> > +++ b/arch/riscv/include/asm/set_memory.h
> > @@ -24,6 +24,7 @@ static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; }
> >   int set_direct_map_invalid_noflush(struct page *page);
> >   int set_direct_map_default_noflush(struct page *page);
> > +bool kernel_page_present(struct page *page);
> >   #endif /* __ASSEMBLY__ */
> > diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
> > index 321b09d2e2ea..87ba5a68bbb8 100644
> > --- a/arch/riscv/mm/pageattr.c
> > +++ b/arch/riscv/mm/pageattr.c
> > @@ -198,3 +198,32 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
> >   			     __pgprot(0), __pgprot(_PAGE_PRESENT));
> >   }
> >   #endif
> > +
> > +bool kernel_page_present(struct page *page)
> > +{
> > +	unsigned long addr = (unsigned long)page_address(page);
> > +	pgd_t *pgd;
> > +	pud_t *pud;
> > +	p4d_t *p4d;
> > +	pmd_t *pmd;
> > +	pte_t *pte;
> > +
> > +	pgd = pgd_offset_k(addr);
> > +	if (!pgd_present(*pgd))
> > +		return false;
> > +
> > +	p4d = p4d_offset(pgd, addr);
> > +	if (!p4d_present(*p4d))
> > +		return false;
> > +
> > +	pud = pud_offset(p4d, addr);
> > +	if (!pud_present(*pud))
> > +		return false;
> > +
> > +	pmd = pmd_offset(pud, addr);
> > +	if (!pmd_present(*pmd))
> > +		return false;
> > +
> > +	pte = pte_offset_kernel(pmd, addr);
> > +	return pte_present(*pte);
> > +}
> > diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
> > index 5948218f35c5..4352f08bfbb5 100644
> > --- a/arch/x86/include/asm/set_memory.h
> > +++ b/arch/x86/include/asm/set_memory.h
> > @@ -82,6 +82,7 @@ int set_pages_rw(struct page *page, int numpages);
> >   int set_direct_map_invalid_noflush(struct page *page);
> >   int set_direct_map_default_noflush(struct page *page);
> > +bool kernel_page_present(struct page *page);
> >   extern int kernel_set_to_readonly;
> > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > index bc9be96b777f..16f878c26667 100644
> > --- a/arch/x86/mm/pat/set_memory.c
> > +++ b/arch/x86/mm/pat/set_memory.c
> > @@ -2226,8 +2226,8 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
> >   	arch_flush_lazy_mmu_mode();
> >   }
> > +#endif /* CONFIG_DEBUG_PAGEALLOC */
> > -#ifdef CONFIG_HIBERNATION
> >   bool kernel_page_present(struct page *page)
> >   {
> >   	unsigned int level;
> > @@ -2239,8 +2239,6 @@ bool kernel_page_present(struct page *page)
> >   	pte = lookup_address((unsigned long)page_address(page), &level);
> >   	return (pte_val(*pte) & _PAGE_PRESENT);
> >   }
> > -#endif /* CONFIG_HIBERNATION */
> > -#endif /* CONFIG_DEBUG_PAGEALLOC */
> >   int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> >   				   unsigned numpages, unsigned long page_flags)
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index ab0ef6bd351d..44b82f22e76a 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2937,16 +2937,9 @@ static inline void debug_pagealloc_map_pages(struct page *page,
> >   	if (debug_pagealloc_enabled_static())
> >   		__kernel_map_pages(page, numpages, enable);
> >   }
> > -
> > -#ifdef CONFIG_HIBERNATION
> > -extern bool kernel_page_present(struct page *page);
> > -#endif	/* CONFIG_HIBERNATION */
> >   #else	/* CONFIG_DEBUG_PAGEALLOC */
> >   static inline void debug_pagealloc_map_pages(struct page *page,
> >   					     int numpages, int enable) {}
> > -#ifdef CONFIG_HIBERNATION
> > -static inline bool kernel_page_present(struct page *page) { return true; }
> > -#endif	/* CONFIG_HIBERNATION */
> >   #endif	/* CONFIG_DEBUG_PAGEALLOC */
> >   #ifdef __HAVE_ARCH_GATE_AREA
> > diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
> > index 860e0f843c12..fe1aa4e54680 100644
> > --- a/include/linux/set_memory.h
> > +++ b/include/linux/set_memory.h
> > @@ -23,6 +23,11 @@ static inline int set_direct_map_default_noflush(struct page *page)
> >   {
> >   	return 0;
> >   }
> > +
> > +static inline bool kernel_page_present(struct page *page)
> > +{
> > +	return true;
> > +}
> >   #endif
> >   #ifndef set_mce_nospec
> > 
> 
> It's somewhat weird to move this to set_memory.h - it's only one possible
> user. I think include/linux/mm.h is a better fit. Ack to making it
> independent of CONFIG_HIBERNATION.

Semantically this is a part of direct map manipulation, that's primarily
why I put it into set_memory.h

> in include/linux/mm.h , I'd prefer:
> 
> #if defined(CONFIG_DEBUG_PAGEALLOC) || \
>     defined(CONFIG_ARCH_HAS_SET_DIRECT_MAP)

The second reason was to avoid this ^
and the third is -7 lines to include/linux/mm.h :)

> bool kernel_page_present(struct page *page);
> #else
> static inline bool kernel_page_present(struct page *page)
> {
> 	return true;
> }
> #endif
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v3 3/4] arch, mm: restore dependency of __kernel_map_pages() of DEBUG_PAGEALLOC
From: Mike Rapoport @ 2020-11-02 15:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Peter Zijlstra, Dave Hansen, linux-mm, Paul Mackerras,
	Pavel Machek, H. Peter Anvin, sparclinux, Christoph Lameter,
	Will Deacon, linux-riscv, linux-s390, x86, Mike Rapoport,
	Christian Borntraeger, Ingo Molnar, Catalin Marinas, Len Brown,
	Albert Ou, Vasily Gorbik, linux-pm, Heiko Carstens,
	David Rientjes, Borislav Petkov, Andy Lutomirski, Paul Walmsley,
	Kirill A. Shutemov, Thomas Gleixner, Joonsoo Kim,
	linux-arm-kernel, Rafael J. Wysocki, linux-kernel, Pekka Enberg,
	Palmer Dabbelt, Andrew Morton, Edgecombe, Rick P, linuxppc-dev,
	David S. Miller
In-Reply-To: <8eac2aa4-114e-f981-c8f8-ad8523175cf8@redhat.com>

On Mon, Nov 02, 2020 at 10:23:20AM +0100, David Hildenbrand wrote:
> 
> >   int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
> >   				   unsigned numpages, unsigned long page_flags)
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 14e397f3752c..ab0ef6bd351d 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2924,7 +2924,11 @@ static inline bool debug_pagealloc_enabled_static(void)
> >   	return static_branch_unlikely(&_debug_pagealloc_enabled);
> >   }
> > -#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_ARCH_HAS_SET_DIRECT_MAP)
> > +#ifdef CONFIG_DEBUG_PAGEALLOC
> > +/*
> > + * To support DEBUG_PAGEALLOC architecture must ensure that
> > + * __kernel_map_pages() never fails
> 
> Maybe add here, that this implies mapping everything via PTEs during boot.

This is more of an implementation detail, while assumption that
__kernel_map_pages() does not fail is somewhat a requirement :)

> Acked-by: David Hildenbrand <david@redhat.com>

Thanks!

> -- 
> Thanks,
> 
> David / dhildenb
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
From: Cédric Le Goater @ 2020-11-02 14:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Qian Cai, Michael Ellerman
  Cc: linuxppc-dev, linux-next, Oliver O'Halloran, linux-kernel,
	Stephen Rothwell
In-Reply-To: <89726af2-00ca-9d47-f417-4bea8d5b8b1f@ozlabs.ru>

On 10/14/20 4:55 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 23/09/2020 17:06, Cédric Le Goater wrote:
>> On 9/23/20 2:33 AM, Qian Cai wrote:
>>> On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
>>>> When a passthrough IO adapter is removed from a pseries machine using
>>>> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
>>>> guest OS to clear all page table entries related to the adapter. If
>>>> some are still present, the RTAS call which isolates the PCI slot
>>>> returns error 9001 "valid outstanding translations" and the removal of
>>>> the IO adapter fails. This is because when the PHBs are scanned, Linux
>>>> maps automatically the INTx interrupts in the Linux interrupt number
>>>> space but these are never removed.
>>>>
>>>> To solve this problem, we introduce a PPC platform specific
>>>> pcibios_remove_bus() routine which clears all interrupt mappings when
>>>> the bus is removed. This also clears the associated page table entries
>>>> of the ESB pages when using XIVE.
>>>>
>>>> For this purpose, we record the logical interrupt numbers of the
>>>> mapped interrupt under the PHB structure and let pcibios_remove_bus()
>>>> do the clean up.
>>>>
>>>> Since some PCI adapters, like GPUs, use the "interrupt-map" property
>>>> to describe interrupt mappings other than the legacy INTx interrupts,
>>>> we can not restrict the size of the mapping array to PCI_NUM_INTX. The
>>>> number of interrupt mappings is computed from the "interrupt-map"
>>>> property and the mapping array is allocated accordingly.
>>>>
>>>> Cc: "Oliver O'Halloran" <oohall@gmail.com>
>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>
>>> Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed to
>>> this patch.
>>>
>>> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
>>
>> OK. The patch is missing a NULL assignement after kfree() and that
>> might be the issue.
>>
>> I did try PHB removal under PowerNV, so I would like to understand
>> how we managed to remove twice the PCI bus and possibly reproduce.
>> Any chance we could grab what the syscall fuzzer (syzkaller) did ?
> 
> 
> How do you remove PHBs exactly? There is no such thing in the powernv platform, I thought someone added this and you are fixing it but no. PHBs on powernv are created at the boot time and there is no way to remove them, you can only try removing all the bridges.

yes. I noticed that later when proposing the fix for the double 
free.

> So what exactly are you doing?

What you just said above, with the commands : 

  echo 1 >  /sys/devices/pci0031\:00/0031\:00\:00.0/remove
  echo 1 >  /sys/devices/pci0031\:00/pci_bus/0031\:00/rescan


C. 


^ permalink raw reply

* Re: [PATCH v3 2/4] PM: hibernate: make direct map manipulations more explicit
From: Mike Rapoport @ 2020-11-02 15:12 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Rafael J . Wysocki, Peter Zijlstra, Dave Hansen, linux-mm,
	Paul Mackerras, Pavel Machek, H. Peter Anvin, sparclinux,
	Christoph Lameter, Will Deacon, linux-riscv, linux-s390, x86,
	Mike Rapoport, Christian Borntraeger, Ingo Molnar,
	Catalin Marinas, Len Brown, Albert Ou, Vasily Gorbik, linux-pm,
	Heiko Carstens, David Rientjes, Borislav Petkov, Andy Lutomirski,
	Paul Walmsley, Kirill A. Shutemov, Thomas Gleixner, Joonsoo Kim,
	linux-arm-kernel, Rafael J. Wysocki, linux-kernel, Pekka Enberg,
	Palmer Dabbelt, Andrew Morton, Edgecombe, Rick P, linuxppc-dev,
	David S. Miller
In-Reply-To: <55cd2a4a-cfa8-d420-66b3-a25fcdd9b876@redhat.com>

On Mon, Nov 02, 2020 at 10:19:36AM +0100, David Hildenbrand wrote:
> On 01.11.20 18:08, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > When DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP is enabled a page may be
> > not present in the direct map and has to be explicitly mapped before it
> > could be copied.
> > 
> > Introduce hibernate_map_page() that will explicitly use
> > set_direct_map_{default,invalid}_noflush() for ARCH_HAS_SET_DIRECT_MAP case
> > and debug_pagealloc_map_pages() for DEBUG_PAGEALLOC case.
> > 
> > The remapping of the pages in safe_copy_page() presumes that it only
> > changes protection bits in an existing PTE and so it is safe to ignore
> > return value of set_direct_map_{default,invalid}_noflush().
> > 
> > Still, add a WARN_ON() so that future changes in set_memory APIs will not
> > silently break hibernation.
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >   include/linux/mm.h      | 12 ------------
> >   kernel/power/snapshot.c | 30 ++++++++++++++++++++++++++++--
> >   2 files changed, 28 insertions(+), 14 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 1fc0609056dc..14e397f3752c 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2927,16 +2927,6 @@ static inline bool debug_pagealloc_enabled_static(void)
> >   #if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_ARCH_HAS_SET_DIRECT_MAP)
> >   extern void __kernel_map_pages(struct page *page, int numpages, int enable);
> > -/*
> > - * When called in DEBUG_PAGEALLOC context, the call should most likely be
> > - * guarded by debug_pagealloc_enabled() or debug_pagealloc_enabled_static()
> > - */
> > -static inline void
> > -kernel_map_pages(struct page *page, int numpages, int enable)
> > -{
> > -	__kernel_map_pages(page, numpages, enable);
> > -}
> > -
> >   static inline void debug_pagealloc_map_pages(struct page *page,
> >   					     int numpages, int enable)
> >   {
> > @@ -2948,8 +2938,6 @@ static inline void debug_pagealloc_map_pages(struct page *page,
> >   extern bool kernel_page_present(struct page *page);
> >   #endif	/* CONFIG_HIBERNATION */
> >   #else	/* CONFIG_DEBUG_PAGEALLOC || CONFIG_ARCH_HAS_SET_DIRECT_MAP */
> > -static inline void
> > -kernel_map_pages(struct page *page, int numpages, int enable) {}
> >   static inline void debug_pagealloc_map_pages(struct page *page,
> >   					     int numpages, int enable) {}
> >   #ifdef CONFIG_HIBERNATION
> > diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> > index 46b1804c1ddf..054c8cce4236 100644
> > --- a/kernel/power/snapshot.c
> > +++ b/kernel/power/snapshot.c
> > @@ -76,6 +76,32 @@ static inline void hibernate_restore_protect_page(void *page_address) {}
> >   static inline void hibernate_restore_unprotect_page(void *page_address) {}
> >   #endif /* CONFIG_STRICT_KERNEL_RWX  && CONFIG_ARCH_HAS_SET_MEMORY */
> > +static inline void hibernate_map_page(struct page *page, int enable)
> > +{
> > +	if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
> > +		unsigned long addr = (unsigned long)page_address(page);
> > +		int ret;
> > +
> > +		/*
> > +		 * This should not fail because remapping a page here means
> > +		 * that we only update protection bits in an existing PTE.
> > +		 * It is still worth to have WARN_ON() here if something
> > +		 * changes and this will no longer be the case.
> > +		 */
> > +		if (enable)
> > +			ret = set_direct_map_default_noflush(page);
> > +		else
> > +			ret = set_direct_map_invalid_noflush(page);
> > +
> > +		if (WARN_ON(ret))
> > +			return;
> 
> People seem to prefer pr_warn() now that production kernels have panic on
> warn enabled. It's weird.

Weird indeed as the whole point of WARN to yell without causing a
crash...
I can change to pr_warn though...

> > +
> > +		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> > +	} else {
> > +		debug_pagealloc_map_pages(page, 1, enable);
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks!

> -- 
> Thanks,
> 
> David / dhildenb
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v2 2/2] misc: ocxl: config: Rename function attribute description
From: Frederic Barrat @ 2020-11-02 14:29 UTC (permalink / raw)
  To: Lee Jones, gregkh, arnd; +Cc: linuxppc-dev, linux-kernel, Andrew Donnellan
In-Reply-To: <20201102142001.560490-2-lee.jones@linaro.org>



Le 02/11/2020 à 15:20, Lee Jones a écrit :
> Fixes the following W=1 kernel build warning(s):
> 
>   drivers/misc/ocxl/config.c:81: warning: Function parameter or member 'dev' not described in 'get_function_0'
>   drivers/misc/ocxl/config.c:81: warning: Excess function parameter 'device' description in 'get_function_0'
> 
> Cc: Frederic Barrat <fbarrat@linux.ibm.com>
> Cc: Andrew Donnellan <ajd@linux.ibm.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Lee Jones <lee.jones@linaro.org>
> ---


Thanks!
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>


>   drivers/misc/ocxl/config.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
> index 4d490b92d951f..a68738f382521 100644
> --- a/drivers/misc/ocxl/config.c
> +++ b/drivers/misc/ocxl/config.c
> @@ -73,7 +73,7 @@ static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 afu_idx)
>   
>   /**
>    * get_function_0() - Find a related PCI device (function 0)
> - * @device: PCI device to match
> + * @dev: PCI device to match
>    *
>    * Returns a pointer to the related device, or null if not found
>    */
> 

^ permalink raw reply

* [PATCH v2 2/2] misc: ocxl: config: Rename function attribute description
From: Lee Jones @ 2020-11-02 14:20 UTC (permalink / raw)
  To: gregkh, arnd
  Cc: Frederic Barrat, linuxppc-dev, Lee Jones, linux-kernel,
	Andrew Donnellan
In-Reply-To: <20201102142001.560490-1-lee.jones@linaro.org>

Fixes the following W=1 kernel build warning(s):

 drivers/misc/ocxl/config.c:81: warning: Function parameter or member 'dev' not described in 'get_function_0'
 drivers/misc/ocxl/config.c:81: warning: Excess function parameter 'device' description in 'get_function_0'

Cc: Frederic Barrat <fbarrat@linux.ibm.com>
Cc: Andrew Donnellan <ajd@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 drivers/misc/ocxl/config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 4d490b92d951f..a68738f382521 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -73,7 +73,7 @@ static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 afu_idx)
 
 /**
  * get_function_0() - Find a related PCI device (function 0)
- * @device: PCI device to match
+ * @dev: PCI device to match
  *
  * Returns a pointer to the related device, or null if not found
  */
-- 
2.25.1


^ permalink raw reply related

* Re: Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"
From: 'Greg KH' @ 2020-11-02 13:52 UTC (permalink / raw)
  To: David Laight
  Cc: linux-aio@kvack.org, 'David Hildenbrand',
	linux-mips@vger.kernel.org, David Howells, linux-mm@kvack.org,
	keyrings@vger.kernel.org, sparclinux@vger.kernel.org,
	Christoph Hellwig, linux-arch@vger.kernel.org,
	linux-s390@vger.kernel.org, linux-scsi@vger.kernel.org,
	kernel-team@android.com, Arnd Bergmann,
	linux-block@vger.kernel.org, Al Viro, io-uring@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, Jens Axboe,
	linux-parisc@vger.kernel.org, netdev@vger.kernel.org,
	Nick Desaulniers, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, Andrew Morton,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <0ab5ac71f28d459db2f350c2e07b88ca@AcuMS.aculab.com>

On Mon, Nov 02, 2020 at 09:06:38AM +0000, David Laight wrote:
> From: 'Greg KH'
> > Sent: 23 October 2020 15:47
> > 
> > On Fri, Oct 23, 2020 at 02:39:24PM +0000, David Laight wrote:
> > > From: David Hildenbrand
> > > > Sent: 23 October 2020 15:33
> > > ...
> > > > I just checked against upstream code generated by clang 10 and it
> > > > properly discards the upper 32bit via a mov w23 w2.
> > > >
> > > > So at least clang 10 indeed properly assumes we could have garbage and
> > > > masks it off.
> > > >
> > > > Maybe the issue is somewhere else, unrelated to nr_pages ... or clang 11
> > > > behaves differently.
> > >
> > > We'll need the disassembly from a failing kernel image.
> > > It isn't that big to hand annotate.
> > 
> > I've worked around the merge at the moment in the android tree, but it
> > is still quite reproducable, and will try to get a .o file to
> > disassemble on Monday or so...
> 
> Did this get properly resolved?

For some reason, 5.10-rc2 fixed all of this up.  I backed out all of the
patches I had to revert to get 5.10-rc1 to work properly, and then did
the merge and all is well.

It must have been something to do with the compat changes in this same
area that went in after 5.10-rc1, and something got reorganized in the
files somehow.  I really do not know, and at the moment, don't have the
time to track it down anymore.  So for now, I'd say it's all good, sorry
for the noise.

greg k-h

^ permalink raw reply

* Re: [PATCH 23/23] mtd: devices: powernv_flash: Add function names to headers and fix 'dev'
From: Miquel Raynal @ 2020-11-02 12:52 UTC (permalink / raw)
  To: Lee Jones
  Cc: vigneshr, linux-kernel, Paul Mackerras, Richard Weinberger,
	linux-mtd, linuxppc-dev, Cyril Bur
In-Reply-To: <20201102115406.1074327-24-lee.jones@linaro.org>

Hi Lee,

Lee Jones <lee.jones@linaro.org> wrote on Mon,  2 Nov 2020 11:54:06
+0000:

> Fixes the following W=1 kernel build warning(s):
> 
>  drivers/mtd/devices/powernv_flash.c:129: warning: Cannot understand  * @mtd: the device
>  drivers/mtd/devices/powernv_flash.c:145: warning: Cannot understand  * @mtd: the device
>  drivers/mtd/devices/powernv_flash.c:161: warning: Cannot understand  * @mtd: the device
>  drivers/mtd/devices/powernv_flash.c:184: warning: Function parameter or member 'dev' not described in 'powernv_flash_set_driver_info'
> 
> Cc: Miquel Raynal <miquel.raynal@bootlin.com>
> Cc: Richard Weinberger <richard@nod.at>
> Cc: Vignesh Raghavendra <vigneshr@ti.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Cyril Bur <cyril.bur@au1.ibm.com>
> Cc: linux-mtd@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Lee Jones <lee.jones@linaro.org>
> ---
>  drivers/mtd/devices/powernv_flash.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
> index 0b757d9ba2f6b..32cb0e649096f 100644
> --- a/drivers/mtd/devices/powernv_flash.c
> +++ b/drivers/mtd/devices/powernv_flash.c
> @@ -126,6 +126,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
>  }
>  
>  /**
> + * powernv_flash_read
> + *

Perhaps we should not add blank lines if the rest of the file does not
already have such spacing (see below).

>   * @mtd: the device
>   * @from: the offset to read from
>   * @len: the number of bytes to read
> @@ -142,6 +144,7 @@ static int powernv_flash_read(struct mtd_info *mtd, loff_t from, size_t len,
>  }
>  
>  /**
> + * powernv_flash_write
>   * @mtd: the device
>   * @to: the offset to write to
>   * @len: the number of bytes to write
> @@ -158,6 +161,7 @@ static int powernv_flash_write(struct mtd_info *mtd, loff_t to, size_t len,
>  }
>  
>  /**
> + * powernv_flash_erase
>   * @mtd: the device
>   * @erase: the erase info
>   * Returns 0 if erase successful or -ERRNO if an error occurred
> @@ -176,7 +180,7 @@ static int powernv_flash_erase(struct mtd_info *mtd, struct erase_info *erase)
>  
>  /**
>   * powernv_flash_set_driver_info - Fill the mtd_info structure and docg3
> - * structure @pdev: The platform device
> + * @dev: The device structure
>   * @mtd: The structure to fill
>   */
>  static int powernv_flash_set_driver_info(struct device *dev,


Thanks,
Miquèl

^ permalink raw reply

* [PATCH 23/23] mtd: devices: powernv_flash: Add function names to headers and fix 'dev'
From: Lee Jones @ 2020-11-02 11:54 UTC (permalink / raw)
  To: vigneshr
  Cc: linuxppc-dev, Miquel Raynal, linux-kernel, Paul Mackerras,
	Richard Weinberger, linux-mtd, Lee Jones, Cyril Bur
In-Reply-To: <20201102115406.1074327-1-lee.jones@linaro.org>

Fixes the following W=1 kernel build warning(s):

 drivers/mtd/devices/powernv_flash.c:129: warning: Cannot understand  * @mtd: the device
 drivers/mtd/devices/powernv_flash.c:145: warning: Cannot understand  * @mtd: the device
 drivers/mtd/devices/powernv_flash.c:161: warning: Cannot understand  * @mtd: the device
 drivers/mtd/devices/powernv_flash.c:184: warning: Function parameter or member 'dev' not described in 'powernv_flash_set_driver_info'

Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Cyril Bur <cyril.bur@au1.ibm.com>
Cc: linux-mtd@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 drivers/mtd/devices/powernv_flash.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
index 0b757d9ba2f6b..32cb0e649096f 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -126,6 +126,8 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
 }
 
 /**
+ * powernv_flash_read
+ *
  * @mtd: the device
  * @from: the offset to read from
  * @len: the number of bytes to read
@@ -142,6 +144,7 @@ static int powernv_flash_read(struct mtd_info *mtd, loff_t from, size_t len,
 }
 
 /**
+ * powernv_flash_write
  * @mtd: the device
  * @to: the offset to write to
  * @len: the number of bytes to write
@@ -158,6 +161,7 @@ static int powernv_flash_write(struct mtd_info *mtd, loff_t to, size_t len,
 }
 
 /**
+ * powernv_flash_erase
  * @mtd: the device
  * @erase: the erase info
  * Returns 0 if erase successful or -ERRNO if an error occurred
@@ -176,7 +180,7 @@ static int powernv_flash_erase(struct mtd_info *mtd, struct erase_info *erase)
 
 /**
  * powernv_flash_set_driver_info - Fill the mtd_info structure and docg3
- * structure @pdev: The platform device
+ * @dev: The device structure
  * @mtd: The structure to fill
  */
 static int powernv_flash_set_driver_info(struct device *dev,
-- 
2.25.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox