LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
From: Thomas Gleixner @ 2020-09-21  7:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Juri Lelli, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, linux-mips, Ben Segall, Max Filippov,
	Guo Ren, linux-sparc, Vincent Chen, Will Deacon, Ard Biesheuvel,
	linux-arch, Vincent Guittot, Herbert Xu, the arch/x86 maintainers,
	Russell King, linux-csky, David Airlie, Mel Gorman,
	open list:SYNOPSYS ARC ARCHITECTURE, linux-xtensa, Paul McKenney,
	intel-gfx, linuxppc-dev, Steven Rostedt, Jani Nikula,
	Rodrigo Vivi, Dietmar Eggemann, Linux ARM, Chris Zankel,
	Michal Simek, Thomas Bogendoerfer, Nick Hu, Linux-MM,
	Vineet Gupta, LKML, Arnd Bergmann, Daniel Vetter, Paul Mackerras,
	Andrew Morton, Daniel Bristot de Oliveira, David S. Miller,
	Greentime Hu
In-Reply-To: <CAHk-=wgF-upZVpqJWK=TK7MS9H-Rp1ZxGfOG+dDW=JThtxAzVQ@mail.gmail.com>

On Sun, Sep 20 2020 at 10:42, Linus Torvalds wrote:
> On Sun, Sep 20, 2020 at 10:40 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> I think the more obvious solution is to split the whole exercise:
>>
>>   schedule()
>>      prepare_switch()
>>         unmap()
>>
>>     switch_to()
>>
>>     finish_switch()
>>         map()
>
> Yeah, that looks much easier to explain. Ack.

So far so good, but Peter Z. just pointed out to me that I completely
missed the fact that this cannot work.

If a task is migrated to a different CPU then the mapping address will
change which will explode in colourful ways.

On RT kernels this works because we ping the task to the CPU via
migrate_disable(). On a !RT kernel migrate_disable() maps to
preempt_disable() which brings us back to square one.

/me goes back to the drawing board.

Thanks,

        tglx

^ permalink raw reply

* [PATCH V2] powerpc/perf: Exclude pmc5/6 from the irrelevant PMU group constraints
From: Athira Rajeev @ 2020-09-21  7:10 UTC (permalink / raw)
  To: mpe; +Cc: maddy, linuxppc-dev

PMU counter support functions enforces event constraints for group of
events to check if all events in a group can be monitored. Incase of
event codes using PMC5 and PMC6 ( 500fa and 600f4 respectively ),
not all constraints are applicable, say the threshold or sample bits.
But current code includes pmc5 and pmc6 in some group constraints (like
IC_DC Qualifier bits) which is actually not applicable and hence results
in those events not getting counted when scheduled along with group of
other events. Patch fixes this by excluding PMC5/6 from constraints
which are not relevant for it.

Fixes: 7ffd948 ("powerpc/perf: factor out power8 pmu functions")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
Changes in v2:
- Added a block comment in the fix path explaining
  why the change is needed.

 arch/powerpc/perf/isa207-common.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 964437a..12153da 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -288,6 +288,15 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 
 		mask  |= CNST_PMC_MASK(pmc);
 		value |= CNST_PMC_VAL(pmc);
+
+		/*
+		 * PMC5 and PMC6 are used to count cycles and instructions
+		 * and these doesnot support most of the constraint bits.
+		 * Add a check to exclude PMC5/6 from most of the constraints
+		 * except for ebb/bhrb.
+		 */
+		if (pmc >= 5)
+			goto ebb_bhrb;
 	}
 
 	if (pmc <= 4) {
@@ -357,6 +366,7 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 		}
 	}
 
+ebb_bhrb:
 	if (!pmc && ebb)
 		/* EBB events must specify the PMC */
 		return -1;
-- 
1.8.3.1


^ permalink raw reply related

* Re: [patch RFC 02/15] highmem: Provide generic variant of kmap_atomic*
From: Christoph Hellwig @ 2020-09-21  6:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Juri Lelli, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, linux-mips, Ben Segall, Max Filippov,
	Guo Ren, sparclinux, Vincent Chen, Will Deacon, Ard Biesheuvel,
	linux-arch, Vincent Guittot, Herbert Xu, x86, Russell King,
	linux-csky, David Airlie, Mel Gorman, linux-snps-arc,
	linux-xtensa, Paul McKenney, intel-gfx, linuxppc-dev,
	Steven Rostedt, Linus Torvalds, Jani Nikula, Rodrigo Vivi,
	Dietmar Eggemann, Linux ARM, Chris Zankel, Michal Simek,
	Thomas Bogendoerfer, Nick Hu, Linux-MM, Vineet Gupta, LKML,
	Arnd Bergmann, Daniel Vetter, Paul Mackerras, Andrew Morton,
	Daniel Bristot de Oliveira, David S. Miller, Greentime Hu
In-Reply-To: <20200919092615.990731525@linutronix.de>

> +# ifndef ARCH_NEEDS_KMAP_HIGH_GET
> +static inline void *arch_kmap_temporary_high_get(struct page *page)
> +{
> +	return NULL;
> +}
> +# endif

Turn this into a macro and use #ifndef on the symbol name?

> +static inline void __kunmap_atomic(void *addr)
> +{
> +	kumap_atomic_indexed(addr);
> +}
> +
> +
> +#endif /* CONFIG_KMAP_ATOMIC_GENERIC */

Stange double empty line above the endif.

> -#define kunmap_atomic(addr)                                     \
> -do {                                                            \
> -	BUILD_BUG_ON(__same_type((addr), struct page *));       \
> -	kunmap_atomic_high(addr);                                  \
> -	pagefault_enable();                                     \
> -	preempt_enable();                                       \
> -} while (0)
> -
> +#define kunmap_atomic(addr)						\
> +	do {								\
> +		BUILD_BUG_ON(__same_type((addr), struct page *));	\
> +		__kunmap_atomic(addr);					\
> +		preempt_enable();					\
> +	} while (0)

Why the strange re-indent to a form that is much less common and less
readable?

> +void *kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot)
> +{
> +	pagefault_disable();
> +	return __kmap_atomic_pfn_prot(pfn, prot);
> +}
> +EXPORT_SYMBOL(kmap_atomic_pfn_prot);

The existing kmap_atomic_pfn & co implementation is EXPORT_SYMBOL_GPL,
and this stuff should preferably stay that way.

^ permalink raw reply

* Re: [patch RFC 01/15] mm/highmem: Un-EXPORT __kmap_atomic_idx()
From: Christoph Hellwig @ 2020-09-21  6:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Juri Lelli, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, linux-mips, Ben Segall, Max Filippov,
	Guo Ren, sparclinux, Vincent Chen, Will Deacon, Ard Biesheuvel,
	linux-arch, Vincent Guittot, Herbert Xu, x86, Russell King,
	linux-csky, David Airlie, Mel Gorman, linux-snps-arc,
	linux-xtensa, Paul McKenney, intel-gfx, linuxppc-dev,
	Steven Rostedt, Linus Torvalds, Jani Nikula, Rodrigo Vivi,
	Dietmar Eggemann, Linux ARM, Chris Zankel, Michal Simek,
	Thomas Bogendoerfer, Nick Hu, Linux-MM, Vineet Gupta, LKML,
	Arnd Bergmann, Daniel Vetter, Paul Mackerras, Andrew Morton,
	Daniel Bristot de Oliveira, David S. Miller, Greentime Hu
In-Reply-To: <20200919092615.879315697@linutronix.de>

On Sat, Sep 19, 2020 at 11:17:52AM +0200, Thomas Gleixner wrote:
> Nothing in modules can use that.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply

* Re: let import_iovec deal with compat_iovecs as well
From: 'Christoph Hellwig' @ 2020-09-21  4:41 UTC (permalink / raw)
  To: David Laight
  Cc: linux-aio@kvack.org, linux-mips@vger.kernel.org, David Howells,
	linux-mm@kvack.org, keyrings@vger.kernel.org,
	sparclinux@vger.kernel.org, 'Christoph Hellwig',
	linux-arch@vger.kernel.org, linux-s390@vger.kernel.org,
	linux-scsi@vger.kernel.org, x86@kernel.org, Arnd Bergmann,
	linux-block@vger.kernel.org, Alexander Viro,
	io-uring@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Jens Axboe, linux-parisc@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, Andrew Morton,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <2c7bf42ee4314484ae0177280cd8f5f3@AcuMS.aculab.com>

On Sat, Sep 19, 2020 at 02:24:10PM +0000, David Laight wrote:
> I thought about that change while writing my import_iovec() => iovec_import()
> patch - and thought that the io_uring code would (as usual) cause grief.
> 
> Christoph - did you see those patches?

No.

^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Christoph Hellwig @ 2020-09-21  4:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, Matthew Wilcox, Arnd Bergmann,
	linux-block, Al Viro, io-uring, linux-arm-kernel, Jens Axboe,
	Parisc List, Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <CALCETrWHW4wHG+Z-mxsY-kvjSi+S6gRUQ+LHd9syPcm5bhi3cw@mail.gmail.com>

On Sun, Sep 20, 2020 at 12:14:49PM -0700, Andy Lutomirski wrote:
> I wonder if this is really quite cast in stone.  We could also have
> FMODE_SHITTY_COMPAT and set that when a file like this is *opened* in
> compat mode.  Then that particular struct file would be read and
> written using the compat data format.  The change would be
> user-visible, but the user that would see it would be very strange
> indeed.
> 
> I don't have a strong opinion as to whether that is better or worse
> than denying io_uring access to these things, but at least it moves
> the special case out of io_uring.

open could have happened through an io_uring thread a well, so I don't
see how this would do anything but move the problem to a different
place.

> 
> --Andy
---end quoted text---

^ permalink raw reply

* [PATCH] fsl: imx-audmix : Use devm_kcalloc() instead of devm_kzalloc()
From: Xu Wang @ 2020-09-21  1:59 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, shengjiu.wang,
	lgirdwood, broonie, perex, tiwai, shawnguo, s.hauer, kernel,
	linux-imx, alsa-devel, linuxppc-dev, linux-arm-kernel
  Cc: linux-kernel, Xu Wang

A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "devm_kcalloc".

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
---
 sound/soc/fsl/imx-audmix.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/sound/soc/fsl/imx-audmix.c b/sound/soc/fsl/imx-audmix.c
index 202fb8950078..cbdc0a2c09c5 100644
--- a/sound/soc/fsl/imx-audmix.c
+++ b/sound/soc/fsl/imx-audmix.c
@@ -185,20 +185,20 @@ static int imx_audmix_probe(struct platform_device *pdev)
 		return -ENOMEM;
 
 	priv->num_dai = 2 * num_dai;
-	priv->dai = devm_kzalloc(&pdev->dev, priv->num_dai *
+	priv->dai = devm_kcalloc(&pdev->dev, priv->num_dai,
 				 sizeof(struct snd_soc_dai_link), GFP_KERNEL);
 	if (!priv->dai)
 		return -ENOMEM;
 
 	priv->num_dai_conf = num_dai;
-	priv->dai_conf = devm_kzalloc(&pdev->dev, priv->num_dai_conf *
+	priv->dai_conf = devm_kcalloc(&pdev->dev, priv->num_dai_conf,
 				      sizeof(struct snd_soc_codec_conf),
 				      GFP_KERNEL);
 	if (!priv->dai_conf)
 		return -ENOMEM;
 
 	priv->num_dapm_routes = 3 * num_dai;
-	priv->dapm_routes = devm_kzalloc(&pdev->dev, priv->num_dapm_routes *
+	priv->dapm_routes = devm_kcalloc(&pdev->dev, priv->num_dapm_routes,
 					 sizeof(struct snd_soc_dapm_route),
 					 GFP_KERNEL);
 	if (!priv->dapm_routes)
@@ -208,7 +208,7 @@ static int imx_audmix_probe(struct platform_device *pdev)
 		struct snd_soc_dai_link_component *dlc;
 
 		/* for CPU/Codec/Platform x 2 */
-		dlc = devm_kzalloc(&pdev->dev, 6 * sizeof(*dlc), GFP_KERNEL);
+		dlc = devm_kcalloc(&pdev->dev, 6, sizeof(*dlc), GFP_KERNEL);
 		if (!dlc) {
 			dev_err(&pdev->dev, "failed to allocate dai_link\n");
 			return -ENOMEM;
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Al Viro @ 2020-09-20 21:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-aio, linux-mips, David Howells, linux-mm, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390, linux-scsi,
	x86, Arnd Bergmann, linux-block, io-uring, linux-arm-kernel,
	Jens Axboe, linux-parisc, netdev, linux-kernel,
	linux-security-module, linux-fsdevel, Andrew Morton, linuxppc-dev
In-Reply-To: <20200920192259.GU32101@casper.infradead.org>

On Sun, Sep 20, 2020 at 08:22:59PM +0100, Matthew Wilcox wrote:
> On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> > IMO it's much saner to mark those and refuse to touch them from io_uring...
> 
> Simpler solution is to remove io_uring from the 32-bit syscall list.
> If you're a 32-bit process, you don't get to use io_uring.  Would
> any real users actually care about that?

What for?  I mean, is there any reason to try and keep those bugs as
first-class citizens?  IDGI...  Yes, we have several special files
(out of thousands) that have read()/write() user-visible semantics
broken wrt 32bit/64bit.  And we have to keep them working that way
for existing syscalls.  Why would we want to pretend that their
behaviour is normal and isn't an ABI bug, not to be repeated for
anything new?

^ permalink raw reply

* RE: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: David Laight @ 2020-09-20 21:13 UTC (permalink / raw)
  To: 'Arnd Bergmann', Andy Lutomirski
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM,
	keyrings@vger.kernel.org, sparclinux, Christoph Hellwig,
	linux-arch, linux-s390, Linux SCSI List, X86 ML, Matthew Wilcox,
	linux-block, Al Viro, io-uring@vger.kernel.org, linux-arm-kernel,
	Jens Axboe, Parisc List, Network Development, LKML, LSM List,
	Linux FS Devel, Andrew Morton, linuxppc-dev
In-Reply-To: <CAK8P3a37BRFj_qg61gP2oVrjJzBrZ58y1vggeTk_5n55Ou5U2Q@mail.gmail.com>

From: Arnd Bergmann
> Sent: 20 September 2020 21:49
> 
> On Sun, Sep 20, 2020 at 9:28 PM Andy Lutomirski <luto@kernel.org> wrote:
> > On Sun, Sep 20, 2020 at 12:23 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> > > > IMO it's much saner to mark those and refuse to touch them from io_uring...
> > >
> > > Simpler solution is to remove io_uring from the 32-bit syscall list.
> > > If you're a 32-bit process, you don't get to use io_uring.  Would
> > > any real users actually care about that?
> >
> > We could go one step farther and declare that we're done adding *any*
> > new compat syscalls :)
> 
> Would you also stop adding system calls to native 32-bit systems then?
> 
> On memory constrained systems (less than 2GB a.t.m.), there is still a
> strong demand for running 32-bit user space, but all of the recent Arm
> cores (after Cortex-A55) dropped the ability to run 32-bit kernels, so
> that compat mode may eventually become the primary way to run
> Linux on cheap embedded systems.
> 
> I don't think there is any chance we can realistically take away io_uring
> from the 32-bit ABI any more now.

Can't it just run requests from 32bit apps in a kernel thread that has
the 'in_compat_syscall' flag set?
Not that i recall seeing the code where it saves the 'compat' nature
of any requests.

It is already completely f*cked if you try to pass the command ring
to a child process - it uses the wrong 'mm'.
I suspect there are some really horrid security holes in that area.

	David.

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Arnd Bergmann @ 2020-09-20 20:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, Matthew Wilcox, linux-block, Al Viro,
	io-uring, linux-arm-kernel, Jens Axboe, Parisc List,
	Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <CALCETrXVtBkxNJcMxf9myaKT9snHKbCWUenKHGRfp8AOtORBPg@mail.gmail.com>

On Sun, Sep 20, 2020 at 9:28 PM Andy Lutomirski <luto@kernel.org> wrote:
> On Sun, Sep 20, 2020 at 12:23 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> > > IMO it's much saner to mark those and refuse to touch them from io_uring...
> >
> > Simpler solution is to remove io_uring from the 32-bit syscall list.
> > If you're a 32-bit process, you don't get to use io_uring.  Would
> > any real users actually care about that?
>
> We could go one step farther and declare that we're done adding *any*
> new compat syscalls :)

Would you also stop adding system calls to native 32-bit systems then?

On memory constrained systems (less than 2GB a.t.m.), there is still a
strong demand for running 32-bit user space, but all of the recent Arm
cores (after Cortex-A55) dropped the ability to run 32-bit kernels, so
that compat mode may eventually become the primary way to run
Linux on cheap embedded systems.

I don't think there is any chance we can realistically take away io_uring
from the 32-bit ABI any more now.

      Arnd

^ permalink raw reply

* [PATCH] soc: fsl: qbman: Fix return value on success
From: Krzysztof Kozlowski @ 2020-09-20 20:26 UTC (permalink / raw)
  To: Li Yang, Roy Pledge, linuxppc-dev, linux-arm-kernel, linux-kernel
  Cc: Krzysztof Kozlowski

On error the function was meant to return -ERRNO.  This also fixes
compile warning:

  drivers/soc/fsl/qbman/bman.c:640:6: warning: variable 'err' set but not used [-Wunused-but-set-variable]

Fixes: 0505d00c8dba ("soc/fsl/qbman: Cleanup buffer pools if BMan was initialized prior to bootup")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
---
 drivers/soc/fsl/qbman/bman.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c
index f4fb527d8301..c5dd026fe889 100644
--- a/drivers/soc/fsl/qbman/bman.c
+++ b/drivers/soc/fsl/qbman/bman.c
@@ -660,7 +660,7 @@ int bm_shutdown_pool(u32 bpid)
 	}
 done:
 	put_affine_portal();
-	return 0;
+	return err;
 }
 
 struct gen_pool *bm_bpalloc;
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH] soc: fsl: dpio: remove set but not used 'addr_cena'
From: Krzysztof Kozlowski @ 2020-09-20 20:19 UTC (permalink / raw)
  To: Jason Yan
  Cc: Roy.Pledge, linux-kernel@vger.kernel.org, youri.querry_1,
	Hulk Robot, leoyang.li, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200910140415.1132266-1-yanaijie@huawei.com>

On Thu, 10 Sep 2020 at 16:57, Jason Yan <yanaijie@huawei.com> wrote:
>
> This addresses the following gcc warning with "make W=1":
>
> drivers/soc/fsl/dpio/qbman-portal.c: In function
> ‘qbman_swp_enqueue_multiple_direct’:
> drivers/soc/fsl/dpio/qbman-portal.c:650:11: warning: variable
> ‘addr_cena’ set but not used [-Wunused-but-set-variable]
>   650 |  uint64_t addr_cena;
>       |           ^~~~~~~~~
>
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Jason Yan <yanaijie@huawei.com>

This was already reported:
Reported-by: kernel test robot <lkp@intel.com>
https://lkml.org/lkml/2020/6/12/290

Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>

Best regards,
Krzysztof

> ---
>  drivers/soc/fsl/dpio/qbman-portal.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/soc/fsl/dpio/qbman-portal.c b/drivers/soc/fsl/dpio/qbman-portal.c
> index 0ab85bfb116f..659b4a570d5b 100644
> --- a/drivers/soc/fsl/dpio/qbman-portal.c
> +++ b/drivers/soc/fsl/dpio/qbman-portal.c
> @@ -647,7 +647,6 @@ int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
>         const uint32_t *cl = (uint32_t *)d;
>         uint32_t eqcr_ci, eqcr_pi, half_mask, full_mask;
>         int i, num_enqueued = 0;
> -       uint64_t addr_cena;
>
>         spin_lock(&s->access_spinlock);
>         half_mask = (s->eqcr.pi_ci_mask>>1);
> @@ -701,7 +700,6 @@ int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
>
>         /* Flush all the cacheline without load/store in between */
>         eqcr_pi = s->eqcr.pi;
> -       addr_cena = (size_t)s->addr_cena;
>         for (i = 0; i < num_enqueued; i++)
>                 eqcr_pi++;
>         s->eqcr.pi = eqcr_pi & full_mask;
> --
> 2.25.4
>

^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Andy Lutomirski @ 2020-09-20 19:28 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, Arnd Bergmann, linux-block, Al Viro,
	io-uring, linux-arm-kernel, Jens Axboe, Parisc List,
	Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <20200920192259.GU32101@casper.infradead.org>

On Sun, Sep 20, 2020 at 12:23 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> > IMO it's much saner to mark those and refuse to touch them from io_uring...
>
> Simpler solution is to remove io_uring from the 32-bit syscall list.
> If you're a 32-bit process, you don't get to use io_uring.  Would
> any real users actually care about that?

We could go one step farther and declare that we're done adding *any*
new compat syscalls :)



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Matthew Wilcox @ 2020-09-20 19:22 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-aio, linux-mips, David Howells, linux-mm, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390, linux-scsi,
	x86, Arnd Bergmann, linux-block, io-uring, linux-arm-kernel,
	Jens Axboe, linux-parisc, netdev, linux-kernel,
	linux-security-module, linux-fsdevel, Andrew Morton, linuxppc-dev
In-Reply-To: <20200920191031.GQ3421308@ZenIV.linux.org.uk>

On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote:
> IMO it's much saner to mark those and refuse to touch them from io_uring...

Simpler solution is to remove io_uring from the 32-bit syscall list.
If you're a 32-bit process, you don't get to use io_uring.  Would
any real users actually care about that?

^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Andy Lutomirski @ 2020-09-20 19:14 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, Matthew Wilcox, Arnd Bergmann,
	linux-block, io-uring, linux-arm-kernel, Jens Axboe, Parisc List,
	Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <20200920180742.GN3421308@ZenIV.linux.org.uk>

On Sun, Sep 20, 2020 at 11:07 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, Sep 20, 2020 at 04:15:10PM +0100, Matthew Wilcox wrote:
> > On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote:
> > > Add a flag to force processing a syscall as a compat syscall.  This is
> > > required so that in_compat_syscall() works for I/O submitted by io_uring
> > > helper threads on behalf of compat syscalls.
> >
> > Al doesn't like this much, but my suggestion is to introduce two new
> > opcodes -- IORING_OP_READV32 and IORING_OP_WRITEV32.  The compat code
> > can translate IORING_OP_READV to IORING_OP_READV32 and then the core
> > code can know what that user pointer is pointing to.
>
> Let's separate two issues:
>         1) compat syscalls want 32bit iovecs.  Nothing to do with the
> drivers, dealt with just fine.
>         2) a few drivers are really fucked in head.  They use different
> *DATA* layouts for reads/writes, depending upon the calling process.
> IOW, if you fork/exec a 32bit binary and your stdin is one of those,
> reads from stdin in parent and child will yield different data layouts.
> On the same struct file.
>         That's what Christoph worries about (/dev/sg he'd mentioned is
> one of those).
>
>         IMO we should simply have that dozen or so of pathological files
> marked with FMODE_SHITTY_ABI; it's not about how they'd been opened -
> it describes the userland ABI provided by those.  And it's cast in stone.
>

I wonder if this is really quite cast in stone.  We could also have
FMODE_SHITTY_COMPAT and set that when a file like this is *opened* in
compat mode.  Then that particular struct file would be read and
written using the compat data format.  The change would be
user-visible, but the user that would see it would be very strange
indeed.

I don't have a strong opinion as to whether that is better or worse
than denying io_uring access to these things, but at least it moves
the special case out of io_uring.

--Andy

^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Al Viro @ 2020-09-20 19:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-aio, linux-mips, David Howells, linux-mm, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390, linux-scsi,
	x86, Arnd Bergmann, linux-block, io-uring, linux-arm-kernel,
	Jens Axboe, linux-parisc, netdev, linux-kernel,
	linux-security-module, linux-fsdevel, Andrew Morton, linuxppc-dev
In-Reply-To: <20200920190159.GT32101@casper.infradead.org>

On Sun, Sep 20, 2020 at 08:01:59PM +0100, Matthew Wilcox wrote:
> On Sun, Sep 20, 2020 at 07:07:42PM +0100, Al Viro wrote:
> > 	2) a few drivers are really fucked in head.  They use different
> > *DATA* layouts for reads/writes, depending upon the calling process.
> > IOW, if you fork/exec a 32bit binary and your stdin is one of those,
> > reads from stdin in parent and child will yield different data layouts.
> > On the same struct file.
> > 	That's what Christoph worries about (/dev/sg he'd mentioned is
> > one of those).
> > 
> > 	IMO we should simply have that dozen or so of pathological files
> > marked with FMODE_SHITTY_ABI; it's not about how they'd been opened -
> > it describes the userland ABI provided by those.  And it's cast in stone.
> > 
> > 	Any in_compat_syscall() in ->read()/->write() instances is an ABI
> > bug, plain and simple.  Some are unfixable for compatibility reasons, but
> > any new caller like that should be a big red flag.
> 
> So an IOCB_COMPAT flag would let us know whether the caller is expecting
> a 32-bit or 64-bit layout?  And io_uring could set it based on the
> ctx->compat flag.

So you want to convert all that crap to a form that would see iocb
(IOW, ->read_iter()/->write_iter())?  And, since quite a few are sysfs
attributes, propagate that through sysfs, changing the method signatures
to match that and either modifying fuckloads of instances or introducing
new special methods for the ones that are sensitive to that crap?

IMO it's much saner to mark those and refuse to touch them from io_uring...

^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Matthew Wilcox @ 2020-09-20 19:01 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-aio, linux-mips, David Howells, linux-mm, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390, linux-scsi,
	x86, Arnd Bergmann, linux-block, io-uring, linux-arm-kernel,
	Jens Axboe, linux-parisc, netdev, linux-kernel,
	linux-security-module, linux-fsdevel, Andrew Morton, linuxppc-dev
In-Reply-To: <20200920180742.GN3421308@ZenIV.linux.org.uk>

On Sun, Sep 20, 2020 at 07:07:42PM +0100, Al Viro wrote:
> 	2) a few drivers are really fucked in head.  They use different
> *DATA* layouts for reads/writes, depending upon the calling process.
> IOW, if you fork/exec a 32bit binary and your stdin is one of those,
> reads from stdin in parent and child will yield different data layouts.
> On the same struct file.
> 	That's what Christoph worries about (/dev/sg he'd mentioned is
> one of those).
> 
> 	IMO we should simply have that dozen or so of pathological files
> marked with FMODE_SHITTY_ABI; it's not about how they'd been opened -
> it describes the userland ABI provided by those.  And it's cast in stone.
> 
> 	Any in_compat_syscall() in ->read()/->write() instances is an ABI
> bug, plain and simple.  Some are unfixable for compatibility reasons, but
> any new caller like that should be a big red flag.

So an IOCB_COMPAT flag would let us know whether the caller is expecting
a 32-bit or 64-bit layout?  And io_uring could set it based on the
ctx->compat flag.

> 	Current list of those turds:
> /dev/sg (pointer-chasing, generally insane)
> /sys/firmware/efi/vars/*/raw_var (fucked binary structure)
> /sys/firmware/efi/vars/new_var (fucked binary structure)
> /sys/firmware/efi/vars/del_var (fucked binary structure)
> /dev/uhid	(pointer-chasing for one obsolete command)
> /dev/input/event* (timestamps)
> /dev/uinput (timestamps)
> /proc/bus/input/devices (fucked bitmap-to-text representation)
> /sys/class/input/*/capabilities/* (fucked bitmap-to-text representation)

^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Al Viro @ 2020-09-20 18:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-aio, linux-mips, David Howells, linux-mm, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390, linux-scsi,
	x86, Arnd Bergmann, linux-block, io-uring, linux-arm-kernel,
	Jens Axboe, linux-parisc, netdev, linux-kernel,
	linux-security-module, linux-fsdevel, Andrew Morton, linuxppc-dev
In-Reply-To: <20200920180742.GN3421308@ZenIV.linux.org.uk>

On Sun, Sep 20, 2020 at 07:07:42PM +0100, Al Viro wrote:

> /proc/bus/input/devices (fucked bitmap-to-text representation)

To illustrate the, er, beauty of that stuff:

; cat32 /proc/bus/input/devices >/tmp/a
; cat /proc/bus/input/devices >/tmp/b
; diff -u /tmp/a /tmp/b|grep '^[-+]'
--- /tmp/a      2020-09-20 14:28:43.442560691 -0400
+++ /tmp/b      2020-09-20 14:28:49.018543230 -0400
-B: KEY=1100f 2902000 8380307c f910f001 feffffdf ffefffff ffffffff fffffffe
+B: KEY=1100f02902000 8380307cf910f001 feffffdfffefffff fffffffffffffffe
-B: KEY=70000 0 0 0 0 0 0 0 0
+B: KEY=70000 0 0 0 0
-B: KEY=420 0 70000 0 0 0 0 0 0 0 0
+B: KEY=420 70000 0 0 0 0
-B: KEY=100000 0 0 0
+B: KEY=10000000000000 0
-B: KEY=4000 0 0 0 0
+B: KEY=4000 0 0
-B: KEY=8000 0 0 0 0 0 1100b 800 2 200000 0 0 0 0
+B: KEY=800000000000 0 0 1100b00000800 200200000 0 0
-B: KEY=3e000b 0 0 0 0 0 0 0
+B: KEY=3e000b00000000 0 0 0
-B: KEY=ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff fffffffe
+B: KEY=ffffffffffffffff ffffffffffffffff ffffffffffffffff fffffffffffffffe
; 
(cat32 being a 32bit binary of cat)
All the differences are due to homegrown bitmap-to-text conversion.

Note that feeding that into a pipe leaves the recepient with a lovely problem -
you can't go by the width of words (they are not zero-padded) and you can't
go by the number of words either - it varies from device to device.

And there's nothing we can do with that - it's a userland ABI, can't be
changed without breaking stuff.  I would prefer to avoid additional examples
like that, TYVM...

^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: William Kucharski @ 2020-09-20 15:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-aio, linux-mips, David Howells, linux-mm, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390, linux-scsi,
	x86, Arnd Bergmann, linux-block, Alexander Viro, io-uring,
	linux-arm-kernel, Jens Axboe, linux-parisc, netdev, linux-kernel,
	linux-security-module, linux-fsdevel, Andrew Morton, linuxppc-dev
In-Reply-To: <20200920151510.GS32101@casper.infradead.org>

I really like that as it’s self-documenting and anyone debugging it can see what is actually being used at a glance.

> On Sep 20, 2020, at 09:15, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote:
>> Add a flag to force processing a syscall as a compat syscall.  This is
>> required so that in_compat_syscall() works for I/O submitted by io_uring
>> helper threads on behalf of compat syscalls.
> 
> Al doesn't like this much, but my suggestion is to introduce two new
> opcodes -- IORING_OP_READV32 and IORING_OP_WRITEV32.  The compat code
> can translate IORING_OP_READV to IORING_OP_READV32 and then the core
> code can know what that user pointer is pointing to.

^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Al Viro @ 2020-09-20 18:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, Arnd Bergmann, linux-block, io-uring,
	linux-arm-kernel, Jens Axboe, Parisc List, Network Development,
	LKML, LSM List, Linux FS Devel, Andrew Morton, linuxppc-dev
In-Reply-To: <CALCETrWj1i-oyfA1rCXsNqdJddK6Vwm=W31YEf=k-OMBTC0vHw@mail.gmail.com>

On Sun, Sep 20, 2020 at 09:59:36AM -0700, Andy Lutomirski wrote:

> As one example, look at __sys_setsockopt().  It's called for the
> native and compat versions, and it contains an in_compat_syscall()
> check.  (This particularly check looks dubious to me, but that's
> another story.)  If this were to be done with equivalent semantics
> without a separate COMPAT_DEFINE_SYSCALL and without
> in_compat_syscall(), there would need to be some indication as to
> whether this is compat or native setsockopt.  There are other
> setsockopt implementations in the net stack with more
> legitimate-seeming uses of in_compat_syscall() that would need some
> other mechanism if in_compat_syscall() were to go away.
> 
> setsockopt is (I hope!) out of scope for io_uring, but the situation
> isn't fundamentally different from read and write.

	Except that setsockopt() had that crap very widespread; for read()
and write() those are very rare exceptions.

	Andy, please RTFS.  Or dig through archives.  The situation
with setsockopt() is *NOT* a good thing - it's (probably) the least
of the evils.  The last thing we need is making that the norm.

^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Al Viro @ 2020-09-20 18:07 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-aio, linux-mips, David Howells, linux-mm, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390, linux-scsi,
	x86, Arnd Bergmann, linux-block, io-uring, linux-arm-kernel,
	Jens Axboe, linux-parisc, netdev, linux-kernel,
	linux-security-module, linux-fsdevel, Andrew Morton, linuxppc-dev
In-Reply-To: <20200920151510.GS32101@casper.infradead.org>

On Sun, Sep 20, 2020 at 04:15:10PM +0100, Matthew Wilcox wrote:
> On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote:
> > Add a flag to force processing a syscall as a compat syscall.  This is
> > required so that in_compat_syscall() works for I/O submitted by io_uring
> > helper threads on behalf of compat syscalls.
> 
> Al doesn't like this much, but my suggestion is to introduce two new
> opcodes -- IORING_OP_READV32 and IORING_OP_WRITEV32.  The compat code
> can translate IORING_OP_READV to IORING_OP_READV32 and then the core
> code can know what that user pointer is pointing to.

Let's separate two issues:
	1) compat syscalls want 32bit iovecs.  Nothing to do with the
drivers, dealt with just fine.
	2) a few drivers are really fucked in head.  They use different
*DATA* layouts for reads/writes, depending upon the calling process.
IOW, if you fork/exec a 32bit binary and your stdin is one of those,
reads from stdin in parent and child will yield different data layouts.
On the same struct file.
	That's what Christoph worries about (/dev/sg he'd mentioned is
one of those).

	IMO we should simply have that dozen or so of pathological files
marked with FMODE_SHITTY_ABI; it's not about how they'd been opened -
it describes the userland ABI provided by those.  And it's cast in stone.

	Any in_compat_syscall() in ->read()/->write() instances is an ABI
bug, plain and simple.  Some are unfixable for compatibility reasons, but
any new caller like that should be a big red flag.

	How we import iovec array is none of the drivers' concern; we do
not need to mess with in_compat_syscall() reporting the matching value,
etc. for that.  It's about the instances that want in_compat_syscall() to
decide between the 32bit and 64bit data layouts.  And I believe that
we should simply have them marked as such and rejected by io_uring.  With
any new occurences getting slapped down hard.

	Current list of those turds:
/dev/sg (pointer-chasing, generally insane)
/sys/firmware/efi/vars/*/raw_var (fucked binary structure)
/sys/firmware/efi/vars/new_var (fucked binary structure)
/sys/firmware/efi/vars/del_var (fucked binary structure)
/dev/uhid	(pointer-chasing for one obsolete command)
/dev/input/event* (timestamps)
/dev/uinput (timestamps)
/proc/bus/input/devices (fucked bitmap-to-text representation)
/sys/class/input/*/capabilities/* (fucked bitmap-to-text representation)

^ permalink raw reply

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
From: Linus Torvalds @ 2020-09-20 17:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Juri Lelli, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, linux-mips, Ben Segall, Max Filippov,
	Guo Ren, linux-sparc, Vincent Chen, Will Deacon, Ard Biesheuvel,
	linux-arch, Vincent Guittot, Herbert Xu, the arch/x86 maintainers,
	Russell King, linux-csky, David Airlie, Mel Gorman,
	open list:SYNOPSYS ARC ARCHITECTURE, linux-xtensa, Paul McKenney,
	intel-gfx, linuxppc-dev, Steven Rostedt, Jani Nikula,
	Rodrigo Vivi, Dietmar Eggemann, Linux ARM, Chris Zankel,
	Michal Simek, Thomas Bogendoerfer, Nick Hu, Linux-MM,
	Vineet Gupta, LKML, Arnd Bergmann, Daniel Vetter, Paul Mackerras,
	Andrew Morton, Daniel Bristot de Oliveira, David S. Miller,
	Greentime Hu
In-Reply-To: <CAHk-=wgF-upZVpqJWK=TK7MS9H-Rp1ZxGfOG+dDW=JThtxAzVQ@mail.gmail.com>

On Sun, Sep 20, 2020 at 10:42 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Yeah, that looks much easier to explain. Ack.

Btw, one thing that might be a good idea at least initially is to add
a check for p->kmap_ctrl.idx being zero at fork, exit and maybe
syscall return time (but that last one may be too cumbersome to really
worry about).

The kmap_atomic() interface basically has a lot of coverage for leaked
as part of all the "might_sleep()" checks sprinkled around,  The new
kmap_temporary/local/whatever wouldn't have that kind of incidental
debug checking, and any leaked kmap indexes would be rather hard to
debug (much) later when they cause index overflows or whatever.

                Linus

^ permalink raw reply

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
From: Linus Torvalds @ 2020-09-20 17:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Juri Lelli, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, linux-mips, Ben Segall, Max Filippov,
	Guo Ren, linux-sparc, Vincent Chen, Will Deacon, Ard Biesheuvel,
	linux-arch, Vincent Guittot, Herbert Xu, the arch/x86 maintainers,
	Russell King, linux-csky, David Airlie, Mel Gorman,
	open list:SYNOPSYS ARC ARCHITECTURE, linux-xtensa, Paul McKenney,
	intel-gfx, linuxppc-dev, Steven Rostedt, Jani Nikula,
	Rodrigo Vivi, Dietmar Eggemann, Linux ARM, Chris Zankel,
	Michal Simek, Thomas Bogendoerfer, Nick Hu, Linux-MM,
	Vineet Gupta, LKML, Arnd Bergmann, Daniel Vetter, Paul Mackerras,
	Andrew Morton, Daniel Bristot de Oliveira, David S. Miller,
	Greentime Hu
In-Reply-To: <87eemwcpnq.fsf@nanos.tec.linutronix.de>

On Sun, Sep 20, 2020 at 10:40 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> I think the more obvious solution is to split the whole exercise:
>
>   schedule()
>      prepare_switch()
>         unmap()
>
>     switch_to()
>
>     finish_switch()
>         map()

Yeah, that looks much easier to explain. Ack.

               Linus

^ permalink raw reply

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
From: Thomas Gleixner @ 2020-09-20 17:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Juri Lelli, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, linux-mips, Ben Segall, Max Filippov,
	Guo Ren, linux-sparc, Vincent Chen, Will Deacon, Ard Biesheuvel,
	linux-arch, Vincent Guittot, Herbert Xu, the arch/x86 maintainers,
	Russell King, linux-csky, David Airlie, Mel Gorman,
	open list:SYNOPSYS ARC ARCHITECTURE, linux-xtensa, Paul McKenney,
	intel-gfx, linuxppc-dev, Steven Rostedt, Jani Nikula,
	Rodrigo Vivi, Dietmar Eggemann, Linux ARM, Chris Zankel,
	Michal Simek, Thomas Bogendoerfer, Nick Hu, Linux-MM,
	Vineet Gupta, LKML, Arnd Bergmann, Daniel Vetter, Paul Mackerras,
	Andrew Morton, Daniel Bristot de Oliveira, David S. Miller,
	Greentime Hu
In-Reply-To: <CAHk-=wgbmwsTOKs23Z=71EBTrULoeaH2U3TNqT2atHEWvkBKdw@mail.gmail.com>

On Sun, Sep 20 2020 at 09:57, Linus Torvalds wrote:
> On Sun, Sep 20, 2020 at 1:49 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> Btw, looking at the stack code, Ithink your new implementation of it
> is a bit scary:
>
>    static inline int kmap_atomic_idx_push(void)
>    {
>   -       int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1;
>   +       int idx = current->kmap_ctrl.idx++;
>
> and now that 'current->kmap_ctrl.idx' is not atomic wrt
>
>  (a) NMI's (this may be ok, maybe we never do kmaps in NMIs, and with
> nesting I think it's fine anyway - the NMI will undo whatever it did)

Right. Nesting should be a non issue, but I don't think we have
kmap_atomic() in NMI context.

>  (b) the prev/next switch
>
> And that (b) part worries me. You do the kmap_switch_temporary() to
> switch the entries, but you do that *separately* from actually
> switching 'current' to the new value.
>
> So kmap_switch_temporary() looks safe, but I don't think it actually
> is. Because while it first unmaps the old entries and then remaps the
> new ones, an interrupt can come in, and at that point it matters what
> is *CURRENT*.
>
> And regardless of whether 'current' is 'prev' or 'next', that
> kmap_switch_temporary() loop may be doing the wrong thing, depending
> on which one had the deeper stack. The interrupt will be using
> whatever "current->kmap_ctrl.idx" is, but that might overwrite entries
> that are in the process of being restored (if current is still 'prev',
> but kmap_switch_temporary() is in the "restore @next's kmaps" pgase),
> or it might stomp on entries that have been pte_clear()'ed by the
> 'prev' thing.

Duh yes. Never thought about that.

> Alternatively, that process counter would need about a hundred lines
> of commentary about exactly why it's safe. Because I don't think it
> is.

I think the more obvious solution is to split the whole exercise:


  schedule()
     prepare_switch()
        unmap()

    switch_to()

    finish_switch()
        map()

That's safe because neither the unmap() nor the map() code changes
kmap_ctrl.idx. So if there is an interrupt coming in between unmap() and
switch_to() then a kmap_local() there will use the next entry. So we
could even do the unmap() with interrupts enabled (preemption disabled).
Same for the map() part.

To explain that we need only a few lines of commentry methinks.

Thanks,

        tglx


^ permalink raw reply

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
From: Thomas Gleixner @ 2020-09-20 17:24 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Juri Lelli, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, linux-mips, Ben Segall, Max Filippov,
	Guo Ren, sparclinux, Vincent Chen, Will Deacon, Ard Biesheuvel,
	open list:GENERIC INCLUDE/A..., Vincent Guittot, Herbert Xu,
	X86 ML, Russell King, linux-csky, David Airlie, Mel Gorman, arcml,
	linux-xtensa, Paul McKenney, intel-gfx, linuxppc-dev,
	Steven Rostedt, Linus Torvalds, Jani Nikula, Rodrigo Vivi,
	Dietmar Eggemann, Linux ARM, Chris Zankel, Michal Simek,
	Thomas Bogendoerfer, Nick Hu, Linux-MM, Vineet Gupta, LKML,
	Arnd Bergmann, Daniel Vetter, Paul Mackerras, Andrew Morton,
	Daniel Bristot de Oliveira, David S. Miller, Greentime Hu
In-Reply-To: <20200920082353.GG438822@phenom.ffwll.local>

On Sun, Sep 20 2020 at 10:23, Daniel Vetter wrote:

> On Sun, Sep 20, 2020 at 08:23:26AM +0200, Thomas Gleixner wrote:
>> On Sat, Sep 19 2020 at 12:37, Daniel Vetter wrote:
>> > On Sat, Sep 19, 2020 at 12:35 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>> >> I think it should be the case, but I want to double check: Will
>> >> copy_*_user be allowed within a kmap_temporary section? This would
>> >> allow us to ditch an absolute pile of slowpaths.
>> >
>> > (coffee just kicked in) copy_*_user is ofc allowed, but if you hit a
>> > page fault you get a short read/write. This looks like it would remove
>> > the need to handle these in a slowpath, since page faults can now be
>> > served in this new kmap_temporary sections. But this sounds too good
>> > to be true, so I'm wondering what I'm missing.
>> 
>> In principle we could allow pagefaults, but not with the currently
>> proposed interface which can be called from any context. Obviously if
>> called from atomic context it can't handle user page faults.
>  
> Yeah that's clear, but does the implemention need to disable pagefaults
> unconditionally?

As I wrote in the other reply it's not required and the final interface
will neither disable preemption nor pagefaults (except for the atomic
wrapper around it).

Thanks,

        tglx

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox