All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint
From: Paul E. McKenney @ 2019-06-20 16:01 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, mingo, frederic, tglx
In-Reply-To: <20190620121032.GU3436@hirez.programming.kicks-ass.net>

On Thu, Jun 20, 2019 at 02:10:32PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 19, 2019 at 11:19:03AM -0700, Paul E. McKenney wrote:
> > [ Hearing no objections and given no test failures in multiple weeks of
> >   rcutorture testing, I intend to submit this to the upcoming merge
> >   window.  Thoughts? ]
> 
> I can't remember seeing this before; but then, there's a ton of unread
> email in the inbox :-(

I have no idea whether or not you were on CC in the earlier thread.
But you are now!  ;-)

> > Some debugging code showed that the culprit was sched_cpu_dying().
> > It had irqs enabled after return from sched_tick_stop().  Which in turn
> > had irqs enabled after return from cancel_delayed_work_sync().  Which is a
> > wrapper around __cancel_work_timer().  Which can sleep in the case where
> > something else is concurrently trying to cancel the same delayed work,
> > and as Thomas Gleixner pointed out on IRC, sleeping is a decidedly bad
> > idea when you are invoked from take_cpu_down(), regardless of the state
> > you leave interrupts in upon return.
> > 
> > Code inspection located no reason why the delayed work absolutely
> > needed to be canceled from sched_tick_stop():  The work is not
> > bound to the outgoing CPU by design, given that the whole point is
> > to collect statistics without disturbing the outgoing CPU.
> > 
> > This commit therefore simply drops the cancel_delayed_work_sync() from
> > sched_tick_stop().  Instead, a new ->state field is added to the tick_work
> > structure so that the delayed-work handler function sched_tick_remote()
> > can avoid reposting itself.  A cpu_is_offline() check is also added to
> > sched_tick_remote() to avoid mucking with the state of an offlined CPU
> > (though it does appear safe to do so).  The sched_tick_start() and
> > sched_tick_stop() functions also update ->state, and sched_tick_start()
> > also schedules the delayed work if ->state indicates that it is not
> > already in flight.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 102dfcf0a29a..8409c83aa5fa 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3050,14 +3050,44 @@ void scheduler_tick(void)
> >  
> >  struct tick_work {
> >  	int			cpu;
> > +	int			state;
> >  	struct delayed_work	work;
> >  };
> > +// Values for ->state, see diagram below.
> > +#define TICK_SCHED_REMOTE_OFFLINE	0
> > +#define TICK_SCHED_REMOTE_RUNNING	1
> > +#define TICK_SCHED_REMOTE_OFFLINING	2
> 
> That seems a daft set of values; consider { RUNNING, OFFLINING, OFFLINE }
> and see below.

As in make it an enum?  I could do that.

> > +
> > +// State diagram for ->state:
> > +//
> > +//
> > +//      +----->OFFLINE--------------------------+
> > +//      |                                       |
> > +//      |                                       |
> > +//      |                                       | sched_tick_start()
> > +//      | sched_tick_remote()                   |
> > +//      |                                       |
> > +//      |                                       V
> > +//      |                        +---------->RUNNING
> > +//      |                        |              |
> > +//      |                        |              |
> > +//      |                        |              |
> > +//      |     sched_tick_start() |              | sched_tick_stop()
> > +//      |                        |              |
> > +//      |                        |              |
> > +//      |                        |              |
> > +//      +--------------------OFFLINING<---------+
> > +//
> > +//
> > +// Other transitions get WARN_ON_ONCE(), except that sched_tick_remote()
> > +// and sched_tick_start() are happy to leave the state in RUNNING.
> 
> Can we please stick to old skool C comments?

Your file, your rules!

> Also, I find it harder to read that needed, maybe a little something
> like so:
> 
> /*
>  *              OFFLINE
>  *               |   ^
>  *               |   | tick_remote()
>  *               |   |
>  *               +--OFFLINING
>  *               |   ^
>  *  tick_start() |   | tick_stop()
>  *               v   |
>  *              RUNNING
>  */

As in remove the leading "sched_" from the function names?  (The names
were already there, so I left them be.)

> >  static struct tick_work __percpu *tick_work_cpu;
> >  
> >  static void sched_tick_remote(struct work_struct *work)
> >  {
> >  	struct delayed_work *dwork = to_delayed_work(work);
> > +	int os;
> 
> this should go at the end, reverse xmas tree preference and all that.

Alphabetical by variable name for me, but your file, your rules!

> >  	struct tick_work *twork = container_of(dwork, struct tick_work, work);
> >  	int cpu = twork->cpu;
> >  	struct rq *rq = cpu_rq(cpu);
> > @@ -3077,7 +3107,7 @@ static void sched_tick_remote(struct work_struct *work)
> >  
> >  	rq_lock_irq(rq, &rf);
> >  	curr = rq->curr;
> > -	if (is_idle_task(curr))
> > +	if (is_idle_task(curr) || cpu_is_offline(cpu))
> >  		goto out_unlock;
> >  
> >  	update_rq_clock(rq);
> > @@ -3097,13 +3127,22 @@ static void sched_tick_remote(struct work_struct *work)
> >  	/*
> >  	 * Run the remote tick once per second (1Hz). This arbitrary
> >  	 * frequency is large enough to avoid overload but short enough
> > -	 * to keep scheduler internal stats reasonably up to date.
> > +	 * to keep scheduler internal stats reasonably up to date.  But
> > +	 * first update state to reflect hotplug activity if required.
> >  	 */
> > -	queue_delayed_work(system_unbound_wq, dwork, HZ);
> > +	do {
> > +		os = READ_ONCE(twork->state);
> > +		WARN_ON_ONCE(os == TICK_SCHED_REMOTE_OFFLINE);
> > +		if (os == TICK_SCHED_REMOTE_RUNNING)
> > +			break;
> > +	} while (cmpxchg(&twork->state, os, TICK_SCHED_REMOTE_OFFLINE) != os);
> > +	if (os == TICK_SCHED_REMOTE_RUNNING)
> > +		queue_delayed_work(system_unbound_wq, dwork, HZ);
> >  }
> >  
> >  static void sched_tick_start(int cpu)
> >  {
> > +	int os;
> >  	struct tick_work *twork;
> >  
> >  	if (housekeeping_cpu(cpu, HK_FLAG_TICK))
> > @@ -3112,14 +3151,23 @@ static void sched_tick_start(int cpu)
> >  	WARN_ON_ONCE(!tick_work_cpu);
> >  
> >  	twork = per_cpu_ptr(tick_work_cpu, cpu);
> > -	twork->cpu = cpu;
> > -	INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
> > -	queue_delayed_work(system_unbound_wq, &twork->work, HZ);
> > +	do {
> > +		os = READ_ONCE(twork->state);
> > +		if (WARN_ON_ONCE(os == TICK_SCHED_REMOTE_RUNNING))
> > +			break;
> > +		// Either idle or offline for a short period
> > +	} while (cmpxchg(&twork->state, os, TICK_SCHED_REMOTE_RUNNING) != os);
> > +	if (os == TICK_SCHED_REMOTE_OFFLINE) {
> > +		twork->cpu = cpu;
> > +		INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
> > +		queue_delayed_work(system_unbound_wq, &twork->work, HZ);
> > +	}
> >  }
> >  
> >  #ifdef CONFIG_HOTPLUG_CPU
> >  static void sched_tick_stop(int cpu)
> >  {
> > +	int os;
> >  	struct tick_work *twork;
> >  
> >  	if (housekeeping_cpu(cpu, HK_FLAG_TICK))
> > @@ -3128,7 +3176,13 @@ static void sched_tick_stop(int cpu)
> >  	WARN_ON_ONCE(!tick_work_cpu);
> >  
> >  	twork = per_cpu_ptr(tick_work_cpu, cpu);
> > -	cancel_delayed_work_sync(&twork->work);
> > +	// There cannot be competing actions, but don't rely on stop_machine.
> > +	do {
> > +		os = READ_ONCE(twork->state);
> > +		WARN_ON_ONCE(os != TICK_SCHED_REMOTE_RUNNING);
> > +		// Either idle or offline for a short period
> > +	} while (cmpxchg(&twork->state, os, TICK_SCHED_REMOTE_OFFLINING) != os);
> > +	// Don't cancel, as this would mess up the state machine.
> >  }
> >  #endif /* CONFIG_HOTPLUG_CPU */
> 
> While not wrong, it seems overly complicated; can't we do something
> like:
> 
> tick:

As in sched_tick_remote(), right?

> 	state = atomic_read(->state);
> 	if (state) {

You sure you don't want "if (state != RUNNING)"?  But I guess you need
to understand that RUNNING==0 to understand the atomic_inc_not_zero().

> 		WARN_ON_ONCE(state != OFFLINING);
> 		if (atomic_inc_not_zero(->state))

This assumes that there cannot be concurrent calls to sched_tick_remote(),
otherwise, you can end up with ->state==3.  Which is a situation that
my version does a WARN_ON_ONCE() for, so I guess the only difference is
that mine would be guaranteed to complain and yours would complain with
high probability.  So fair enough!

> 			return;
> 	}
> 	queue_delayed_work();
> 
> 
> stop:
> 	/*
> 	 * This is hotplug; even without stop-machine, there cannot be
> 	 * concurrency on offlining specific CPUs.
> 	 */
> 	state = atomic_read(->state);

There cannot be a sched_tick_stop() or sched_tick_stop(), but there really
can be a sched_tick_remote() right here in the absence of stop-machine,
can't there?  Or am I missing something other than stop-machine that
prevents this?

Now, you could argue that concurrency is safe: Either sched_tick_remote()
sees RUNNING and doesn't touch the value, or it sees offlining and
sched_tick_stop() makes no further changes, but I am not sure that
this qualifies as simpler...

> 	WARN_ON_ONCE(state != RUNNING);
> 	atomic_set(->state, OFFLINING);

Another option would be to use atomic_xchg() as below instead of the
atomic_read()/atomic_set() pair.  Would that work for you?

> start:
> 	state = atomic_xchg(->state, RUNNING);
> 	WARN_ON_ONCE(state == RUNNING);
> 	if (state == OFFLINE) {
> 		// ...
> 		queue_delayed_work();
> 	}

This one looks to be an improvement on mine regardless of the other two.

								Thanx, Paul

^ permalink raw reply

* Re: [PATCH] bus: tegra-aconnect: remove PM_CLK dependency
From: Jon Hunter @ 2019-06-20 16:02 UTC (permalink / raw)
  To: Sameer Pujar, thierry.reding; +Cc: linux-tegra, linux-kernel
In-Reply-To: <1561045919-15449-1-git-send-email-spujar@nvidia.com>


On 20/06/2019 16:51, Sameer Pujar wrote:
> aconnect bus driver does not use pm-clk interface now and hence the
> dependency is removed from Kconfig.
> 
> Signed-off-by: Sameer Pujar <spujar@nvidia.com>
> ---
>  drivers/bus/Kconfig | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig
> index 1851112..4587ef2 100644
> --- a/drivers/bus/Kconfig
> +++ b/drivers/bus/Kconfig
> @@ -128,7 +128,6 @@ config TEGRA_ACONNECT
>  	tristate "Tegra ACONNECT Bus Driver"
>  	depends on ARCH_TEGRA_210_SOC
>  	depends on OF && PM
> -	select PM_CLK
>  	help
>  	  Driver for the Tegra ACONNECT bus which is used to interface with
>  	  the devices inside the Audio Processing Engine (APE) for Tegra210.

Thanks. We should probably populate the 'Fixes:' tag for this to show
which commit this fixes. Otherwise ...

Acked-by: Jon Hunter <jonathanh@nvidia.com>

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* Re: [PATCH 2/3] drm/rockchip: Add optional support for CRTC gamma LUT
From: Ezequiel Garcia @ 2019-06-20 16:02 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: Ilia Mirkin, dri-devel, linux-rockchip, Sandy Huang, kernel,
	Sean Paul, Boris Brezillon, Douglas Anderson, Jacopo Mondi,
	Rob Herring, Mark Rutland, devicetree, LKML
In-Reply-To: <31857290.5uKucqQp3M@diego>

On Wed, 2019-06-19 at 00:18 +0200, Heiko Stübner wrote:
> Am Mittwoch, 19. Juni 2019, 00:09:57 CEST schrieb Ezequiel Garcia:
> > On Tue, 2019-06-18 at 17:47 -0400, Ilia Mirkin wrote:
> > > On Tue, Jun 18, 2019 at 5:43 PM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > Add an optional CRTC gamma LUT support, and enable it on RK3288.
> > > > This is currently enabled via a separate address resource,
> > > > which needs to be specified in the devicetree.
> > > > 
> > > > The address resource is required because on some SoCs, such as
> > > > RK3288, the LUT address is after the MMU address, and the latter
> > > > is supported by a different driver. This prevents the DRM driver
> > > > from requesting an entire register space.
> > > > 
> > > > The current implementation works for RGB 10-bit tables, as that
> > > > is what seems to work on RK3288.
> > > > 
> > > > Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
> > > > ---
> > > > Changes from RFC:
> > > > * Request (an optional) address resource for the LUT.
> > > > * Drop support for RK3399, which doesn't seem to work
> > > >   out of the box and needs more research.
> > > > * Support pass-thru setting when GAMMA_LUT is NULL.
> > > > * Add a check for the gamma size, as suggested by Ilia.
> > > > * Move gamma setting to atomic_commit_tail, as pointed
> > > >   out by Jacopo/Laurent, is the correct way.
> > > > ---
> > > > diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> > > > index 12ed5265a90b..5b6edbe2673f 100644
> > > > --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> > > > +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> > > > +static void vop_crtc_gamma_set(struct vop *vop, struct drm_crtc *crtc,
> > > > +                              struct drm_crtc_state *old_state)
> > > > +{
> > > > +       int idle, ret, i;
> > > > +
> > > > +       spin_lock(&vop->reg_lock);
> > > > +       VOP_REG_SET(vop, common, dsp_lut_en, 0);
> > > > +       vop_cfg_done(vop);
> > > > +       spin_unlock(&vop->reg_lock);
> > > > +
> > > > +       ret = readx_poll_timeout(vop_dsp_lut_is_enable, vop,
> > > > +                          idle, !idle, 5, 30 * 1000);
> > > > +       if (ret)
> > > > +               return;
> > > > +
> > > > +       spin_lock(&vop->reg_lock);
> > > > +
> > > > +       if (crtc->state->gamma_lut) {
> > > > +               if (!old_state->gamma_lut || (crtc->state->gamma_lut->base.id !=
> > > > +                                             old_state->gamma_lut->base.id))
> > > > +                       vop_crtc_write_gamma_lut(vop, crtc);
> > > > +       } else {
> > > > +               for (i = 0; i < crtc->gamma_size; i++) {
> > > > +                       u32 word;
> > > > +
> > > > +                       word = (i << 20) | (i << 10) | i;
> > > > +                       writel(word, vop->lut_regs + i * 4);
> > > > +               }
> > > 
> > > Note - I'm not in any way familiar with the hardware, so take with a
> > > grain of salt --
> > > 
> > > Could you just leave dsp_lut_en turned off in this case?
> > > 
> > 
> > That was the first thing I tried :-)
> > 
> > It seems dsp_lut_en is not to enable the CRTC gamma LUT stage,
> > but to enable writing the gamma LUT to the internal RAM.
> 
> I guess that warants a code comment stating this, so we don't end
> up with well-meant cleanup patches in the future :-) .
> 

Sure, makes sense.

Any other feedback aside from this?

Thanks,
Ezequiel

^ permalink raw reply

* [Buildroot] [PATCH 0/3] Update Busybox to 1.31 and use mdev daemon mode
From: Titouan Christophe @ 2019-06-20 16:02 UTC (permalink / raw)
  To: buildroot
In-Reply-To: <87a7ecffta.fsf@dell.be.48ers.dk>

Hello all,

On 6/20/19 5:16 PM, Peter Korsgaard wrote:
>>>>>> "Thomas" == Thomas Petazzoni <thomas.petazzoni@bootlin.com> writes:
> 
>   > On Wed, 19 Jun 2019 18:42:54 +0200
>   > Titouan Christophe <titouan.christophe@railnova.eu> wrote:
> 
>   >> Titouan Christophe (3):
>   >> package/busybox: bump version to 1.31.0
>   >> package/busybox: convert S10mdev to the canonical init script format
>   >> package/busybox: run mdev in daemon mode
> 
>   > Series applied. Thanks!
> 
> Yes, thanks. We still need to force CONFIG_NET=y in the Linux kernel if
> mdev is enabled, right?
> 

Yes, since mdev is now launched in daemon mode CONFIG_NET is now needed. 
Didn't think of enforcing that in the kernel config, I'm preparing a 
patch right now.

Regards,

Titouan

^ permalink raw reply

* Re: [PATCH] bus: tegra-aconnect: remove PM_CLK dependency
From: Jon Hunter @ 2019-06-20 16:02 UTC (permalink / raw)
  To: Sameer Pujar, thierry.reding; +Cc: linux-tegra, linux-kernel
In-Reply-To: <1561045919-15449-1-git-send-email-spujar@nvidia.com>


On 20/06/2019 16:51, Sameer Pujar wrote:
> aconnect bus driver does not use pm-clk interface now and hence the
> dependency is removed from Kconfig.
> 
> Signed-off-by: Sameer Pujar <spujar@nvidia.com>
> ---
>  drivers/bus/Kconfig | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig
> index 1851112..4587ef2 100644
> --- a/drivers/bus/Kconfig
> +++ b/drivers/bus/Kconfig
> @@ -128,7 +128,6 @@ config TEGRA_ACONNECT
>  	tristate "Tegra ACONNECT Bus Driver"
>  	depends on ARCH_TEGRA_210_SOC
>  	depends on OF && PM
> -	select PM_CLK
>  	help
>  	  Driver for the Tegra ACONNECT bus which is used to interface with
>  	  the devices inside the Audio Processing Engine (APE) for Tegra210.

Thanks. We should probably populate the 'Fixes:' tag for this to show
which commit this fixes. Otherwise ...

Acked-by: Jon Hunter <jonathanh@nvidia.com>

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* Re: [PATCH] dmaengine: tegra210-adma: remove PM_CLK dependency
From: Jon Hunter @ 2019-06-20 16:02 UTC (permalink / raw)
  To: Sameer Pujar, vkoul, dan.j.williams
  Cc: thierry.reding, ldewangan, dmaengine, linux-tegra, linux-kernel
In-Reply-To: <1561046059-15821-1-git-send-email-spujar@nvidia.com>


On 20/06/2019 16:54, Sameer Pujar wrote:
> Tegra ADMA does not use pm-clk interface now and hence the dependency
> is removed from Kconfig.
> 
> Signed-off-by: Sameer Pujar <spujar@nvidia.com>
> ---
>  drivers/dma/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
> index 703275c..ba660e2 100644
> --- a/drivers/dma/Kconfig
> +++ b/drivers/dma/Kconfig
> @@ -584,7 +584,7 @@ config TEGRA20_APB_DMA
>  
>  config TEGRA210_ADMA
>  	tristate "NVIDIA Tegra210 ADMA support"
> -	depends on (ARCH_TEGRA_210_SOC || COMPILE_TEST) && PM_CLK
> +	depends on (ARCH_TEGRA_210_SOC || COMPILE_TEST)
>  	select DMA_ENGINE
>  	select DMA_VIRTUAL_CHANNELS
>  	help

Thanks. We should probably populate the 'Fixes:' tag for this to show
which commit this fixes. Otherwise ...

Acked-by: Jon Hunter <jonathanh@nvidia.com>

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* Re: [PATCH] dmaengine: tegra210-adma: remove PM_CLK dependency
From: Jon Hunter @ 2019-06-20 16:02 UTC (permalink / raw)
  To: Sameer Pujar, vkoul, dan.j.williams
  Cc: thierry.reding, ldewangan, dmaengine, linux-tegra, linux-kernel
In-Reply-To: <1561046059-15821-1-git-send-email-spujar@nvidia.com>


On 20/06/2019 16:54, Sameer Pujar wrote:
> Tegra ADMA does not use pm-clk interface now and hence the dependency
> is removed from Kconfig.
> 
> Signed-off-by: Sameer Pujar <spujar@nvidia.com>
> ---
>  drivers/dma/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
> index 703275c..ba660e2 100644
> --- a/drivers/dma/Kconfig
> +++ b/drivers/dma/Kconfig
> @@ -584,7 +584,7 @@ config TEGRA20_APB_DMA
>  
>  config TEGRA210_ADMA
>  	tristate "NVIDIA Tegra210 ADMA support"
> -	depends on (ARCH_TEGRA_210_SOC || COMPILE_TEST) && PM_CLK
> +	depends on (ARCH_TEGRA_210_SOC || COMPILE_TEST)
>  	select DMA_ENGINE
>  	select DMA_VIRTUAL_CHANNELS
>  	help

Thanks. We should probably populate the 'Fixes:' tag for this to show
which commit this fixes. Otherwise ...

Acked-by: Jon Hunter <jonathanh@nvidia.com>

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* Re: [PATCH] irqchip/gic-pm: remove PM_CLK dependency
From: Jon Hunter @ 2019-06-20 16:03 UTC (permalink / raw)
  To: Sameer Pujar, thierry.reding, marc.zyngier, jason, tglx
  Cc: linux-tegra, linux-kernel
In-Reply-To: <1561046268-16329-1-git-send-email-spujar@nvidia.com>


On 20/06/2019 16:57, Sameer Pujar wrote:
> gic-pm driver does not use pm-clk interface now and hence the dependency
> is removed from Kconfig.
> 
> Signed-off-by: Sameer Pujar <spujar@nvidia.com>
> ---
>  drivers/irqchip/Kconfig | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
> index 2d3b5a2..6346d6f 100644
> --- a/drivers/irqchip/Kconfig
> +++ b/drivers/irqchip/Kconfig
> @@ -15,7 +15,6 @@ config ARM_GIC_PM
>  	bool
>  	depends on PM
>  	select ARM_GIC
> -	select PM_CLK
>  
>  config ARM_GIC_MAX_NR
>  	int
> 

Thanks. We should probably populate the 'Fixes:' tag for this to show
which commit this fixes. Otherwise ...

Acked-by: Jon Hunter <jonathanh@nvidia.com>

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* Re: [PATCH] irqchip/gic-pm: remove PM_CLK dependency
From: Jon Hunter @ 2019-06-20 16:03 UTC (permalink / raw)
  To: Sameer Pujar, thierry.reding, marc.zyngier, jason, tglx
  Cc: linux-tegra, linux-kernel
In-Reply-To: <1561046268-16329-1-git-send-email-spujar@nvidia.com>


On 20/06/2019 16:57, Sameer Pujar wrote:
> gic-pm driver does not use pm-clk interface now and hence the dependency
> is removed from Kconfig.
> 
> Signed-off-by: Sameer Pujar <spujar@nvidia.com>
> ---
>  drivers/irqchip/Kconfig | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
> index 2d3b5a2..6346d6f 100644
> --- a/drivers/irqchip/Kconfig
> +++ b/drivers/irqchip/Kconfig
> @@ -15,7 +15,6 @@ config ARM_GIC_PM
>  	bool
>  	depends on PM
>  	select ARM_GIC
> -	select PM_CLK
>  
>  config ARM_GIC_MAX_NR
>  	int
> 

Thanks. We should probably populate the 'Fixes:' tag for this to show
which commit this fixes. Otherwise ...

Acked-by: Jon Hunter <jonathanh@nvidia.com>

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* Re: [Qemu-devel] [PATCH 05/12] hbitmap: enable merging across granularities
From: Max Reitz @ 2019-06-20 15:47 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block
  Cc: Kevin Wolf, Fam Zheng, vsementsov, Wen Congyang, Xie Changlong,
	Markus Armbruster
In-Reply-To: <20190620010356.19164-6-jsnow@redhat.com>


[-- Attachment #1.1: Type: text/plain, Size: 1701 bytes --]

On 20.06.19 03:03, John Snow wrote:
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  util/hbitmap.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/util/hbitmap.c b/util/hbitmap.c
> index 45d1725daf..0d6724b7bc 100644
> --- a/util/hbitmap.c
> +++ b/util/hbitmap.c
> @@ -777,7 +777,17 @@ void hbitmap_truncate(HBitmap *hb, uint64_t size)
>  
>  bool hbitmap_can_merge(const HBitmap *a, const HBitmap *b)
>  {
> -    return (a->size == b->size) && (a->granularity == b->granularity);
> +    return (a->size == b->size);
> +}
> +
> +static void hbitmap_sparse_merge(HBitmap *dst, const HBitmap *src)
> +{
> +    uint64_t offset = 0;
> +    uint64_t count = src->orig_size;
> +
> +    while (hbitmap_next_dirty_area(src, &offset, &count)) {
> +        hbitmap_set(dst, offset, count);
> +    }
>  }
>  
>  /**
> @@ -804,6 +814,16 @@ bool hbitmap_merge(const HBitmap *a, const HBitmap *b, HBitmap *result)
>          return true;
>      }
>  
> +    if (a->size != b->size) {

Don’t you mean s/size/granularity/?

Right now, this is dead code, which leads me to asking for a test.
(Well, no, I would’ve asked anyway.)

Max

> +        if (a != result) {
> +            hbitmap_sparse_merge(result, a);
> +        }
> +        if (b != result) {
> +            hbitmap_sparse_merge(result, b);
> +        }
> +        return true;
> +    }
> +
>      /* This merge is O(size), as BITS_PER_LONG and HBITMAP_LEVELS are constant.
>       * It may be possible to improve running times for sparsely populated maps
>       * by using hbitmap_iter_next, but this is suboptimal for dense maps.
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v2 4/6] mm/memory_hotplug: Rename walk_memory_range() and pass start+size instead of pfns
From: Nathan Chancellor @ 2019-06-20 16:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, Dan Williams, Andrew Morton, linuxppc-dev,
	linux-acpi, linux-mm, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Rafael J. Wysocki, Len Brown,
	Greg Kroah-Hartman, Rashmica Gupta, Pavel Tatashin,
	Anshuman Khandual, Michael Neuling, Thomas Gleixner,
	Oscar Salvador, Michal Hocko, Wei Yang, Juergen Gross, Qian Cai,
	Arun KS
In-Reply-To: <20190620103520.23481-5-david@redhat.com>

On Thu, Jun 20, 2019 at 12:35:18PM +0200, David Hildenbrand wrote:
> walk_memory_range() was once used to iterate over sections. Now, it
> iterates over memory blocks. Rename the function, fixup the
> documentation. Also, pass start+size instead of PFNs, which is what most
> callers already have at hand. (we'll rework link_mem_sections() most
> probably soon)
> 
> Follow-up patches wil rework, simplify, and move walk_memory_blocks() to
> drivers/base/memory.c.
> 
> Note: walk_memory_blocks() only works correctly right now if the
> start_pfn is aligned to a section start. This is the case right now,
> but we'll generalize the function in a follow up patch so the semantics
> match the documentation.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Michael Neuling <mikey@neuling.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Arun KS <arunks@codeaurora.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/powerpc/platforms/powernv/memtrace.c | 22 ++++++++++-----------
>  drivers/acpi/acpi_memhotplug.c            | 19 ++++--------------
>  drivers/base/node.c                       |  5 +++--
>  include/linux/memory_hotplug.h            |  2 +-
>  mm/memory_hotplug.c                       | 24 ++++++++++++-----------
>  5 files changed, 32 insertions(+), 40 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index 5e53c1392d3b..8c82c041afe6 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -70,23 +70,24 @@ static int change_memblock_state(struct memory_block *mem, void *arg)
>  /* called with device_hotplug_lock held */
>  static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
>  {
> +	const unsigned long start = PFN_PHYS(start_pfn);
> +	const unsigned long size = PFN_PHYS(nr_pages);
>  	u64 end_pfn = start_pfn + nr_pages - 1;

This variable should be removed:

arch/powerpc/platforms/powernv/memtrace.c:75:6: warning: unused variable 'end_pfn' [-Wunused-variable]
        u64 end_pfn = start_pfn + nr_pages - 1;
            ^
1 warning generated.

https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/209576737

Cheers,
Nathan

>  
> -	if (walk_memory_range(start_pfn, end_pfn, NULL,
> -	    check_memblock_online))
> +	if (walk_memory_blocks(start, size, NULL, check_memblock_online))
>  		return false;
>  
> -	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
> -			  change_memblock_state);
> +	walk_memory_blocks(start, size, (void *)MEM_GOING_OFFLINE,
> +			   change_memblock_state);
>  
>  	if (offline_pages(start_pfn, nr_pages)) {
> -		walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
> -				  change_memblock_state);
> +		walk_memory_blocks(start, size, (void *)MEM_ONLINE,
> +				   change_memblock_state);
>  		return false;
>  	}
>  
> -	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
> -			  change_memblock_state);
> +	walk_memory_blocks(start, size, (void *)MEM_OFFLINE,
> +			   change_memblock_state);
>  
>  
>  	return true;
> @@ -242,9 +243,8 @@ static int memtrace_online(void)
>  		 */
>  		if (!memhp_auto_online) {
>  			lock_device_hotplug();
> -			walk_memory_range(PFN_DOWN(ent->start),
> -					  PFN_UP(ent->start + ent->size - 1),
> -					  NULL, online_mem_block);
> +			walk_memory_blocks(ent->start, ent->size, NULL,
> +					   online_mem_block);
>  			unlock_device_hotplug();
>  		}
>  
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index db013dc21c02..e294f44a7850 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -155,16 +155,6 @@ static int acpi_memory_check_device(struct acpi_memory_device *mem_device)
>  	return 0;
>  }
>  
> -static unsigned long acpi_meminfo_start_pfn(struct acpi_memory_info *info)
> -{
> -	return PFN_DOWN(info->start_addr);
> -}
> -
> -static unsigned long acpi_meminfo_end_pfn(struct acpi_memory_info *info)
> -{
> -	return PFN_UP(info->start_addr + info->length-1);
> -}
> -
>  static int acpi_bind_memblk(struct memory_block *mem, void *arg)
>  {
>  	return acpi_bind_one(&mem->dev, arg);
> @@ -173,9 +163,8 @@ static int acpi_bind_memblk(struct memory_block *mem, void *arg)
>  static int acpi_bind_memory_blocks(struct acpi_memory_info *info,
>  				   struct acpi_device *adev)
>  {
> -	return walk_memory_range(acpi_meminfo_start_pfn(info),
> -				 acpi_meminfo_end_pfn(info), adev,
> -				 acpi_bind_memblk);
> +	return walk_memory_blocks(info->start_addr, info->length, adev,
> +				  acpi_bind_memblk);
>  }
>  
>  static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
> @@ -186,8 +175,8 @@ static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
>  
>  static void acpi_unbind_memory_blocks(struct acpi_memory_info *info)
>  {
> -	walk_memory_range(acpi_meminfo_start_pfn(info),
> -			  acpi_meminfo_end_pfn(info), NULL, acpi_unbind_memblk);
> +	walk_memory_blocks(info->start_addr, info->length, NULL,
> +			   acpi_unbind_memblk);
>  }
>  
>  static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index e6364e3e3e31..d8c02e65df68 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -833,8 +833,9 @@ void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  
>  int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn)
>  {
> -	return walk_memory_range(start_pfn, end_pfn, (void *)&nid,
> -					register_mem_sect_under_node);
> +	return walk_memory_blocks(PFN_PHYS(start_pfn),
> +				  PFN_PHYS(end_pfn - start_pfn), (void *)&nid,
> +				  register_mem_sect_under_node);
>  }
>  
>  #ifdef CONFIG_HUGETLBFS
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 79e0add6a597..d9fffc34949f 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -340,7 +340,7 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  extern void __ref free_area_init_core_hotplug(int nid);
> -extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
> +extern int walk_memory_blocks(unsigned long start, unsigned long size,
>  		void *arg, int (*func)(struct memory_block *, void *));
>  extern int __add_memory(int nid, u64 start, u64 size);
>  extern int add_memory(int nid, u64 start, u64 size);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index a88c5f334e5a..122a7d31efdd 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1126,8 +1126,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
>  
>  	/* online pages if requested */
>  	if (memhp_auto_online)
> -		walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1),
> -				  NULL, online_memory_block);
> +		walk_memory_blocks(start, size, NULL, online_memory_block);
>  
>  	return ret;
>  error:
> @@ -1665,20 +1664,24 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  /**
> - * walk_memory_range - walks through all mem sections in [start_pfn, end_pfn)
> - * @start_pfn: start pfn of the memory range
> - * @end_pfn: end pfn of the memory range
> + * walk_memory_blocks - walk through all present memory blocks overlapped
> + *			by the range [start, start + size)
> + *
> + * @start: start address of the memory range
> + * @size: size of the memory range
>   * @arg: argument passed to func
> - * @func: callback for each memory section walked
> + * @func: callback for each memory block walked
>   *
> - * This function walks through all present mem sections in range
> - * [start_pfn, end_pfn) and call func on each mem section.
> + * This function walks through all present memory blocks overlapped by the
> + * range [start, start + size), calling func on each memory block.
>   *
>   * Returns the return value of func.
>   */
> -int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
> +int walk_memory_blocks(unsigned long start, unsigned long size,
>  		void *arg, int (*func)(struct memory_block *, void *))
>  {
> +	const unsigned long start_pfn = PFN_DOWN(start);
> +	const unsigned long end_pfn = PFN_UP(start + size - 1);
>  	struct memory_block *mem = NULL;
>  	struct mem_section *section;
>  	unsigned long pfn, section_nr;
> @@ -1824,8 +1827,7 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>  	 * whether all memory blocks in question are offline and return error
>  	 * if this is not the case.
>  	 */
> -	rc = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> -			       check_memblock_offlined_cb);
> +	rc = walk_memory_blocks(start, size, NULL, check_memblock_offlined_cb);
>  	if (rc)
>  		goto done;
>  
> -- 
> 2.21.0
> 

^ permalink raw reply

* Hello
From: John M. @ 2019-06-20 16:05 UTC (permalink / raw)
  To: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw

Hello...Did you get my pervious message ?

John

^ permalink raw reply

* [PATCH v2 0/8] staging: erofs: decompression inplace approach
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)


This is patch v2 of erofs decompression inplace approach, no major
issues observed after v1 preliminarily applied to our products, and
changes from v1 are minor, so I drop "RFC" tag from this version.

See the bottom lines which are taken from RFC PATCH v1 and describe
the principle of these technologies.

The series is based on the latest staging-next since all dependencies
have already been merged.

changelog from v1:
 - keep Z_EROFS_NR_INLINE_PAGEVECS in unzip_vle.h after switching to
   new decompression backend;
 - add some DBG_BUGONs in new decompression backend to observe
   potential issues;
 - minor code cleanup.

8<--------

Hi,

After working on for more than half a year, the detail of erofs decompression
inplace is almost determined and ready for linux-next.

Currently, inplace IO is used if the whole compressed data is used
in order to reduce compressed pages extra memory overhead and an extra
memcpy (the only one memcpy) will be used for each inplace IO since
temporary buffer is needed to keep decompressing safe for inplace IO.

However, most of lz-based decompression algorithms support decompression
inplace by their algorithm designs, such as LZ4, LZO, etc.

If iend - oend margin is large enough, decompression inplace can be done
in the same buffer safely, as illustrated below:

         start of compressed logical extent
           |                          end of this logical extent
           |                           |
     ______v___________________________v________
... |  page 6  |  page 7  |  page 8  |  page 9  | ...
    |__________|__________|__________|__________|
           .                         ^ .        ^
           .                         |compressed|
           .                         |   data   |
           .                           .        .
           |<          dstsize        >|<margin>|
                                       oend     iend
           op                        ip

Fixed-size output compression can make the full use of this feature
to reduce memory overhead and avoid extra memcpy compared with fixed-size
input compression since iend is strictly not less than oend for fixed-size
output compression with inplace IO to last pages.

In addition, erofs compression indexes have been improved as well by
introducing compacted compression indexes.

These two techniques all benefit sequential read (on x86_64, 710.8MiB/s ->
755.4MiB/s; on Kirin980, 725MiB/s -> 812MiB/s) therefore erofs could have
similar sequential read performance against ext4 in a larger CR range
on high-spend SSD / NVMe devices as well.

However, note that it is _cpu vs storage device_ tradeoff, there is no
absolute performance conclusion for all on-market combinations.

Test images:
 name                       size                 CR
 enwik9                     1000000000           1.00
 enwik9_4k.squashfs.img      621211648           1.61
 enwik9_4k.erofs.img         558133248           1.79
 enwik9_8k.squashfs.img      556191744           1.80
 enwik9_16k.squashfs.img     502661120           1.99
 enwik9_128k.squashfs.img    398204928           2.51

Test Environment:
CPU: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz (4 cores, 8 threads)
DDR: 8G
SSD: INTEL SSDPEKKF360G7H
Kernel: Linux 5.2-rc3+ (with lz4-1.8.3 algorithm)

Test configuration:
squashfs:
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_FILE_DIRECT=y
CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y
CONFIG_SQUASHFS_LZ4=y
CONFIG_SQUASHFS_4K_DEVBLK_SIZE=y
erofs:
CONFIG_EROFS_FS_USE_VM_MAP_RAM=y
CONFIG_EROFS_FS_ZIP=y
CONFIG_EROFS_FS_CLUSTER_PAGE_LIMIT=1
CONFIG_EROFS_FS_ZIP_CACHE_BIPOLAR=y

with intel_pstate=disable,
     8 cpus on at 1801000 scaling_{min,max}_freq,
     userspace scaling_governor

Sequential read results (MiB/s):
                           1      2      3      4      5      avg
 enwik9_4k.ext4.img        767    770    738    726    724    745
 enwik9_4k.erofs.img       756    745    770    746    760    755.4
 enwik9_4k.squashfs.img    90.3   83.0   94.3   90.7   92.6   90.18
 enwik9_8k.squashfs.img    111    108    110    108    110    109.4
 enwik9_16k.squashfs.img   158    163    146    165    174    161.2
 enwik9_128k.squashfs.img  324    314    262    262    296    291.6


Thanks,
Gao Xiang

Gao Xiang (8):
  staging: erofs: add compacted ondisk compression indexes
  staging: erofs: add compacted compression indexes support
  staging: erofs: move per-CPU buffers implementation to utils.c
  staging: erofs: move stagingpage operations to compress.h
  staging: erofs: introduce generic decompression backend
  staging: erofs: introduce LZ4 decompression inplace
  staging: erofs: switch to new decompression backend
  staging: erofs: integrate decompression inplace

 drivers/staging/erofs/Makefile        |   2 +-
 drivers/staging/erofs/compress.h      |  62 ++++
 drivers/staging/erofs/data.c          |   4 +-
 drivers/staging/erofs/decompressor.c  | 322 ++++++++++++++++++
 drivers/staging/erofs/erofs_fs.h      |  60 +++-
 drivers/staging/erofs/inode.c         |  12 +-
 drivers/staging/erofs/internal.h      |  52 ++-
 drivers/staging/erofs/unzip_vle.c     | 368 ++------------------
 drivers/staging/erofs/unzip_vle.h     |  38 +--
 drivers/staging/erofs/unzip_vle_lz4.c | 229 -------------
 drivers/staging/erofs/utils.c         |  12 +
 drivers/staging/erofs/zmap.c          | 463 ++++++++++++++++++++++++++
 12 files changed, 993 insertions(+), 631 deletions(-)
 create mode 100644 drivers/staging/erofs/compress.h
 create mode 100644 drivers/staging/erofs/decompressor.c
 delete mode 100644 drivers/staging/erofs/unzip_vle_lz4.c
 create mode 100644 drivers/staging/erofs/zmap.c

-- 
2.17.1

^ permalink raw reply

* [PATCH v2 1/8] staging: erofs: add compacted ondisk compression indexes
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)

In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

This patch introduces new compacted compression indexes.

In contract to legacy compression indexes that
   each 4k logical cluster has an 8-byte index,
compacted ondisk compression indexes will have
   amortized 2 bytes for each 4k logical cluster (compacted 2B)
   amortized 4 bytes for each 4k logical cluster (compacted 4B)

In detail, several continuous clusters will be encoded in
a compacted pack with cluster types, offsets, and one blkaddr
at the end of the pack to leave 4-byte margin for better
decoding performance, as illustrated below:
   _____________________________________________
  |___ at _____ encoded bits __________|_ blkaddr _|
  0       .                                     amortized * vcnt
  .          .
  .             .                   amortized * vcnt - 4
  .                .
  .___________________.
  |_type_|_clusterofs_|

Note that compacted 2 / 4B should be aligned with 32 / 8 bytes
in order to avoid each pack crossing page boundary.

Signed-off-by: Gao Xiang <gaoxiang25 at huawei.com>
---
 drivers/staging/erofs/data.c      |  4 +--
 drivers/staging/erofs/erofs_fs.h  | 57 +++++++++++++++++++++++++------
 drivers/staging/erofs/inode.c     |  5 +--
 drivers/staging/erofs/internal.h  | 11 ++----
 drivers/staging/erofs/unzip_vle.c |  8 ++---
 5 files changed, 56 insertions(+), 29 deletions(-)

diff --git a/drivers/staging/erofs/data.c b/drivers/staging/erofs/data.c
index 746685f90564..cc31c3e5984c 100644
--- a/drivers/staging/erofs/data.c
+++ b/drivers/staging/erofs/data.c
@@ -124,7 +124,7 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
 	trace_erofs_map_blocks_flatmode_enter(inode, map, flags);
 
 	nblocks = DIV_ROUND_UP(inode->i_size, PAGE_SIZE);
-	lastblk = nblocks - is_inode_layout_inline(inode);
+	lastblk = nblocks - is_inode_flat_inline(inode);
 
 	if (unlikely(offset >= inode->i_size)) {
 		/* leave out-of-bound access unmapped */
@@ -139,7 +139,7 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
 	if (offset < blknr_to_addr(lastblk)) {
 		map->m_pa = blknr_to_addr(vi->raw_blkaddr) + map->m_la;
 		map->m_plen = blknr_to_addr(lastblk) - offset;
-	} else if (is_inode_layout_inline(inode)) {
+	} else if (is_inode_flat_inline(inode)) {
 		/* 2 - inode inline B: inode, [xattrs], inline last blk... */
 		struct erofs_sb_info *sbi = EROFS_SB(inode->i_sb);
 
diff --git a/drivers/staging/erofs/erofs_fs.h b/drivers/staging/erofs/erofs_fs.h
index 8ddb2b3e7d39..a05139f1df60 100644
--- a/drivers/staging/erofs/erofs_fs.h
+++ b/drivers/staging/erofs/erofs_fs.h
@@ -49,19 +49,29 @@ struct erofs_super_block {
  * erofs inode data mapping:
  * 0 - inode plain without inline data A:
  * inode, [xattrs], ... | ... | no-holed data
- * 1 - inode VLE compression B:
+ * 1 - inode VLE compression B (legacy):
  * inode, [xattrs], extents ... | ...
  * 2 - inode plain with inline data C:
  * inode, [xattrs], last_inline_data, ... | ... | no-holed data
- * 3~7 - reserved
+ * 3 - inode compression D:
+ * inode, [xattrs], map_header, extents ... | ...
+ * 4~7 - reserved
  */
 enum {
-	EROFS_INODE_LAYOUT_PLAIN,
-	EROFS_INODE_LAYOUT_COMPRESSION,
-	EROFS_INODE_LAYOUT_INLINE,
+	EROFS_INODE_FLAT_PLAIN,
+	EROFS_INODE_FLAT_COMPRESSION_LEGACY,
+	EROFS_INODE_FLAT_INLINE,
+	EROFS_INODE_FLAT_COMPRESSION,
 	EROFS_INODE_LAYOUT_MAX
 };
 
+static bool erofs_inode_is_data_compressed(unsigned int datamode)
+{
+	if (datamode == EROFS_INODE_FLAT_COMPRESSION)
+		return true;
+	return datamode == EROFS_INODE_FLAT_COMPRESSION_LEGACY;
+}
+
 /* bit definitions of inode i_advise */
 #define EROFS_I_VERSION_BITS            1
 #define EROFS_I_DATA_MAPPING_BITS       3
@@ -176,11 +186,37 @@ struct erofs_xattr_entry {
 	sizeof(struct erofs_xattr_entry) + \
 	(entry)->e_name_len + le16_to_cpu((entry)->e_value_size))
 
-/* have to be aligned with 8 bytes on disk */
-struct erofs_extent_header {
-	__le32 eh_checksum;
-	__le32 eh_reserved[3];
-} __packed;
+/* available compression algorithm types */
+enum {
+	Z_EROFS_COMPRESSION_LZ4,
+	Z_EROFS_COMPRESSION_MAX
+};
+
+/*
+ * bit 0 : COMPACTED_2B indexes (0 - off; 1 - on)
+ *  e.g. for 4k logical cluster size,      4B        if compacted 2B is off;
+ *                                  (4B) + 2B + (4B) if compacted 2B is on.
+ */
+#define Z_EROFS_ADVISE_COMPACTED_2B_BIT         0
+
+#define Z_EROFS_ADVISE_COMPACTED_2B     (1 << Z_EROFS_ADVISE_COMPACTED_2B_BIT)
+
+struct z_erofs_map_header {
+	__le32	h_reserved1;
+	__le16	h_advise;
+	/*
+	 * bit 0-3 : algorithm type of head 1 (logical cluster type 01);
+	 * bit 4-7 : algorithm type of head 2 (logical cluster type 11).
+	 */
+	__u8	h_algorithmtype;
+	/*
+	 * bit 0-2 : logical cluster bits - 12, e.g. 0 for 4096;
+	 * bit 3-4 : (physical - logical) cluster bits of head 1:
+	 *       For example, if logical clustersize = 4096, 1 for 8192.
+	 * bit 5-7 : (physical - logical) cluster bits of head 2.
+	 */
+	__u8	h_clusterbits;
+};
 
 /*
  * Z_EROFS Variable-sized Logical Extent cluster type:
@@ -270,7 +306,6 @@ static inline void erofs_check_ondisk_layout_definitions(void)
 	BUILD_BUG_ON(sizeof(struct erofs_inode_v2) != 64);
 	BUILD_BUG_ON(sizeof(struct erofs_xattr_ibody_header) != 12);
 	BUILD_BUG_ON(sizeof(struct erofs_xattr_entry) != 4);
-	BUILD_BUG_ON(sizeof(struct erofs_extent_header) != 16);
 	BUILD_BUG_ON(sizeof(struct z_erofs_vle_decompressed_index) != 8);
 	BUILD_BUG_ON(sizeof(struct erofs_dirent) != 12);
 
diff --git a/drivers/staging/erofs/inode.c b/drivers/staging/erofs/inode.c
index e51348f7e838..3539290b8e45 100644
--- a/drivers/staging/erofs/inode.c
+++ b/drivers/staging/erofs/inode.c
@@ -127,12 +127,9 @@ static int fill_inline_data(struct inode *inode, void *data,
 {
 	struct erofs_vnode *vi = EROFS_V(inode);
 	struct erofs_sb_info *sbi = EROFS_I_SB(inode);
-	const int mode = vi->datamode;
-
-	DBG_BUGON(mode >= EROFS_INODE_LAYOUT_MAX);
 
 	/* should be inode inline C */
-	if (mode != EROFS_INODE_LAYOUT_INLINE)
+	if (!is_inode_flat_inline(inode))
 		return 0;
 
 	/* fast symlink (following ext4) */
diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h
index 1666cceecb3c..c851d0be6cf6 100644
--- a/drivers/staging/erofs/internal.h
+++ b/drivers/staging/erofs/internal.h
@@ -382,19 +382,14 @@ static inline unsigned long inode_datablocks(struct inode *inode)
 	return DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ);
 }
 
-static inline bool is_inode_layout_plain(struct inode *inode)
-{
-	return EROFS_V(inode)->datamode == EROFS_INODE_LAYOUT_PLAIN;
-}
-
 static inline bool is_inode_layout_compression(struct inode *inode)
 {
-	return EROFS_V(inode)->datamode == EROFS_INODE_LAYOUT_COMPRESSION;
+	return erofs_inode_is_data_compressed(EROFS_V(inode)->datamode);
 }
 
-static inline bool is_inode_layout_inline(struct inode *inode)
+static inline bool is_inode_flat_inline(struct inode *inode)
 {
-	return EROFS_V(inode)->datamode == EROFS_INODE_LAYOUT_INLINE;
+	return EROFS_V(inode)->datamode == EROFS_INODE_FLAT_INLINE;
 }
 
 extern const struct super_operations erofs_sops;
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index f3d0d2c03939..0db9bc50f67c 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -1643,8 +1643,8 @@ vle_extent_blkaddr(struct inode *inode, pgoff_t index)
 	struct erofs_vnode *vi = EROFS_V(inode);
 
 	unsigned int ofs = Z_EROFS_VLE_EXTENT_ALIGN(vi->inode_isize +
-		vi->xattr_isize) + sizeof(struct erofs_extent_header) +
-		index * sizeof(struct z_erofs_vle_decompressed_index);
+						    vi->xattr_isize) +
+		16 + index * sizeof(struct z_erofs_vle_decompressed_index);
 
 	return erofs_blknr(iloc(sbi, vi->nid) + ofs);
 }
@@ -1656,8 +1656,8 @@ vle_extent_blkoff(struct inode *inode, pgoff_t index)
 	struct erofs_vnode *vi = EROFS_V(inode);
 
 	unsigned int ofs = Z_EROFS_VLE_EXTENT_ALIGN(vi->inode_isize +
-		vi->xattr_isize) + sizeof(struct erofs_extent_header) +
-		index * sizeof(struct z_erofs_vle_decompressed_index);
+						    vi->xattr_isize) +
+		16 + index * sizeof(struct z_erofs_vle_decompressed_index);
 
 	return erofs_blkoff(iloc(sbi, vi->nid) + ofs);
 }
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 2/8] staging: erofs: add compacted compression indexes support
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)

In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

This patch aims at compacted compression indexes:
 1) cleanup z_erofs_map_blocks_iter and move into zmap.c;
 2) add compacted 4/2B decoding support.

On kirin980 platform, sequential read is increased about
6% (725MiB/s -> 770MiB/s) on enwik9 dataset if compacted 2B
feature is enabled.

Signed-off-by: Gao Xiang <gaoxiang25 at huawei.com>
---
 drivers/staging/erofs/Makefile    |   2 +-
 drivers/staging/erofs/inode.c     |   7 +-
 drivers/staging/erofs/internal.h  |  18 +-
 drivers/staging/erofs/unzip_vle.c | 286 ------------------
 drivers/staging/erofs/zmap.c      | 462 ++++++++++++++++++++++++++++++
 5 files changed, 480 insertions(+), 295 deletions(-)
 create mode 100644 drivers/staging/erofs/zmap.c

diff --git a/drivers/staging/erofs/Makefile b/drivers/staging/erofs/Makefile
index a34248a2a16a..84b412c7a991 100644
--- a/drivers/staging/erofs/Makefile
+++ b/drivers/staging/erofs/Makefile
@@ -9,5 +9,5 @@ obj-$(CONFIG_EROFS_FS) += erofs.o
 ccflags-y += -I $(srctree)/$(src)/include
 erofs-objs := super.o inode.o data.o namei.o dir.o utils.o
 erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
-erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o unzip_vle_lz4.o
+erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o unzip_vle_lz4.o zmap.o
 
diff --git a/drivers/staging/erofs/inode.c b/drivers/staging/erofs/inode.c
index 3539290b8e45..1d467322bacf 100644
--- a/drivers/staging/erofs/inode.c
+++ b/drivers/staging/erofs/inode.c
@@ -210,12 +210,7 @@ static int fill_inode(struct inode *inode, int isdir)
 		}
 
 		if (is_inode_layout_compression(inode)) {
-#ifdef CONFIG_EROFS_FS_ZIP
-			inode->i_mapping->a_ops =
-				&z_erofs_vle_normalaccess_aops;
-#else
-			err = -ENOTSUPP;
-#endif
+			err = z_erofs_fill_inode(inode);
 			goto out_unlock;
 		}
 
diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h
index c851d0be6cf6..f3063b13c117 100644
--- a/drivers/staging/erofs/internal.h
+++ b/drivers/staging/erofs/internal.h
@@ -339,9 +339,11 @@ static inline erofs_off_t iloc(struct erofs_sb_info *sbi, erofs_nid_t nid)
 
 /* atomic flag definitions */
 #define EROFS_V_EA_INITED_BIT	0
+#define EROFS_V_Z_INITED_BIT	1
 
 /* bitlock definitions (arranged in reverse order) */
 #define EROFS_V_BL_XATTR_BIT	(BITS_PER_LONG - 1)
+#define EROFS_V_BL_Z_BIT	(BITS_PER_LONG - 2)
 
 struct erofs_vnode {
 	erofs_nid_t nid;
@@ -356,8 +358,17 @@ struct erofs_vnode {
 	unsigned xattr_shared_count;
 	unsigned *xattr_shared_xattrs;
 
-	erofs_blk_t raw_blkaddr;
-
+	union {
+		erofs_blk_t raw_blkaddr;
+#ifdef CONFIG_EROFS_FS_ZIP
+		struct {
+			unsigned short z_advise;
+			unsigned char  z_algorithmtype[2];
+			unsigned char  z_logical_clusterbits;
+			unsigned char  z_physical_clusterbits[2];
+		};
+#endif
+	};
 	/* the corresponding vfs inode */
 	struct inode vfs_inode;
 };
@@ -447,11 +458,14 @@ struct erofs_map_blocks {
 /* Flags used by erofs_map_blocks() */
 #define EROFS_GET_BLOCKS_RAW    0x0001
 
+/* zmap.c */
 #ifdef CONFIG_EROFS_FS_ZIP
+int z_erofs_fill_inode(struct inode *inode);
 int z_erofs_map_blocks_iter(struct inode *inode,
 			    struct erofs_map_blocks *map,
 			    int flags);
 #else
+static inline int z_erofs_fill_inode(struct inode *inode) { return -ENOTSUPP; }
 static inline int z_erofs_map_blocks_iter(struct inode *inode,
 					  struct erofs_map_blocks *map,
 					  int flags)
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index 0db9bc50f67c..8aea938172df 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -1600,289 +1600,3 @@ const struct address_space_operations z_erofs_vle_normalaccess_aops = {
 	.readpages = z_erofs_vle_normalaccess_readpages,
 };
 
-/*
- * Variable-sized Logical Extent (Fixed Physical Cluster) Compression Mode
- * ---
- * VLE compression mode attempts to compress a number of logical data into
- * a physical cluster with a fixed size.
- * VLE compression mode uses "struct z_erofs_vle_decompressed_index".
- */
-#define __vle_cluster_advise(x, bit, bits) \
-	((le16_to_cpu(x) >> (bit)) & ((1 << (bits)) - 1))
-
-#define __vle_cluster_type(advise) __vle_cluster_advise(advise, \
-	Z_EROFS_VLE_DI_CLUSTER_TYPE_BIT, Z_EROFS_VLE_DI_CLUSTER_TYPE_BITS)
-
-#define vle_cluster_type(di)	\
-	__vle_cluster_type((di)->di_advise)
-
-static int
-vle_decompressed_index_clusterofs(unsigned int *clusterofs,
-				  unsigned int clustersize,
-				  struct z_erofs_vle_decompressed_index *di)
-{
-	switch (vle_cluster_type(di)) {
-	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
-		*clusterofs = clustersize;
-		break;
-	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
-		*clusterofs = le16_to_cpu(di->di_clusterofs);
-		break;
-	default:
-		DBG_BUGON(1);
-		return -EIO;
-	}
-	return 0;
-}
-
-static inline erofs_blk_t
-vle_extent_blkaddr(struct inode *inode, pgoff_t index)
-{
-	struct erofs_sb_info *sbi = EROFS_I_SB(inode);
-	struct erofs_vnode *vi = EROFS_V(inode);
-
-	unsigned int ofs = Z_EROFS_VLE_EXTENT_ALIGN(vi->inode_isize +
-						    vi->xattr_isize) +
-		16 + index * sizeof(struct z_erofs_vle_decompressed_index);
-
-	return erofs_blknr(iloc(sbi, vi->nid) + ofs);
-}
-
-static inline unsigned int
-vle_extent_blkoff(struct inode *inode, pgoff_t index)
-{
-	struct erofs_sb_info *sbi = EROFS_I_SB(inode);
-	struct erofs_vnode *vi = EROFS_V(inode);
-
-	unsigned int ofs = Z_EROFS_VLE_EXTENT_ALIGN(vi->inode_isize +
-						    vi->xattr_isize) +
-		16 + index * sizeof(struct z_erofs_vle_decompressed_index);
-
-	return erofs_blkoff(iloc(sbi, vi->nid) + ofs);
-}
-
-struct vle_map_blocks_iter_ctx {
-	struct inode *inode;
-	struct super_block *sb;
-	unsigned int clusterbits;
-
-	struct page **mpage_ret;
-	void **kaddr_ret;
-};
-
-static int
-vle_get_logical_extent_head(const struct vle_map_blocks_iter_ctx *ctx,
-			    unsigned int lcn,	/* logical cluster number */
-			    unsigned long long *ofs,
-			    erofs_blk_t *pblk,
-			    unsigned int *flags)
-{
-	const unsigned int clustersize = 1 << ctx->clusterbits;
-	const erofs_blk_t mblk = vle_extent_blkaddr(ctx->inode, lcn);
-	struct page *mpage = *ctx->mpage_ret;	/* extent metapage */
-
-	struct z_erofs_vle_decompressed_index *di;
-	unsigned int cluster_type, delta0;
-
-	if (mpage->index != mblk) {
-		kunmap_atomic(*ctx->kaddr_ret);
-		unlock_page(mpage);
-		put_page(mpage);
-
-		mpage = erofs_get_meta_page(ctx->sb, mblk, false);
-		if (IS_ERR(mpage)) {
-			*ctx->mpage_ret = NULL;
-			return PTR_ERR(mpage);
-		}
-		*ctx->mpage_ret = mpage;
-		*ctx->kaddr_ret = kmap_atomic(mpage);
-	}
-
-	di = *ctx->kaddr_ret + vle_extent_blkoff(ctx->inode, lcn);
-
-	cluster_type = vle_cluster_type(di);
-	switch (cluster_type) {
-	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
-		delta0 = le16_to_cpu(di->di_u.delta[0]);
-		if (unlikely(!delta0 || delta0 > lcn)) {
-			errln("invalid NONHEAD dl0 %u at lcn %u of nid %llu",
-			      delta0, lcn, EROFS_V(ctx->inode)->nid);
-			DBG_BUGON(1);
-			return -EIO;
-		}
-		return vle_get_logical_extent_head(ctx,
-			lcn - delta0, ofs, pblk, flags);
-	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-		*flags ^= EROFS_MAP_ZIPPED;
-		/* fallthrough */
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
-		/* clustersize should be a power of two */
-		*ofs = ((u64)lcn << ctx->clusterbits) +
-			(le16_to_cpu(di->di_clusterofs) & (clustersize - 1));
-		*pblk = le32_to_cpu(di->di_u.blkaddr);
-		break;
-	default:
-		errln("unknown cluster type %u at lcn %u of nid %llu",
-		      cluster_type, lcn, EROFS_V(ctx->inode)->nid);
-		DBG_BUGON(1);
-		return -EIO;
-	}
-	return 0;
-}
-
-int z_erofs_map_blocks_iter(struct inode *inode,
-			    struct erofs_map_blocks *map,
-			    int flags)
-{
-	void *kaddr;
-	const struct vle_map_blocks_iter_ctx ctx = {
-		.inode = inode,
-		.sb = inode->i_sb,
-		.clusterbits = EROFS_I_SB(inode)->clusterbits,
-		.mpage_ret = &map->mpage,
-		.kaddr_ret = &kaddr
-	};
-	const unsigned int clustersize = 1 << ctx.clusterbits;
-	/* if both m_(l,p)len are 0, regularize l_lblk, l_lofs, etc... */
-	const bool initial = !map->m_llen;
-
-	/* logicial extent (start, end) offset */
-	unsigned long long ofs, end;
-	unsigned int lcn;
-	u32 ofs_rem;
-
-	/* initialize `pblk' to keep gcc from printing foolish warnings */
-	erofs_blk_t mblk, pblk = 0;
-	struct page *mpage = map->mpage;
-	struct z_erofs_vle_decompressed_index *di;
-	unsigned int cluster_type, logical_cluster_ofs;
-	int err = 0;
-
-	trace_z_erofs_map_blocks_iter_enter(inode, map, flags);
-
-	/* when trying to read beyond EOF, leave it unmapped */
-	if (unlikely(map->m_la >= inode->i_size)) {
-		DBG_BUGON(!initial);
-		map->m_llen = map->m_la + 1 - inode->i_size;
-		map->m_la = inode->i_size;
-		map->m_flags = 0;
-		goto out;
-	}
-
-	debugln("%s, m_la %llu m_llen %llu --- start", __func__,
-		map->m_la, map->m_llen);
-
-	ofs = map->m_la + map->m_llen;
-
-	/* clustersize should be power of two */
-	lcn = ofs >> ctx.clusterbits;
-	ofs_rem = ofs & (clustersize - 1);
-
-	mblk = vle_extent_blkaddr(inode, lcn);
-
-	if (!mpage || mpage->index != mblk) {
-		if (mpage)
-			put_page(mpage);
-
-		mpage = erofs_get_meta_page(ctx.sb, mblk, false);
-		if (IS_ERR(mpage)) {
-			err = PTR_ERR(mpage);
-			goto out;
-		}
-		map->mpage = mpage;
-	} else {
-		lock_page(mpage);
-		DBG_BUGON(!PageUptodate(mpage));
-	}
-
-	kaddr = kmap_atomic(mpage);
-	di = kaddr + vle_extent_blkoff(inode, lcn);
-
-	debugln("%s, lcn %u mblk %u e_blkoff %u", __func__, lcn,
-		mblk, vle_extent_blkoff(inode, lcn));
-
-	err = vle_decompressed_index_clusterofs(&logical_cluster_ofs,
-						clustersize, di);
-	if (unlikely(err))
-		goto unmap_out;
-
-	if (!initial) {
-		/* [walking mode] 'map' has been already initialized */
-		map->m_llen += logical_cluster_ofs;
-		goto unmap_out;
-	}
-
-	/* by default, compressed */
-	map->m_flags |= EROFS_MAP_ZIPPED;
-
-	end = ((u64)lcn + 1) * clustersize;
-
-	cluster_type = vle_cluster_type(di);
-
-	switch (cluster_type) {
-	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-		if (ofs_rem >= logical_cluster_ofs)
-			map->m_flags ^= EROFS_MAP_ZIPPED;
-		/* fallthrough */
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
-		if (ofs_rem == logical_cluster_ofs) {
-			pblk = le32_to_cpu(di->di_u.blkaddr);
-			goto exact_hitted;
-		}
-
-		if (ofs_rem > logical_cluster_ofs) {
-			ofs = (u64)lcn * clustersize | logical_cluster_ofs;
-			pblk = le32_to_cpu(di->di_u.blkaddr);
-			break;
-		}
-
-		/* logical cluster number should be >= 1 */
-		if (unlikely(!lcn)) {
-			errln("invalid logical cluster 0 at nid %llu",
-			      EROFS_V(inode)->nid);
-			err = -EIO;
-			goto unmap_out;
-		}
-		end = ((u64)lcn-- * clustersize) | logical_cluster_ofs;
-		/* fallthrough */
-	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
-		/* get the correspoinding first chunk */
-		err = vle_get_logical_extent_head(&ctx, lcn, &ofs,
-						  &pblk, &map->m_flags);
-		mpage = map->mpage;
-
-		if (unlikely(err)) {
-			if (mpage)
-				goto unmap_out;
-			goto out;
-		}
-		break;
-	default:
-		errln("unknown cluster type %u at offset %llu of nid %llu",
-		      cluster_type, ofs, EROFS_V(inode)->nid);
-		err = -EIO;
-		goto unmap_out;
-	}
-
-	map->m_la = ofs;
-exact_hitted:
-	map->m_llen = end - ofs;
-	map->m_plen = clustersize;
-	map->m_pa = blknr_to_addr(pblk);
-	map->m_flags |= EROFS_MAP_MAPPED;
-unmap_out:
-	kunmap_atomic(kaddr);
-	unlock_page(mpage);
-out:
-	debugln("%s, m_la %llu m_pa %llu m_llen %llu m_plen %llu m_flags 0%o",
-		__func__, map->m_la, map->m_pa,
-		map->m_llen, map->m_plen, map->m_flags);
-
-	trace_z_erofs_map_blocks_iter_exit(inode, map, flags, err);
-
-	/* aggressively BUG_ON iff CONFIG_EROFS_FS_DEBUG is on */
-	DBG_BUGON(err < 0 && err != -ENOMEM);
-	return err;
-}
-
diff --git a/drivers/staging/erofs/zmap.c b/drivers/staging/erofs/zmap.c
new file mode 100644
index 000000000000..77bc49c07846
--- /dev/null
+++ b/drivers/staging/erofs/zmap.c
@@ -0,0 +1,462 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * linux/drivers/staging/erofs/zmap.c
+ *
+ * Copyright (C) 2018-2019 HUAWEI, Inc.
+ *             http://www.huawei.com/
+ * Created by Gao Xiang <gaoxiang25 at huawei.com>
+ */
+#include "internal.h"
+#include <asm/unaligned.h>
+#include <trace/events/erofs.h>
+
+int z_erofs_fill_inode(struct inode *inode)
+{
+	struct erofs_vnode *const vi = EROFS_V(inode);
+	struct super_block *const sb = inode->i_sb;
+
+	if (vi->datamode == EROFS_INODE_FLAT_COMPRESSION_LEGACY) {
+		vi->z_advise = 0;
+		vi->z_algorithmtype[0] = 0;
+		vi->z_algorithmtype[1] = 0;
+		vi->z_logical_clusterbits = EROFS_SB(sb)->clusterbits;
+		vi->z_physical_clusterbits[0] = vi->z_logical_clusterbits;
+		vi->z_physical_clusterbits[1] = vi->z_logical_clusterbits;
+		set_bit(EROFS_V_Z_INITED_BIT, &vi->flags);
+	}
+
+	inode->i_mapping->a_ops = &z_erofs_vle_normalaccess_aops;
+	return 0;
+}
+
+static int fill_inode_lazy(struct inode *inode)
+{
+	struct erofs_vnode *const vi = EROFS_V(inode);
+	struct super_block *const sb = inode->i_sb;
+	int err;
+	erofs_off_t pos;
+	struct page *page;
+	void *kaddr;
+	struct z_erofs_map_header *h;
+
+	if (test_bit(EROFS_V_Z_INITED_BIT, &vi->flags))
+		return 0;
+
+	if (wait_on_bit_lock(&vi->flags, EROFS_V_BL_Z_BIT, TASK_KILLABLE))
+		return -ERESTARTSYS;
+
+	err = 0;
+	if (test_bit(EROFS_V_Z_INITED_BIT, &vi->flags))
+		goto out_unlock;
+
+	DBG_BUGON(vi->datamode == EROFS_INODE_FLAT_COMPRESSION_LEGACY);
+
+	pos = ALIGN(iloc(EROFS_SB(sb), vi->nid) + vi->inode_isize +
+		    vi->xattr_isize, 8);
+	page = erofs_get_meta_page(sb, erofs_blknr(pos), false);
+	if (IS_ERR(page)) {
+		err = PTR_ERR(page);
+		goto out_unlock;
+	}
+
+	kaddr = kmap_atomic(page);
+
+	h = kaddr + erofs_blkoff(pos);
+	vi->z_advise = le16_to_cpu(h->h_advise);
+	vi->z_algorithmtype[0] = h->h_algorithmtype & 15;
+	vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
+
+	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
+		errln("unknown compression format %u for nid %llu, please upgrade kernel",
+		      vi->z_algorithmtype[0], vi->nid);
+		err = -ENOTSUPP;
+		goto unmap_done;
+	}
+
+	vi->z_logical_clusterbits = LOG_BLOCK_SIZE + (h->h_clusterbits & 7);
+	vi->z_physical_clusterbits[0] = vi->z_logical_clusterbits +
+					((h->h_clusterbits >> 3) & 3);
+
+	if (vi->z_physical_clusterbits[0] != LOG_BLOCK_SIZE) {
+		errln("unsupported physical clusterbits %u for nid %llu, please upgrade kernel",
+		      vi->z_physical_clusterbits[0], vi->nid);
+		err = -ENOTSUPP;
+		goto unmap_done;
+	}
+
+	vi->z_physical_clusterbits[1] = vi->z_logical_clusterbits +
+					((h->h_clusterbits >> 5) & 7);
+unmap_done:
+	kunmap_atomic(kaddr);
+	unlock_page(page);
+	put_page(page);
+
+	set_bit(EROFS_V_Z_INITED_BIT, &vi->flags);
+out_unlock:
+	clear_and_wake_up_bit(EROFS_V_BL_Z_BIT, &vi->flags);
+	return err;
+}
+
+struct z_erofs_maprecorder {
+	struct inode *inode;
+	struct erofs_map_blocks *map;
+	void *kaddr;
+
+	unsigned long lcn;
+	/* compression extent information gathered */
+	u8  type;
+	u16 clusterofs;
+	u16 delta[2];
+	erofs_blk_t pblk;
+};
+
+static int z_erofs_reload_indexes(struct z_erofs_maprecorder *m,
+				  erofs_blk_t eblk)
+{
+	struct super_block *const sb = m->inode->i_sb;
+	struct erofs_map_blocks *const map = m->map;
+	struct page *mpage = map->mpage;
+
+	if (mpage) {
+		if (mpage->index == eblk) {
+			if (!m->kaddr)
+				m->kaddr = kmap_atomic(mpage);
+			return 0;
+		}
+
+		if (m->kaddr) {
+			kunmap_atomic(m->kaddr);
+			m->kaddr = NULL;
+		}
+		put_page(mpage);
+	}
+
+	mpage = erofs_get_meta_page(sb, eblk, false);
+	if (IS_ERR(mpage)) {
+		map->mpage = NULL;
+		return PTR_ERR(mpage);
+	}
+	m->kaddr = kmap_atomic(mpage);
+	unlock_page(mpage);
+	map->mpage = mpage;
+	return 0;
+}
+
+static int vle_legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
+					     unsigned long lcn)
+{
+	struct inode *const inode = m->inode;
+	struct erofs_vnode *const vi = EROFS_V(inode);
+	const erofs_off_t ibase = iloc(EROFS_I_SB(inode), vi->nid);
+	const erofs_off_t pos = Z_EROFS_VLE_EXTENT_ALIGN(ibase +
+							 vi->inode_isize +
+							 vi->xattr_isize) +
+		16 + lcn * sizeof(struct z_erofs_vle_decompressed_index);
+	struct z_erofs_vle_decompressed_index *di;
+	unsigned int advise, type;
+	int err;
+
+	err = z_erofs_reload_indexes(m, erofs_blknr(pos));
+	if (err)
+		return err;
+
+	m->lcn = lcn;
+	di = m->kaddr + erofs_blkoff(pos);
+
+	advise = le16_to_cpu(di->di_advise);
+	type = (advise >> Z_EROFS_VLE_DI_CLUSTER_TYPE_BIT) &
+		((1 << Z_EROFS_VLE_DI_CLUSTER_TYPE_BITS) - 1);
+	switch (type) {
+	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
+		m->clusterofs = 1 << vi->z_logical_clusterbits;
+		m->delta[0] = le16_to_cpu(di->di_u.delta[0]);
+		m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
+		break;
+	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+		m->clusterofs = le16_to_cpu(di->di_clusterofs);
+		m->pblk = le32_to_cpu(di->di_u.blkaddr);
+		break;
+	default:
+		DBG_BUGON(1);
+		return -EIO;
+	}
+	m->type = type;
+	return 0;
+}
+
+static unsigned int decode_compactedbits(unsigned int lobits,
+					 unsigned int lomask,
+					 u8 *in, unsigned int pos, u8 *type)
+{
+	const unsigned int v = get_unaligned_le32(in + pos / 8) >> (pos & 7);
+	const unsigned int lo = v & lomask;
+
+	*type = (v >> lobits) & 3;
+	return lo;
+}
+
+static int unpack_compacted_index(struct z_erofs_maprecorder *m,
+				  unsigned int amortizedshift,
+				  unsigned int eofs)
+{
+	struct erofs_vnode *const vi = EROFS_V(m->inode);
+	const unsigned int lclusterbits = vi->z_logical_clusterbits;
+	const unsigned int lomask = (1 << lclusterbits) - 1;
+	unsigned int vcnt, base, lo, encodebits, nblk;
+	int i;
+	u8 *in, type;
+
+	if (1 << amortizedshift == 4)
+		vcnt = 2;
+	else if (1 << amortizedshift == 2 && lclusterbits == 12)
+		vcnt = 16;
+	else
+		return -ENOTSUPP;
+
+	encodebits = ((vcnt << amortizedshift) - sizeof(__le32)) * 8 / vcnt;
+	base = round_down(eofs, vcnt << amortizedshift);
+	in = m->kaddr + base;
+
+	i = (eofs - base) >> amortizedshift;
+
+	lo = decode_compactedbits(lclusterbits, lomask,
+				  in, encodebits * i, &type);
+	m->type = type;
+	if (type == Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD) {
+		m->clusterofs = 1 << lclusterbits;
+		if (i + 1 != vcnt) {
+			m->delta[0] = lo;
+			return 0;
+		}
+		/*
+		 * since the last lcluster in the pack is special,
+		 * of which lo saves delta[1] rather than delta[0].
+		 * Hence, get delta[0] by the previous lcluster indirectly.
+		 */
+		lo = decode_compactedbits(lclusterbits, lomask,
+					  in, encodebits * (i - 1), &type);
+		if (type != Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD)
+			lo = 0;
+		m->delta[0] = lo + 1;
+		return 0;
+	}
+	m->clusterofs = lo;
+	m->delta[0] = 0;
+	/* figout out blkaddr (pblk) for HEAD lclusters */
+	nblk = 1;
+	while (i > 0) {
+		--i;
+		lo = decode_compactedbits(lclusterbits, lomask,
+					  in, encodebits * i, &type);
+		if (type == Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD)
+			i -= lo;
+
+		if (i >= 0)
+			++nblk;
+	}
+	in += (vcnt << amortizedshift) - sizeof(__le32);
+	m->pblk = le32_to_cpu(*(__le32 *)in) + nblk;
+	return 0;
+}
+
+static int compacted_load_cluster_from_disk(struct z_erofs_maprecorder *m,
+					    unsigned long lcn)
+{
+	struct inode *const inode = m->inode;
+	struct erofs_vnode *const vi = EROFS_V(inode);
+	const unsigned int lclusterbits = vi->z_logical_clusterbits;
+	const erofs_off_t ebase = ALIGN(iloc(EROFS_I_SB(inode), vi->nid) +
+					vi->inode_isize + vi->xattr_isize, 8) +
+		sizeof(struct z_erofs_map_header);
+	const unsigned int totalidx = DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ);
+	unsigned int compacted_4b_initial, compacted_2b;
+	unsigned int amortizedshift;
+	erofs_off_t pos;
+	int err;
+
+	if (lclusterbits != 12)
+		return -ENOTSUPP;
+
+	if (lcn >= totalidx)
+		return -EINVAL;
+
+	m->lcn = lcn;
+	/* used to align to 32-byte (compacted_2b) alignment */
+	compacted_4b_initial = (32 - ebase % 32) / 4;
+	if (compacted_4b_initial == 32 / 4)
+		compacted_4b_initial = 0;
+
+	if (vi->z_advise & Z_EROFS_ADVISE_COMPACTED_2B)
+		compacted_2b = rounddown(totalidx - compacted_4b_initial, 16);
+	else
+		compacted_2b = 0;
+
+	pos = ebase;
+	if (lcn < compacted_4b_initial) {
+		amortizedshift = 2;
+		goto out;
+	}
+	pos += compacted_4b_initial * 4;
+	lcn -= compacted_4b_initial;
+
+	if (lcn < compacted_2b) {
+		amortizedshift = 1;
+		goto out;
+	}
+	pos += compacted_2b * 2;
+	lcn -= compacted_2b;
+	amortizedshift = 2;
+out:
+	pos += lcn * (1 << amortizedshift);
+	err = z_erofs_reload_indexes(m, erofs_blknr(pos));
+	if (err)
+		return err;
+	return unpack_compacted_index(m, amortizedshift, erofs_blkoff(pos));
+}
+
+static int vle_load_cluster_from_disk(struct z_erofs_maprecorder *m,
+				      unsigned int lcn)
+{
+	const unsigned int datamode = EROFS_V(m->inode)->datamode;
+
+	if (datamode == EROFS_INODE_FLAT_COMPRESSION_LEGACY)
+		return vle_legacy_load_cluster_from_disk(m, lcn);
+
+	if (datamode == EROFS_INODE_FLAT_COMPRESSION)
+		return compacted_load_cluster_from_disk(m, lcn);
+
+	return -EINVAL;
+}
+
+static int vle_extent_lookback(struct z_erofs_maprecorder *m,
+			       unsigned int lookback_distance)
+{
+	struct erofs_vnode *const vi = EROFS_V(m->inode);
+	struct erofs_map_blocks *const map = m->map;
+	const unsigned int lclusterbits = vi->z_logical_clusterbits;
+	unsigned long lcn = m->lcn;
+	int err;
+
+	if (lcn < lookback_distance) {
+		DBG_BUGON(1);
+		return -EIO;
+	}
+
+	/* load extent head logical cluster if needed */
+	lcn -= lookback_distance;
+	err = vle_load_cluster_from_disk(m, lcn);
+	if (err)
+		return err;
+
+	switch (m->type) {
+	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
+		return vle_extent_lookback(m, m->delta[0]);
+	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
+		map->m_flags &= ~EROFS_MAP_ZIPPED;
+		/* fallthrough */
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+		map->m_la = (lcn << lclusterbits) | m->clusterofs;
+		break;
+	default:
+		errln("unknown type %u at lcn %lu of nid %llu",
+		      m->type, lcn, vi->nid);
+		DBG_BUGON(1);
+		return -EIO;
+	}
+	return 0;
+}
+
+int z_erofs_map_blocks_iter(struct inode *inode,
+			    struct erofs_map_blocks *map,
+			    int flags)
+{
+	struct erofs_vnode *const vi = EROFS_V(inode);
+	struct z_erofs_maprecorder m = {
+		.inode = inode,
+		.map = map,
+	};
+	int err = 0;
+	unsigned int lclusterbits, endoff;
+	unsigned long long ofs, end;
+
+	trace_z_erofs_map_blocks_iter_enter(inode, map, flags);
+
+	/* when trying to read beyond EOF, leave it unmapped */
+	if (unlikely(map->m_la >= inode->i_size)) {
+		map->m_llen = map->m_la + 1 - inode->i_size;
+		map->m_la = inode->i_size;
+		map->m_flags = 0;
+		goto out;
+	}
+
+	err = fill_inode_lazy(inode);
+	if (err)
+		goto out;
+
+	lclusterbits = vi->z_logical_clusterbits;
+	ofs = map->m_la;
+	m.lcn = ofs >> lclusterbits;
+	endoff = ofs & ((1 << lclusterbits) - 1);
+
+	err = vle_load_cluster_from_disk(&m, m.lcn);
+	if (err)
+		goto unmap_out;
+
+	map->m_flags = EROFS_MAP_ZIPPED;	/* by default, compressed */
+	end = (m.lcn + 1ULL) << lclusterbits;
+
+	switch (m.type) {
+	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
+		if (endoff >= m.clusterofs)
+			map->m_flags &= ~EROFS_MAP_ZIPPED;
+		/* fallthrough */
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+		if (endoff >= m.clusterofs) {
+			map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
+			break;
+		}
+		/* m.lcn should be >= 1 if endoff < m.clusterofs */
+		if (unlikely(!m.lcn)) {
+			errln("invalid logical cluster 0 at nid %llu",
+			      vi->nid);
+			err = -EIO;
+			goto unmap_out;
+		}
+		end = (m.lcn << lclusterbits) | m.clusterofs;
+		m.delta[0] = 1;
+		/* fallthrough */
+	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
+		/* get the correspoinding first chunk */
+		err = vle_extent_lookback(&m, m.delta[0]);
+		if (unlikely(err))
+			goto unmap_out;
+		break;
+	default:
+		errln("unknown type %u at offset %llu of nid %llu",
+		      m.type, ofs, vi->nid);
+		err = -EIO;
+		goto unmap_out;
+	}
+
+	map->m_llen = end - map->m_la;
+	map->m_plen = 1 << lclusterbits;
+	map->m_pa = blknr_to_addr(m.pblk);
+	map->m_flags |= EROFS_MAP_MAPPED;
+
+unmap_out:
+	if (m.kaddr)
+		kunmap_atomic(m.kaddr);
+
+out:
+	debugln("%s, m_la %llu m_pa %llu m_llen %llu m_plen %llu m_flags 0%o",
+		__func__, map->m_la, map->m_pa,
+		map->m_llen, map->m_plen, map->m_flags);
+
+	trace_z_erofs_map_blocks_iter_exit(inode, map, flags, err);
+
+	/* aggressively BUG_ON iff CONFIG_EROFS_FS_DEBUG is on */
+	DBG_BUGON(err < 0 && err != -ENOMEM);
+	return err;
+}
+
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 3/8] staging: erofs: move per-CPU buffers implementation to utils.c
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)

In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

This patch moves per-CPU buffers to utils.c in order for
the upcoming generic decompression framework to use it.

Note that I tried to use generic per-CPU buffer or
per-CPU page approaches to clean up further, but obvious
performanace regression (about 2% for sequential read) was
observed.

Therefore let's leave it as it is instead, just move
to utils.c and I'll try to dig into the root cause later.

Signed-off-by: Gao Xiang <gaoxiang25 at huawei.com>
---
 drivers/staging/erofs/internal.h      | 26 ++++++++++++++++++++++
 drivers/staging/erofs/unzip_vle.c     |  5 ++---
 drivers/staging/erofs/unzip_vle.h     |  4 +---
 drivers/staging/erofs/unzip_vle_lz4.c | 31 ++++++++-------------------
 drivers/staging/erofs/utils.c         | 12 +++++++++++
 5 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h
index f3063b13c117..dcbe6f7f5dae 100644
--- a/drivers/staging/erofs/internal.h
+++ b/drivers/staging/erofs/internal.h
@@ -321,6 +321,16 @@ static inline void z_erofs_exit_zip_subsystem(void) {}
 
 /* page count of a compressed cluster */
 #define erofs_clusterpages(sbi)         ((1 << (sbi)->clusterbits) / PAGE_SIZE)
+#define Z_EROFS_NR_INLINE_PAGEVECS      3
+
+#if (Z_EROFS_CLUSTER_MAX_PAGES > Z_EROFS_NR_INLINE_PAGEVECS)
+#define EROFS_PCPUBUF_NR_PAGES          Z_EROFS_CLUSTER_MAX_PAGES
+#else
+#define EROFS_PCPUBUF_NR_PAGES          Z_EROFS_NR_INLINE_PAGEVECS
+#endif
+
+#else
+#define EROFS_PCPUBUF_NR_PAGES          0
 #endif
 
 typedef u64 erofs_off_t;
@@ -608,6 +618,22 @@ static inline void erofs_vunmap(const void *mem, unsigned int count)
 extern struct shrinker erofs_shrinker_info;
 
 struct page *erofs_allocpage(struct list_head *pool, gfp_t gfp);
+
+#if (EROFS_PCPUBUF_NR_PAGES > 0)
+void *erofs_get_pcpubuf(unsigned int pagenr);
+#define erofs_put_pcpubuf(buf) do { \
+	(void)&(buf);	\
+	preempt_enable();	\
+} while (0)
+#else
+static inline void *erofs_get_pcpubuf(unsigned int pagenr)
+{
+	return ERR_PTR(-ENOTSUPP);
+}
+
+#define erofs_put_pcpubuf(buf) do {} while (0)
+#endif
+
 void erofs_register_super(struct super_block *sb);
 void erofs_unregister_super(struct super_block *sb);
 
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index 8aea938172df..08f2d4302ecb 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -552,8 +552,7 @@ static int z_erofs_vle_work_iter_begin(struct z_erofs_vle_work_builder *builder,
 	if (IS_ERR(work))
 		return PTR_ERR(work);
 got_it:
-	z_erofs_pagevec_ctor_init(&builder->vector,
-				  Z_EROFS_VLE_INLINE_PAGEVECS,
+	z_erofs_pagevec_ctor_init(&builder->vector, Z_EROFS_NR_INLINE_PAGEVECS,
 				  work->pagevec, work->vcnt);
 
 	if (builder->role >= Z_EROFS_VLE_WORK_PRIMARY) {
@@ -936,7 +935,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 	for (i = 0; i < nr_pages; ++i)
 		pages[i] = NULL;
 
-	z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_VLE_INLINE_PAGEVECS,
+	z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
 				  work->pagevec, 0);
 
 	for (i = 0; i < work->vcnt; ++i) {
diff --git a/drivers/staging/erofs/unzip_vle.h b/drivers/staging/erofs/unzip_vle.h
index 902e67d04029..9c53009700cf 100644
--- a/drivers/staging/erofs/unzip_vle.h
+++ b/drivers/staging/erofs/unzip_vle.h
@@ -44,8 +44,6 @@ static inline bool z_erofs_gather_if_stagingpage(struct list_head *page_pool,
  *
  */
 
-#define Z_EROFS_VLE_INLINE_PAGEVECS     3
-
 struct z_erofs_vle_work {
 	struct mutex lock;
 
@@ -58,7 +56,7 @@ struct z_erofs_vle_work {
 
 	union {
 		/* L: pagevec */
-		erofs_vtptr_t pagevec[Z_EROFS_VLE_INLINE_PAGEVECS];
+		erofs_vtptr_t pagevec[Z_EROFS_NR_INLINE_PAGEVECS];
 		struct rcu_head rcu;
 	};
 };
diff --git a/drivers/staging/erofs/unzip_vle_lz4.c b/drivers/staging/erofs/unzip_vle_lz4.c
index 0daac9b984a8..399c3e3a3ff3 100644
--- a/drivers/staging/erofs/unzip_vle_lz4.c
+++ b/drivers/staging/erofs/unzip_vle_lz4.c
@@ -34,16 +34,6 @@ static int z_erofs_unzip_lz4(void *in, void *out, size_t inlen, size_t outlen)
 	return -EIO;
 }
 
-#if Z_EROFS_CLUSTER_MAX_PAGES > Z_EROFS_VLE_INLINE_PAGEVECS
-#define EROFS_PERCPU_NR_PAGES   Z_EROFS_CLUSTER_MAX_PAGES
-#else
-#define EROFS_PERCPU_NR_PAGES   Z_EROFS_VLE_INLINE_PAGEVECS
-#endif
-
-static struct {
-	char data[PAGE_SIZE * EROFS_PERCPU_NR_PAGES];
-} erofs_pcpubuf[NR_CPUS];
-
 int z_erofs_vle_plain_copy(struct page **compressed_pages,
 			   unsigned int clusterpages,
 			   struct page **pages,
@@ -56,8 +46,7 @@ int z_erofs_vle_plain_copy(struct page **compressed_pages,
 	char *percpu_data;
 	bool mirrored[Z_EROFS_CLUSTER_MAX_PAGES] = { 0 };
 
-	preempt_disable();
-	percpu_data = erofs_pcpubuf[smp_processor_id()].data;
+	percpu_data = erofs_get_pcpubuf(0);
 
 	j = 0;
 	for (i = 0; i < nr_pages; j = i++) {
@@ -117,7 +106,7 @@ int z_erofs_vle_plain_copy(struct page **compressed_pages,
 	if (src && !mirrored[j])
 		kunmap_atomic(src);
 
-	preempt_enable();
+	erofs_put_pcpubuf(percpu_data);
 	return 0;
 }
 
@@ -131,7 +120,7 @@ int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages,
 	unsigned int nr_pages, i, j;
 	int ret;
 
-	if (outlen + pageofs > EROFS_PERCPU_NR_PAGES * PAGE_SIZE)
+	if (outlen + pageofs > EROFS_PCPUBUF_NR_PAGES * PAGE_SIZE)
 		return -ENOTSUPP;
 
 	nr_pages = DIV_ROUND_UP(outlen + pageofs, PAGE_SIZE);
@@ -144,8 +133,7 @@ int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages,
 			return -ENOMEM;
 	}
 
-	preempt_disable();
-	vout = erofs_pcpubuf[smp_processor_id()].data;
+	vout = erofs_get_pcpubuf(0);
 
 	ret = z_erofs_unzip_lz4(vin, vout + pageofs,
 				clusterpages * PAGE_SIZE, outlen);
@@ -174,7 +162,7 @@ int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages,
 	}
 
 out:
-	preempt_enable();
+	erofs_put_pcpubuf(vout);
 
 	if (clusterpages == 1)
 		kunmap_atomic(vin);
@@ -196,8 +184,7 @@ int z_erofs_vle_unzip_vmap(struct page **compressed_pages,
 	int ret;
 
 	if (overlapped) {
-		preempt_disable();
-		vin = erofs_pcpubuf[smp_processor_id()].data;
+		vin = erofs_get_pcpubuf(0);
 
 		for (i = 0; i < clusterpages; ++i) {
 			void *t = kmap_atomic(compressed_pages[i]);
@@ -216,13 +203,13 @@ int z_erofs_vle_unzip_vmap(struct page **compressed_pages,
 	if (ret > 0)
 		ret = 0;
 
-	if (!overlapped) {
+	if (overlapped) {
+		erofs_put_pcpubuf(vin);
+	} else {
 		if (clusterpages == 1)
 			kunmap_atomic(vin);
 		else
 			erofs_vunmap(vin, clusterpages);
-	} else {
-		preempt_enable();
 	}
 	return ret;
 }
diff --git a/drivers/staging/erofs/utils.c b/drivers/staging/erofs/utils.c
index 3e7d30b6de1d..4bbd3bf34acd 100644
--- a/drivers/staging/erofs/utils.c
+++ b/drivers/staging/erofs/utils.c
@@ -27,6 +27,18 @@ struct page *erofs_allocpage(struct list_head *pool, gfp_t gfp)
 	return page;
 }
 
+#if (EROFS_PCPUBUF_NR_PAGES > 0)
+static struct {
+	u8 data[PAGE_SIZE * EROFS_PCPUBUF_NR_PAGES];
+} ____cacheline_aligned_in_smp erofs_pcpubuf[NR_CPUS];
+
+void *erofs_get_pcpubuf(unsigned int pagenr)
+{
+	preempt_disable();
+	return &erofs_pcpubuf[smp_processor_id()].data[pagenr * PAGE_SIZE];
+}
+#endif
+
 /* global shrink count (for all mounted EROFS instances) */
 static atomic_long_t erofs_global_shrink_cnt;
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 4/8] staging: erofs: move stagingpage operations to compress.h
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)

In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

stagingpages are behaved as bounce pages for temporary use.
Move to compress.h since the upcoming decompressor will
allocate stagingpages as well.

Signed-off-by: Gao Xiang <gaoxiang25 at huawei.com>
---
 drivers/staging/erofs/compress.h  | 40 +++++++++++++++++++++++++++++++
 drivers/staging/erofs/unzip_vle.c | 11 +++++----
 drivers/staging/erofs/unzip_vle.h | 20 ----------------
 3 files changed, 46 insertions(+), 25 deletions(-)
 create mode 100644 drivers/staging/erofs/compress.h

diff --git a/drivers/staging/erofs/compress.h b/drivers/staging/erofs/compress.h
new file mode 100644
index 000000000000..1dcfc3b35118
--- /dev/null
+++ b/drivers/staging/erofs/compress.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * linux/drivers/staging/erofs/compress.h
+ *
+ * Copyright (C) 2019 HUAWEI, Inc.
+ *             http://www.huawei.com/
+ * Created by Gao Xiang <gaoxiang25 at huawei.com>
+ */
+#ifndef __EROFS_FS_COMPRESS_H
+#define __EROFS_FS_COMPRESS_H
+
+/*
+ * - 0x5A110C8D ('sallocated', Z_EROFS_MAPPING_STAGING) -
+ * used to mark temporary allocated pages from other
+ * file/cached pages and NULL mapping pages.
+ */
+#define Z_EROFS_MAPPING_STAGING         ((void *)0x5A110C8D)
+
+/* check if a page is marked as staging */
+static inline bool z_erofs_page_is_staging(struct page *page)
+{
+	return page->mapping == Z_EROFS_MAPPING_STAGING;
+}
+
+static inline bool z_erofs_put_stagingpage(struct list_head *pagepool,
+					   struct page *page)
+{
+	if (!z_erofs_page_is_staging(page))
+		return false;
+
+	/* staging pages should not be used by others at the same time */
+	if (page_ref_count(page) > 1)
+		put_page(page);
+	else
+		list_add(&page->lru, pagepool);
+	return true;
+}
+
+#endif
+
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index 08f2d4302ecb..d95f985936b6 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -11,6 +11,7 @@
  * distribution for more details.
  */
 #include "unzip_vle.h"
+#include "compress.h"
 #include <linux/prefetch.h>
 
 #include <trace/events/erofs.h>
@@ -855,7 +856,7 @@ static inline void z_erofs_vle_read_endio(struct bio *bio)
 		DBG_BUGON(PageUptodate(page));
 		DBG_BUGON(!page->mapping);
 
-		if (unlikely(!sbi && !z_erofs_is_stagingpage(page))) {
+		if (unlikely(!sbi && !z_erofs_page_is_staging(page))) {
 			sbi = EROFS_SB(page->mapping->host->i_sb);
 
 			if (time_to_inject(sbi, FAULT_READ_IO)) {
@@ -947,7 +948,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 		DBG_BUGON(!page);
 		DBG_BUGON(!page->mapping);
 
-		if (z_erofs_gather_if_stagingpage(page_pool, page))
+		if (z_erofs_put_stagingpage(page_pool, page))
 			continue;
 
 		if (page_type == Z_EROFS_VLE_PAGE_TYPE_HEAD)
@@ -977,7 +978,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 		DBG_BUGON(!page);
 		DBG_BUGON(!page->mapping);
 
-		if (!z_erofs_is_stagingpage(page)) {
+		if (!z_erofs_page_is_staging(page)) {
 			if (erofs_page_is_managed(sbi, page)) {
 				if (unlikely(!PageUptodate(page)))
 					err = -EIO;
@@ -1055,7 +1056,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 			continue;
 
 		/* recycle all individual staging pages */
-		(void)z_erofs_gather_if_stagingpage(page_pool, page);
+		(void)z_erofs_put_stagingpage(page_pool, page);
 
 		WRITE_ONCE(compressed_pages[i], NULL);
 	}
@@ -1068,7 +1069,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 		DBG_BUGON(!page->mapping);
 
 		/* recycle all individual staging pages */
-		if (z_erofs_gather_if_stagingpage(page_pool, page))
+		if (z_erofs_put_stagingpage(page_pool, page))
 			continue;
 
 		if (unlikely(err < 0))
diff --git a/drivers/staging/erofs/unzip_vle.h b/drivers/staging/erofs/unzip_vle.h
index 9c53009700cf..6c3e0deb63e7 100644
--- a/drivers/staging/erofs/unzip_vle.h
+++ b/drivers/staging/erofs/unzip_vle.h
@@ -16,26 +16,6 @@
 #include "internal.h"
 #include "unzip_pagevec.h"
 
-/*
- *  - 0x5A110C8D ('sallocated', Z_EROFS_MAPPING_STAGING) -
- * used for temporary allocated pages (via erofs_allocpage),
- * in order to seperate those from NULL mapping (eg. truncated pages)
- */
-#define Z_EROFS_MAPPING_STAGING		((void *)0x5A110C8D)
-
-#define z_erofs_is_stagingpage(page)	\
-	((page)->mapping == Z_EROFS_MAPPING_STAGING)
-
-static inline bool z_erofs_gather_if_stagingpage(struct list_head *page_pool,
-						 struct page *page)
-{
-	if (z_erofs_is_stagingpage(page)) {
-		list_add(&page->lru, page_pool);
-		return true;
-	}
-	return false;
-}
-
 /*
  * Structure fields follow one of the following exclusion rules.
  *
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 5/8] staging: erofs: introduce generic decompression backend
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)

In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

This patch adds a new generic decompression framework
in order to replace the old LZ4-specific decompression code.

Even though LZ4 is still the only supported algorithm, yet
it is more cleaner and easy to integrate new algorithm than
the old almost hard-coded decompression backend.

Signed-off-by: Gao Xiang <gaoxiang25 at huawei.com>
---
 drivers/staging/erofs/Makefile       |   2 +-
 drivers/staging/erofs/compress.h     |  21 ++
 drivers/staging/erofs/decompressor.c | 307 +++++++++++++++++++++++++++
 3 files changed, 329 insertions(+), 1 deletion(-)
 create mode 100644 drivers/staging/erofs/decompressor.c

diff --git a/drivers/staging/erofs/Makefile b/drivers/staging/erofs/Makefile
index 84b412c7a991..adeb5d6e2668 100644
--- a/drivers/staging/erofs/Makefile
+++ b/drivers/staging/erofs/Makefile
@@ -9,5 +9,5 @@ obj-$(CONFIG_EROFS_FS) += erofs.o
 ccflags-y += -I $(srctree)/$(src)/include
 erofs-objs := super.o inode.o data.o namei.o dir.o utils.o
 erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
-erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o unzip_vle_lz4.o zmap.o
+erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o unzip_vle_lz4.o zmap.o decompressor.o
 
diff --git a/drivers/staging/erofs/compress.h b/drivers/staging/erofs/compress.h
index 1dcfc3b35118..ebeccb1f4eae 100644
--- a/drivers/staging/erofs/compress.h
+++ b/drivers/staging/erofs/compress.h
@@ -9,6 +9,24 @@
 #ifndef __EROFS_FS_COMPRESS_H
 #define __EROFS_FS_COMPRESS_H
 
+#include "internal.h"
+
+enum {
+	Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
+	Z_EROFS_COMPRESSION_RUNTIME_MAX
+};
+
+struct z_erofs_decompress_req {
+	struct page **in, **out;
+
+	unsigned short pageofs_out;
+	unsigned int inputsize, outputsize;
+
+	/* indicate the algorithm will be used for decompression */
+	unsigned int alg;
+	bool inplace_io, partial_decoding;
+};
+
 /*
  * - 0x5A110C8D ('sallocated', Z_EROFS_MAPPING_STAGING) -
  * used to mark temporary allocated pages from other
@@ -36,5 +54,8 @@ static inline bool z_erofs_put_stagingpage(struct list_head *pagepool,
 	return true;
 }
 
+int z_erofs_decompress(struct z_erofs_decompress_req *rq,
+		       struct list_head *pagepool);
+
 #endif
 
diff --git a/drivers/staging/erofs/decompressor.c b/drivers/staging/erofs/decompressor.c
new file mode 100644
index 000000000000..c68d17b579e0
--- /dev/null
+++ b/drivers/staging/erofs/decompressor.c
@@ -0,0 +1,307 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * linux/drivers/staging/erofs/decompressor.c
+ *
+ * Copyright (C) 2019 HUAWEI, Inc.
+ *             http://www.huawei.com/
+ * Created by Gao Xiang <gaoxiang25 at huawei.com>
+ */
+#include "compress.h"
+#include <linux/lz4.h>
+
+#ifndef LZ4_DISTANCE_MAX	/* history window size */
+#define LZ4_DISTANCE_MAX 65535	/* set to maximum value by default */
+#endif
+
+#define LZ4_MAX_DISTANCE_PAGES	DIV_ROUND_UP(LZ4_DISTANCE_MAX, PAGE_SIZE)
+
+struct z_erofs_decompressor {
+	/*
+	 * if destpages have sparsed pages, fill them with bounce pages.
+	 * it also check whether destpages indicate continuous physical memory.
+	 */
+	int (*prepare_destpages)(struct z_erofs_decompress_req *rq,
+				 struct list_head *pagepool);
+	int (*decompress)(struct z_erofs_decompress_req *rq, u8 *out);
+	char *name;
+};
+
+static int lz4_prepare_destpages(struct z_erofs_decompress_req *rq,
+				 struct list_head *pagepool)
+{
+	const unsigned int nr =
+		PAGE_ALIGN(rq->pageofs_out + rq->outputsize) >> PAGE_SHIFT;
+	struct page *availables[LZ4_MAX_DISTANCE_PAGES] = { NULL };
+	unsigned long unused[DIV_ROUND_UP(LZ4_MAX_DISTANCE_PAGES,
+					  BITS_PER_LONG)] = { 0 };
+	void *kaddr = NULL;
+	unsigned int i, j, k;
+
+	for (i = 0; i < nr; ++i) {
+		struct page *const page = rq->out[i];
+
+		j = i & (LZ4_MAX_DISTANCE_PAGES - 1);
+		if (availables[j])
+			__set_bit(j, unused);
+
+		if (page) {
+			if (kaddr) {
+				if (kaddr + PAGE_SIZE == page_address(page))
+					kaddr += PAGE_SIZE;
+				else
+					kaddr = NULL;
+			} else if (!i) {
+				kaddr = page_address(page);
+			}
+			continue;
+		}
+		kaddr = NULL;
+
+		k = find_first_bit(unused, LZ4_MAX_DISTANCE_PAGES);
+		if (k < LZ4_MAX_DISTANCE_PAGES) {
+			j = k;
+			get_page(availables[j]);
+		} else {
+			DBG_BUGON(availables[j]);
+
+			if (!list_empty(pagepool)) {
+				availables[j] = lru_to_page(pagepool);
+				list_del(&availables[j]->lru);
+				DBG_BUGON(page_ref_count(availables[j]) != 1);
+			} else {
+				availables[j] = alloc_pages(GFP_KERNEL, 0);
+				if (!availables[j])
+					return -ENOMEM;
+			}
+			availables[j]->mapping = Z_EROFS_MAPPING_STAGING;
+		}
+		rq->out[i] = availables[j];
+		__clear_bit(j, unused);
+	}
+	return kaddr ? 1 : 0;
+}
+
+static void *generic_copy_inplace_data(struct z_erofs_decompress_req *rq,
+				       u8 *src, unsigned int pageofs_in)
+{
+	/*
+	 * if in-place decompression is ongoing, those decompressed
+	 * pages should be copied in order to avoid being overlapped.
+	 */
+	struct page **in = rq->in;
+	u8 *const tmp = erofs_get_pcpubuf(0);
+	u8 *tmpp = tmp;
+	unsigned int inlen = rq->inputsize - pageofs_in;
+	unsigned int count = min_t(uint, inlen, PAGE_SIZE - pageofs_in);
+
+	while (tmpp < tmp + inlen) {
+		if (!src)
+			src = kmap_atomic(*in);
+		memcpy(tmpp, src + pageofs_in, count);
+		kunmap_atomic(src);
+		src = NULL;
+		tmpp += count;
+		pageofs_in = 0;
+		count = PAGE_SIZE;
+		++in;
+	}
+	return tmp;
+}
+
+static int lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out)
+{
+	unsigned int inputmargin, inlen;
+	u8 *src;
+	bool copied;
+	int ret;
+
+	if (rq->inputsize > PAGE_SIZE)
+		return -ENOTSUPP;
+
+	src = kmap_atomic(*rq->in);
+	inputmargin = 0;
+	while (!src[inputmargin & ~PAGE_MASK])
+		if (!(++inputmargin & ~PAGE_MASK))
+			break;
+
+	if (inputmargin >= rq->inputsize) {
+		kunmap_atomic(src);
+		return -EIO;
+	}
+
+	copied = false;
+	inlen = rq->inputsize - inputmargin;
+	if (rq->inplace_io) {
+		src = generic_copy_inplace_data(rq, src, inputmargin);
+		inputmargin = 0;
+		copied = true;
+	}
+
+	ret = LZ4_decompress_safe_partial(src + inputmargin, out,
+					  inlen, rq->outputsize,
+					  rq->outputsize);
+	if (ret < 0) {
+		errln("%s, failed to decompress, in[%p, %u, %u] out[%p, %u]",
+		      __func__, src + inputmargin, inlen, inputmargin,
+		      out, rq->outputsize);
+		WARN_ON(1);
+		print_hex_dump(KERN_DEBUG, "[ in]: ", DUMP_PREFIX_OFFSET,
+			       16, 1, src + inputmargin, inlen, true);
+		print_hex_dump(KERN_DEBUG, "[out]: ", DUMP_PREFIX_OFFSET,
+			       16, 1, out, rq->outputsize, true);
+		ret = -EIO;
+	}
+
+	if (copied)
+		erofs_put_pcpubuf(src);
+	else
+		kunmap_atomic(src);
+	return ret;
+}
+
+static struct z_erofs_decompressor decompressors[] = {
+	[Z_EROFS_COMPRESSION_SHIFTED] = {
+		.name = "shifted"
+	},
+	[Z_EROFS_COMPRESSION_LZ4] = {
+		.prepare_destpages = lz4_prepare_destpages,
+		.decompress = lz4_decompress,
+		.name = "lz4"
+	},
+};
+
+static void copy_from_pcpubuf(struct page **out, const char *dst,
+			      unsigned short pageofs_out,
+			      unsigned int outputsize)
+{
+	const char *end = dst + outputsize;
+	const unsigned int righthalf = PAGE_SIZE - pageofs_out;
+	const char *cur = dst - pageofs_out;
+
+	while (cur < end) {
+		struct page *const page = *out++;
+
+		if (page) {
+			char *buf = kmap_atomic(page);
+
+			if (cur >= dst) {
+				memcpy(buf, cur, min_t(uint, PAGE_SIZE,
+						       end - cur));
+			} else {
+				memcpy(buf + pageofs_out, cur + pageofs_out,
+				       min_t(uint, righthalf, end - cur));
+			}
+			kunmap_atomic(buf);
+		}
+		cur += PAGE_SIZE;
+	}
+}
+
+static int decompress_generic(struct z_erofs_decompress_req *rq,
+			      struct list_head *pagepool)
+{
+	const unsigned int nrpages_out =
+		PAGE_ALIGN(rq->pageofs_out + rq->outputsize) >> PAGE_SHIFT;
+	const struct z_erofs_decompressor *alg = decompressors + rq->alg;
+	unsigned int dst_maptype;
+	void *dst;
+	int ret;
+
+	if (nrpages_out == 1 && !rq->inplace_io) {
+		DBG_BUGON(!*rq->out);
+		dst = kmap_atomic(*rq->out);
+		dst_maptype = 0;
+		goto dstmap_out;
+	}
+
+	/*
+	 * For the case of small output size (especially much less
+	 * than PAGE_SIZE), memcpy the decompressed data rather than
+	 * compressed data is preferred.
+	 */
+	if (rq->outputsize <= PAGE_SIZE * 7 / 8) {
+		dst = erofs_get_pcpubuf(0);
+
+		rq->inplace_io = false;
+		ret = alg->decompress(rq, dst);
+		if (!ret)
+			copy_from_pcpubuf(rq->out, dst, rq->pageofs_out,
+					  rq->outputsize);
+
+		erofs_put_pcpubuf(dst);
+		return ret;
+	}
+
+	ret = alg->prepare_destpages(rq, pagepool);
+	if (ret < 0) {
+		return ret;
+	} else if (ret) {
+		dst = page_address(*rq->out);
+		dst_maptype = 1;
+		goto dstmap_out;
+	}
+
+	dst = erofs_vmap(rq->out, nrpages_out);
+	if (!dst)
+		return -ENOMEM;
+	dst_maptype = 2;
+
+dstmap_out:
+	ret = alg->decompress(rq, dst + rq->pageofs_out);
+
+	if (!dst_maptype)
+		kunmap_atomic(dst);
+	else if (dst_maptype == 2)
+		erofs_vunmap(dst, nrpages_out);
+	return ret;
+}
+
+static int shifted_decompress(const struct z_erofs_decompress_req *rq,
+			      struct list_head *pagepool)
+{
+	const unsigned int nrpages_out =
+		PAGE_ALIGN(rq->pageofs_out + rq->outputsize) >> PAGE_SHIFT;
+	const unsigned int righthalf = PAGE_SIZE - rq->pageofs_out;
+	unsigned char *src, *dst;
+
+	if (nrpages_out > 2) {
+		DBG_BUGON(1);
+		return -EIO;
+	}
+
+	if (rq->out[0] == *rq->in) {
+		DBG_BUGON(nrpages_out != 1);
+		return 0;
+	}
+
+	src = kmap_atomic(*rq->in);
+	if (!rq->out[0]) {
+		dst = NULL;
+	} else {
+		dst = kmap_atomic(rq->out[0]);
+		memcpy(dst + rq->pageofs_out, src, righthalf);
+	}
+
+	if (rq->out[1] == *rq->in) {
+		memmove(src, src + righthalf, rq->pageofs_out);
+	} else if (nrpages_out == 2) {
+		if (dst)
+			kunmap_atomic(dst);
+		DBG_BUGON(!rq->out[1]);
+		dst = kmap_atomic(rq->out[1]);
+		memcpy(dst, src + righthalf, rq->pageofs_out);
+	}
+	if (dst)
+		kunmap_atomic(dst);
+	kunmap_atomic(src);
+	return 0;
+}
+
+int z_erofs_decompress(struct z_erofs_decompress_req *rq,
+		       struct list_head *pagepool)
+{
+	if (rq->alg == Z_EROFS_COMPRESSION_SHIFTED)
+		return shifted_decompress(rq, pagepool);
+	return decompress_generic(rq, pagepool);
+}
+
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 6/8] staging: erofs: introduce LZ4 decompression inplace
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)

In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

compressed data will be usually loaded into last pages of
the extent (the last page for 4k) for in-place decompression
(more specifically, in-place IO), as ilustration below,

         start of compressed logical extent
           |                          end of this logical extent
           |                           |
     ______v___________________________v________
... |  page 6  |  page 7  |  page 8  |  page 9  | ...
    |__________|__________|__________|__________|
           .                         ^ .        ^
           .                         |compressed|
           .                         |   data   |
           .                           .        .
           |<          dstsize        >|<margin>|
                                       oend     iend
           op                        ip

Therefore, it's possible to do decompression inplace (thus no
memcpy at all) if the margin is sufficient and safe enough [1],
and it can be implemented only for fixed-size output compression
compared with fixed-size input compression.

No memcpy for most of in-place IO (about 99% of enwik9) after
decompression inplace is implemented and sequential read will
be improved of course (see the following patches for test results).

[1] https://github.com/lz4/lz4/commit/b17f578a919b7e6b078cede2d52be29dd48c8e8c
    https://github.com/lz4/lz4/commit/5997e139f53169fa3a1c1b4418d2452a90b01602

Signed-off-by: Gao Xiang <gaoxiang25 at huawei.com>
---
 drivers/staging/erofs/compress.h     |  1 +
 drivers/staging/erofs/decompressor.c | 21 ++++++++++++++++++---
 drivers/staging/erofs/erofs_fs.h     |  3 ++-
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/erofs/compress.h b/drivers/staging/erofs/compress.h
index ebeccb1f4eae..c43aa3374d28 100644
--- a/drivers/staging/erofs/compress.h
+++ b/drivers/staging/erofs/compress.h
@@ -17,6 +17,7 @@ enum {
 };
 
 struct z_erofs_decompress_req {
+	struct super_block *sb;
 	struct page **in, **out;
 
 	unsigned short pageofs_out;
diff --git a/drivers/staging/erofs/decompressor.c b/drivers/staging/erofs/decompressor.c
index c68d17b579e0..689fb8ec7032 100644
--- a/drivers/staging/erofs/decompressor.c
+++ b/drivers/staging/erofs/decompressor.c
@@ -14,6 +14,9 @@
 #endif
 
 #define LZ4_MAX_DISTANCE_PAGES	DIV_ROUND_UP(LZ4_DISTANCE_MAX, PAGE_SIZE)
+#ifndef LZ4_DECOMPRESS_INPLACE_MARGIN
+#define LZ4_DECOMPRESS_INPLACE_MARGIN(srcsize)  (((srcsize) >> 8) + 32)
+#endif
 
 struct z_erofs_decompressor {
 	/*
@@ -132,9 +135,21 @@ static int lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out)
 	copied = false;
 	inlen = rq->inputsize - inputmargin;
 	if (rq->inplace_io) {
-		src = generic_copy_inplace_data(rq, src, inputmargin);
-		inputmargin = 0;
-		copied = true;
+		const uint oend = (rq->pageofs_out +
+				   rq->outputsize) & ~PAGE_MASK;
+		const uint nr = PAGE_ALIGN(rq->pageofs_out +
+					   rq->outputsize) >> PAGE_SHIFT;
+
+		if (rq->partial_decoding ||
+		    !(EROFS_SB(rq->sb)->requirements &
+		      EROFS_REQUIREMENT_LZ4_0PADDING) ||
+		    rq->out[nr - 1] != rq->in[0] ||
+		    rq->inputsize - oend <
+		      LZ4_DECOMPRESS_INPLACE_MARGIN(inlen)) {
+			src = generic_copy_inplace_data(rq, src, inputmargin);
+			inputmargin = 0;
+			copied = true;
+		}
 	}
 
 	ret = LZ4_decompress_safe_partial(src + inputmargin, out,
diff --git a/drivers/staging/erofs/erofs_fs.h b/drivers/staging/erofs/erofs_fs.h
index a05139f1df60..353322a3206c 100644
--- a/drivers/staging/erofs/erofs_fs.h
+++ b/drivers/staging/erofs/erofs_fs.h
@@ -21,7 +21,8 @@
  * Any bits that aren't in EROFS_ALL_REQUIREMENTS should be
  * incompatible with this kernel version.
  */
-#define EROFS_ALL_REQUIREMENTS  0
+#define EROFS_REQUIREMENT_LZ4_0PADDING	0x00000001
+#define EROFS_ALL_REQUIREMENTS		EROFS_REQUIREMENT_LZ4_0PADDING
 
 struct erofs_super_block {
 /*  0 */__le32 magic;           /* in the little endian */
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 7/8] staging: erofs: switch to new decompression backend
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)

In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

This patch integrates new decompression framework to
erofs decompression path, and remove the old
decompression implementation as well.

On kirin980 platform, sequential read is slightly
improved to 778MiB/s after the new decompression
backend is used.

Signed-off-by: Gao Xiang <gaoxiang25 at huawei.com>
---
 drivers/staging/erofs/Makefile        |   2 +-
 drivers/staging/erofs/internal.h      |   6 -
 drivers/staging/erofs/unzip_vle.c     |  59 +++----
 drivers/staging/erofs/unzip_vle.h     |  15 +-
 drivers/staging/erofs/unzip_vle_lz4.c | 216 --------------------------
 5 files changed, 24 insertions(+), 274 deletions(-)
 delete mode 100644 drivers/staging/erofs/unzip_vle_lz4.c

diff --git a/drivers/staging/erofs/Makefile b/drivers/staging/erofs/Makefile
index adeb5d6e2668..e704d9e51514 100644
--- a/drivers/staging/erofs/Makefile
+++ b/drivers/staging/erofs/Makefile
@@ -9,5 +9,5 @@ obj-$(CONFIG_EROFS_FS) += erofs.o
 ccflags-y += -I $(srctree)/$(src)/include
 erofs-objs := super.o inode.o data.o namei.o dir.o utils.o
 erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
-erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o unzip_vle_lz4.o zmap.o decompressor.o
+erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o zmap.o decompressor.o
 
diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h
index dcbe6f7f5dae..6c8767d4a1d5 100644
--- a/drivers/staging/erofs/internal.h
+++ b/drivers/staging/erofs/internal.h
@@ -321,14 +321,8 @@ static inline void z_erofs_exit_zip_subsystem(void) {}
 
 /* page count of a compressed cluster */
 #define erofs_clusterpages(sbi)         ((1 << (sbi)->clusterbits) / PAGE_SIZE)
-#define Z_EROFS_NR_INLINE_PAGEVECS      3
 
-#if (Z_EROFS_CLUSTER_MAX_PAGES > Z_EROFS_NR_INLINE_PAGEVECS)
 #define EROFS_PCPUBUF_NR_PAGES          Z_EROFS_CLUSTER_MAX_PAGES
-#else
-#define EROFS_PCPUBUF_NR_PAGES          Z_EROFS_NR_INLINE_PAGEVECS
-#endif
-
 #else
 #define EROFS_PCPUBUF_NR_PAGES          0
 #endif
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index d95f985936b6..cb870b83f3c8 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -897,12 +897,12 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 	unsigned int sparsemem_pages = 0;
 	struct page *pages_onstack[Z_EROFS_VLE_VMAP_ONSTACK_PAGES];
 	struct page **pages, **compressed_pages, *page;
-	unsigned int i, llen;
+	unsigned int algorithm;
+	unsigned int i, outputsize;
 
 	enum z_erofs_page_type page_type;
 	bool overlapped;
 	struct z_erofs_vle_work *work;
-	void *vout;
 	int err;
 
 	might_sleep();
@@ -1009,43 +1009,26 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 	if (unlikely(err))
 		goto out;
 
-	llen = (nr_pages << PAGE_SHIFT) - work->pageofs;
-
-	if (z_erofs_vle_workgrp_fmt(grp) == Z_EROFS_VLE_WORKGRP_FMT_PLAIN) {
-		err = z_erofs_vle_plain_copy(compressed_pages, clusterpages,
-					     pages, nr_pages, work->pageofs);
-		goto out;
-	}
-
-	if (llen > grp->llen)
-		llen = grp->llen;
-
-	err = z_erofs_vle_unzip_fast_percpu(compressed_pages, clusterpages,
-					    pages, llen, work->pageofs);
-	if (err != -ENOTSUPP)
-		goto out;
-
-	if (sparsemem_pages >= nr_pages)
-		goto skip_allocpage;
-
-	for (i = 0; i < nr_pages; ++i) {
-		if (pages[i])
-			continue;
-
-		pages[i] = __stagingpage_alloc(page_pool, GFP_NOFS);
-	}
-
-skip_allocpage:
-	vout = erofs_vmap(pages, nr_pages);
-	if (!vout) {
-		err = -ENOMEM;
-		goto out;
-	}
-
-	err = z_erofs_vle_unzip_vmap(compressed_pages, clusterpages, vout,
-				     llen, work->pageofs, overlapped);
+	if (nr_pages << PAGE_SHIFT >= work->pageofs + grp->llen)
+		outputsize = grp->llen;
+	else
+		outputsize = (nr_pages << PAGE_SHIFT) - work->pageofs;
 
-	erofs_vunmap(vout, nr_pages);
+	if (z_erofs_vle_workgrp_fmt(grp) == Z_EROFS_VLE_WORKGRP_FMT_PLAIN)
+		algorithm = Z_EROFS_COMPRESSION_SHIFTED;
+	else
+		algorithm = Z_EROFS_COMPRESSION_LZ4;
+
+	err = z_erofs_decompress(&(struct z_erofs_decompress_req) {
+					.sb = sb,
+					.in = compressed_pages,
+					.out = pages,
+					.pageofs_out = work->pageofs,
+					.inputsize = PAGE_SIZE,
+					.outputsize = outputsize,
+					.alg = algorithm,
+					.inplace_io = overlapped,
+					.partial_decoding = true }, page_pool);
 
 out:
 	/* must handle all compressed pages before endding pages */
diff --git a/drivers/staging/erofs/unzip_vle.h b/drivers/staging/erofs/unzip_vle.h
index 6c3e0deb63e7..a2d9b60beebd 100644
--- a/drivers/staging/erofs/unzip_vle.h
+++ b/drivers/staging/erofs/unzip_vle.h
@@ -16,6 +16,8 @@
 #include "internal.h"
 #include "unzip_pagevec.h"
 
+#define Z_EROFS_NR_INLINE_PAGEVECS      3
+
 /*
  * Structure fields follow one of the following exclusion rules.
  *
@@ -189,18 +191,5 @@ static inline void z_erofs_onlinepage_endio(struct page *page)
 	min_t(unsigned int, THREAD_SIZE / 8 / sizeof(struct page *), 96U)
 #define Z_EROFS_VLE_VMAP_GLOBAL_PAGES	2048
 
-/* unzip_vle_lz4.c */
-int z_erofs_vle_plain_copy(struct page **compressed_pages,
-			   unsigned int clusterpages, struct page **pages,
-			   unsigned int nr_pages, unsigned short pageofs);
-int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages,
-				  unsigned int clusterpages,
-				  struct page **pages, unsigned int outlen,
-				  unsigned short pageofs);
-int z_erofs_vle_unzip_vmap(struct page **compressed_pages,
-			   unsigned int clusterpages,
-			   void *vaddr, unsigned int llen,
-			   unsigned short pageofs, bool overlapped);
-
 #endif
 
diff --git a/drivers/staging/erofs/unzip_vle_lz4.c b/drivers/staging/erofs/unzip_vle_lz4.c
deleted file mode 100644
index 399c3e3a3ff3..000000000000
--- a/drivers/staging/erofs/unzip_vle_lz4.c
+++ /dev/null
@@ -1,216 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * linux/drivers/staging/erofs/unzip_vle_lz4.c
- *
- * Copyright (C) 2018 HUAWEI, Inc.
- *             http://www.huawei.com/
- * Created by Gao Xiang <gaoxiang25 at huawei.com>
- *
- * This file is subject to the terms and conditions of the GNU General Public
- * License.  See the file COPYING in the main directory of the Linux
- * distribution for more details.
- */
-#include "unzip_vle.h"
-#include <linux/lz4.h>
-
-static int z_erofs_unzip_lz4(void *in, void *out, size_t inlen, size_t outlen)
-{
-	int ret = LZ4_decompress_safe_partial(in, out, inlen, outlen, outlen);
-
-	if (ret >= 0)
-		return ret;
-
-	/*
-	 * LZ4_decompress_safe_partial will return an error code
-	 * (< 0) if decompression failed
-	 */
-	errln("%s, failed to decompress, in[%p, %zu] outlen[%p, %zu]",
-	      __func__, in, inlen, out, outlen);
-	WARN_ON(1);
-	print_hex_dump(KERN_DEBUG, "raw data [in]: ", DUMP_PREFIX_OFFSET,
-		       16, 1, in, inlen, true);
-	print_hex_dump(KERN_DEBUG, "raw data [out]: ", DUMP_PREFIX_OFFSET,
-		       16, 1, out, outlen, true);
-	return -EIO;
-}
-
-int z_erofs_vle_plain_copy(struct page **compressed_pages,
-			   unsigned int clusterpages,
-			   struct page **pages,
-			   unsigned int nr_pages,
-			   unsigned short pageofs)
-{
-	unsigned int i, j;
-	void *src = NULL;
-	const unsigned int righthalf = PAGE_SIZE - pageofs;
-	char *percpu_data;
-	bool mirrored[Z_EROFS_CLUSTER_MAX_PAGES] = { 0 };
-
-	percpu_data = erofs_get_pcpubuf(0);
-
-	j = 0;
-	for (i = 0; i < nr_pages; j = i++) {
-		struct page *page = pages[i];
-		void *dst;
-
-		if (!page) {
-			if (src) {
-				if (!mirrored[j])
-					kunmap_atomic(src);
-				src = NULL;
-			}
-			continue;
-		}
-
-		dst = kmap_atomic(page);
-
-		for (; j < clusterpages; ++j) {
-			if (compressed_pages[j] != page)
-				continue;
-
-			DBG_BUGON(mirrored[j]);
-			memcpy(percpu_data + j * PAGE_SIZE, dst, PAGE_SIZE);
-			mirrored[j] = true;
-			break;
-		}
-
-		if (i) {
-			if (!src)
-				src = mirrored[i - 1] ?
-					percpu_data + (i - 1) * PAGE_SIZE :
-					kmap_atomic(compressed_pages[i - 1]);
-
-			memcpy(dst, src + righthalf, pageofs);
-
-			if (!mirrored[i - 1])
-				kunmap_atomic(src);
-
-			if (unlikely(i >= clusterpages)) {
-				kunmap_atomic(dst);
-				break;
-			}
-		}
-
-		if (!righthalf) {
-			src = NULL;
-		} else {
-			src = mirrored[i] ? percpu_data + i * PAGE_SIZE :
-				kmap_atomic(compressed_pages[i]);
-
-			memcpy(dst + pageofs, src, righthalf);
-		}
-
-		kunmap_atomic(dst);
-	}
-
-	if (src && !mirrored[j])
-		kunmap_atomic(src);
-
-	erofs_put_pcpubuf(percpu_data);
-	return 0;
-}
-
-int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages,
-				  unsigned int clusterpages,
-				  struct page **pages,
-				  unsigned int outlen,
-				  unsigned short pageofs)
-{
-	void *vin, *vout;
-	unsigned int nr_pages, i, j;
-	int ret;
-
-	if (outlen + pageofs > EROFS_PCPUBUF_NR_PAGES * PAGE_SIZE)
-		return -ENOTSUPP;
-
-	nr_pages = DIV_ROUND_UP(outlen + pageofs, PAGE_SIZE);
-
-	if (clusterpages == 1) {
-		vin = kmap_atomic(compressed_pages[0]);
-	} else {
-		vin = erofs_vmap(compressed_pages, clusterpages);
-		if (!vin)
-			return -ENOMEM;
-	}
-
-	vout = erofs_get_pcpubuf(0);
-
-	ret = z_erofs_unzip_lz4(vin, vout + pageofs,
-				clusterpages * PAGE_SIZE, outlen);
-
-	if (ret < 0)
-		goto out;
-	ret = 0;
-
-	for (i = 0; i < nr_pages; ++i) {
-		j = min((unsigned int)PAGE_SIZE - pageofs, outlen);
-
-		if (pages[i]) {
-			if (clusterpages == 1 &&
-			    pages[i] == compressed_pages[0]) {
-				memcpy(vin + pageofs, vout + pageofs, j);
-			} else {
-				void *dst = kmap_atomic(pages[i]);
-
-				memcpy(dst + pageofs, vout + pageofs, j);
-				kunmap_atomic(dst);
-			}
-		}
-		vout += PAGE_SIZE;
-		outlen -= j;
-		pageofs = 0;
-	}
-
-out:
-	erofs_put_pcpubuf(vout);
-
-	if (clusterpages == 1)
-		kunmap_atomic(vin);
-	else
-		erofs_vunmap(vin, clusterpages);
-
-	return ret;
-}
-
-int z_erofs_vle_unzip_vmap(struct page **compressed_pages,
-			   unsigned int clusterpages,
-			   void *vout,
-			   unsigned int llen,
-			   unsigned short pageofs,
-			   bool overlapped)
-{
-	void *vin;
-	unsigned int i;
-	int ret;
-
-	if (overlapped) {
-		vin = erofs_get_pcpubuf(0);
-
-		for (i = 0; i < clusterpages; ++i) {
-			void *t = kmap_atomic(compressed_pages[i]);
-
-			memcpy(vin + PAGE_SIZE * i, t, PAGE_SIZE);
-			kunmap_atomic(t);
-		}
-	} else if (clusterpages == 1) {
-		vin = kmap_atomic(compressed_pages[0]);
-	} else {
-		vin = erofs_vmap(compressed_pages, clusterpages);
-	}
-
-	ret = z_erofs_unzip_lz4(vin, vout + pageofs,
-				clusterpages * PAGE_SIZE, llen);
-	if (ret > 0)
-		ret = 0;
-
-	if (overlapped) {
-		erofs_put_pcpubuf(vin);
-	} else {
-		if (clusterpages == 1)
-			kunmap_atomic(vin);
-		else
-			erofs_vunmap(vin, clusterpages);
-	}
-	return ret;
-}
-
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 8/8] staging: erofs: integrate decompression inplace
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)

In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

Decompressor needs to know whether it's a partial
or full decompression since only full decompression
can be decompressed in-place.

On kirin980 platform, sequential read is finally
increased to 812MiB/s after decompression inplace
is enabled.

Signed-off-by: Gao Xiang <gaoxiang25 at huawei.com>
---
 drivers/staging/erofs/internal.h  |  3 +++
 drivers/staging/erofs/unzip_vle.c | 15 +++++++++++----
 drivers/staging/erofs/unzip_vle.h |  1 +
 drivers/staging/erofs/zmap.c      |  1 +
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h
index 6c8767d4a1d5..963cc1b8b896 100644
--- a/drivers/staging/erofs/internal.h
+++ b/drivers/staging/erofs/internal.h
@@ -441,6 +441,7 @@ extern const struct address_space_operations z_erofs_vle_normalaccess_aops;
  */
 enum {
 	BH_Zipped = BH_PrivateStart,
+	BH_FullMapped,
 };
 
 /* Has a disk mapping */
@@ -449,6 +450,8 @@ enum {
 #define EROFS_MAP_META		(1 << BH_Meta)
 /* The extent has been compressed */
 #define EROFS_MAP_ZIPPED	(1 << BH_Zipped)
+/* The length of extent is full */
+#define EROFS_MAP_FULL_MAPPED	(1 << BH_FullMapped)
 
 struct erofs_map_blocks {
 	erofs_off_t m_pa, m_la;
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index cb870b83f3c8..316382d33783 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -469,6 +469,9 @@ z_erofs_vle_work_register(const struct z_erofs_vle_work_finder *f,
 				    Z_EROFS_VLE_WORKGRP_FMT_LZ4 :
 				    Z_EROFS_VLE_WORKGRP_FMT_PLAIN);
 
+	if (map->m_flags & EROFS_MAP_FULL_MAPPED)
+		grp->flags |= Z_EROFS_VLE_WORKGRP_FULL_LENGTH;
+
 	/* new workgrps have been claimed as type 1 */
 	WRITE_ONCE(grp->next, *f->owned_head);
 	/* primary and followed work for all new workgrps */
@@ -901,7 +904,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 	unsigned int i, outputsize;
 
 	enum z_erofs_page_type page_type;
-	bool overlapped;
+	bool overlapped, partial;
 	struct z_erofs_vle_work *work;
 	int err;
 
@@ -1009,10 +1012,13 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 	if (unlikely(err))
 		goto out;
 
-	if (nr_pages << PAGE_SHIFT >= work->pageofs + grp->llen)
+	if (nr_pages << PAGE_SHIFT >= work->pageofs + grp->llen) {
 		outputsize = grp->llen;
-	else
+		partial = !(grp->flags & Z_EROFS_VLE_WORKGRP_FULL_LENGTH);
+	} else {
 		outputsize = (nr_pages << PAGE_SHIFT) - work->pageofs;
+		partial = true;
+	}
 
 	if (z_erofs_vle_workgrp_fmt(grp) == Z_EROFS_VLE_WORKGRP_FMT_PLAIN)
 		algorithm = Z_EROFS_COMPRESSION_SHIFTED;
@@ -1028,7 +1034,8 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 					.outputsize = outputsize,
 					.alg = algorithm,
 					.inplace_io = overlapped,
-					.partial_decoding = true }, page_pool);
+					.partial_decoding = partial
+				 }, page_pool);
 
 out:
 	/* must handle all compressed pages before endding pages */
diff --git a/drivers/staging/erofs/unzip_vle.h b/drivers/staging/erofs/unzip_vle.h
index a2d9b60beebd..ab509d75aefd 100644
--- a/drivers/staging/erofs/unzip_vle.h
+++ b/drivers/staging/erofs/unzip_vle.h
@@ -46,6 +46,7 @@ struct z_erofs_vle_work {
 #define Z_EROFS_VLE_WORKGRP_FMT_PLAIN        0
 #define Z_EROFS_VLE_WORKGRP_FMT_LZ4          1
 #define Z_EROFS_VLE_WORKGRP_FMT_MASK         1
+#define Z_EROFS_VLE_WORKGRP_FULL_LENGTH      2
 
 typedef void *z_erofs_vle_owned_workgrp_t;
 
diff --git a/drivers/staging/erofs/zmap.c b/drivers/staging/erofs/zmap.c
index 77bc49c07846..ce6d955f0112 100644
--- a/drivers/staging/erofs/zmap.c
+++ b/drivers/staging/erofs/zmap.c
@@ -424,6 +424,7 @@ int z_erofs_map_blocks_iter(struct inode *inode,
 			goto unmap_out;
 		}
 		end = (m.lcn << lclusterbits) | m.clusterofs;
+		map->m_flags |= EROFS_MAP_FULL_MAPPED;
 		m.delta[0] = 1;
 		/* fallthrough */
 	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH v2 4/6] mm/memory_hotplug: Rename walk_memory_range() and pass start+size instead of pfns
From: Nathan Chancellor @ 2019-06-20 16:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michal Hocko, Wei Yang, linux-mm, Paul Mackerras, Rashmica Gupta,
	Dan Williams, Michael Neuling, linux-acpi, Len Brown,
	Pavel Tatashin, Anshuman Khandual, Qian Cai, Thomas Gleixner,
	Oscar Salvador, Juergen Gross, Greg Kroah-Hartman,
	Rafael J. Wysocki, linux-kernel, Arun KS, Andrew Morton,
	linuxppc-dev
In-Reply-To: <20190620103520.23481-5-david@redhat.com>

On Thu, Jun 20, 2019 at 12:35:18PM +0200, David Hildenbrand wrote:
> walk_memory_range() was once used to iterate over sections. Now, it
> iterates over memory blocks. Rename the function, fixup the
> documentation. Also, pass start+size instead of PFNs, which is what most
> callers already have at hand. (we'll rework link_mem_sections() most
> probably soon)
> 
> Follow-up patches wil rework, simplify, and move walk_memory_blocks() to
> drivers/base/memory.c.
> 
> Note: walk_memory_blocks() only works correctly right now if the
> start_pfn is aligned to a section start. This is the case right now,
> but we'll generalize the function in a follow up patch so the semantics
> match the documentation.
> 
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Michael Neuling <mikey@neuling.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Arun KS <arunks@codeaurora.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/powerpc/platforms/powernv/memtrace.c | 22 ++++++++++-----------
>  drivers/acpi/acpi_memhotplug.c            | 19 ++++--------------
>  drivers/base/node.c                       |  5 +++--
>  include/linux/memory_hotplug.h            |  2 +-
>  mm/memory_hotplug.c                       | 24 ++++++++++++-----------
>  5 files changed, 32 insertions(+), 40 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index 5e53c1392d3b..8c82c041afe6 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -70,23 +70,24 @@ static int change_memblock_state(struct memory_block *mem, void *arg)
>  /* called with device_hotplug_lock held */
>  static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
>  {
> +	const unsigned long start = PFN_PHYS(start_pfn);
> +	const unsigned long size = PFN_PHYS(nr_pages);
>  	u64 end_pfn = start_pfn + nr_pages - 1;

This variable should be removed:

arch/powerpc/platforms/powernv/memtrace.c:75:6: warning: unused variable 'end_pfn' [-Wunused-variable]
        u64 end_pfn = start_pfn + nr_pages - 1;
            ^
1 warning generated.

https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/209576737

Cheers,
Nathan

>  
> -	if (walk_memory_range(start_pfn, end_pfn, NULL,
> -	    check_memblock_online))
> +	if (walk_memory_blocks(start, size, NULL, check_memblock_online))
>  		return false;
>  
> -	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
> -			  change_memblock_state);
> +	walk_memory_blocks(start, size, (void *)MEM_GOING_OFFLINE,
> +			   change_memblock_state);
>  
>  	if (offline_pages(start_pfn, nr_pages)) {
> -		walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
> -				  change_memblock_state);
> +		walk_memory_blocks(start, size, (void *)MEM_ONLINE,
> +				   change_memblock_state);
>  		return false;
>  	}
>  
> -	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
> -			  change_memblock_state);
> +	walk_memory_blocks(start, size, (void *)MEM_OFFLINE,
> +			   change_memblock_state);
>  
>  
>  	return true;
> @@ -242,9 +243,8 @@ static int memtrace_online(void)
>  		 */
>  		if (!memhp_auto_online) {
>  			lock_device_hotplug();
> -			walk_memory_range(PFN_DOWN(ent->start),
> -					  PFN_UP(ent->start + ent->size - 1),
> -					  NULL, online_mem_block);
> +			walk_memory_blocks(ent->start, ent->size, NULL,
> +					   online_mem_block);
>  			unlock_device_hotplug();
>  		}
>  
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index db013dc21c02..e294f44a7850 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -155,16 +155,6 @@ static int acpi_memory_check_device(struct acpi_memory_device *mem_device)
>  	return 0;
>  }
>  
> -static unsigned long acpi_meminfo_start_pfn(struct acpi_memory_info *info)
> -{
> -	return PFN_DOWN(info->start_addr);
> -}
> -
> -static unsigned long acpi_meminfo_end_pfn(struct acpi_memory_info *info)
> -{
> -	return PFN_UP(info->start_addr + info->length-1);
> -}
> -
>  static int acpi_bind_memblk(struct memory_block *mem, void *arg)
>  {
>  	return acpi_bind_one(&mem->dev, arg);
> @@ -173,9 +163,8 @@ static int acpi_bind_memblk(struct memory_block *mem, void *arg)
>  static int acpi_bind_memory_blocks(struct acpi_memory_info *info,
>  				   struct acpi_device *adev)
>  {
> -	return walk_memory_range(acpi_meminfo_start_pfn(info),
> -				 acpi_meminfo_end_pfn(info), adev,
> -				 acpi_bind_memblk);
> +	return walk_memory_blocks(info->start_addr, info->length, adev,
> +				  acpi_bind_memblk);
>  }
>  
>  static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
> @@ -186,8 +175,8 @@ static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
>  
>  static void acpi_unbind_memory_blocks(struct acpi_memory_info *info)
>  {
> -	walk_memory_range(acpi_meminfo_start_pfn(info),
> -			  acpi_meminfo_end_pfn(info), NULL, acpi_unbind_memblk);
> +	walk_memory_blocks(info->start_addr, info->length, NULL,
> +			   acpi_unbind_memblk);
>  }
>  
>  static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index e6364e3e3e31..d8c02e65df68 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -833,8 +833,9 @@ void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  
>  int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn)
>  {
> -	return walk_memory_range(start_pfn, end_pfn, (void *)&nid,
> -					register_mem_sect_under_node);
> +	return walk_memory_blocks(PFN_PHYS(start_pfn),
> +				  PFN_PHYS(end_pfn - start_pfn), (void *)&nid,
> +				  register_mem_sect_under_node);
>  }
>  
>  #ifdef CONFIG_HUGETLBFS
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 79e0add6a597..d9fffc34949f 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -340,7 +340,7 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  extern void __ref free_area_init_core_hotplug(int nid);
> -extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
> +extern int walk_memory_blocks(unsigned long start, unsigned long size,
>  		void *arg, int (*func)(struct memory_block *, void *));
>  extern int __add_memory(int nid, u64 start, u64 size);
>  extern int add_memory(int nid, u64 start, u64 size);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index a88c5f334e5a..122a7d31efdd 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1126,8 +1126,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
>  
>  	/* online pages if requested */
>  	if (memhp_auto_online)
> -		walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1),
> -				  NULL, online_memory_block);
> +		walk_memory_blocks(start, size, NULL, online_memory_block);
>  
>  	return ret;
>  error:
> @@ -1665,20 +1664,24 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  /**
> - * walk_memory_range - walks through all mem sections in [start_pfn, end_pfn)
> - * @start_pfn: start pfn of the memory range
> - * @end_pfn: end pfn of the memory range
> + * walk_memory_blocks - walk through all present memory blocks overlapped
> + *			by the range [start, start + size)
> + *
> + * @start: start address of the memory range
> + * @size: size of the memory range
>   * @arg: argument passed to func
> - * @func: callback for each memory section walked
> + * @func: callback for each memory block walked
>   *
> - * This function walks through all present mem sections in range
> - * [start_pfn, end_pfn) and call func on each mem section.
> + * This function walks through all present memory blocks overlapped by the
> + * range [start, start + size), calling func on each memory block.
>   *
>   * Returns the return value of func.
>   */
> -int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
> +int walk_memory_blocks(unsigned long start, unsigned long size,
>  		void *arg, int (*func)(struct memory_block *, void *))
>  {
> +	const unsigned long start_pfn = PFN_DOWN(start);
> +	const unsigned long end_pfn = PFN_UP(start + size - 1);
>  	struct memory_block *mem = NULL;
>  	struct mem_section *section;
>  	unsigned long pfn, section_nr;
> @@ -1824,8 +1827,7 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>  	 * whether all memory blocks in question are offline and return error
>  	 * if this is not the case.
>  	 */
> -	rc = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> -			       check_memblock_offlined_cb);
> +	rc = walk_memory_blocks(start, size, NULL, check_memblock_offlined_cb);
>  	if (rc)
>  		goto done;
>  
> -- 
> 2.21.0
> 

^ permalink raw reply

* Re: [PATCH] ASoC: rk3399_gru_sound: Support 32, 44.1 and 88.2 kHz sample rates
From: Enric Balletbo Serra @ 2019-06-20 16:08 UTC (permalink / raw)
  To: Mark Brown
  Cc: Enric Balletbo i Serra, alsa-devel, Heiko Stuebner, Xing Zheng,
	Liam Girdwood, Takashi Iwai, linux-kernel, Jaroslav Kysela,
	open list:ARM/Rockchip SoC..., Collabora Kernel ML, Benson Leung,
	Linux ARM
In-Reply-To: <20190620154150.GE5316@sirena.org.uk>

Hi Mark,

Missatge de Mark Brown <broonie@kernel.org> del dia dj., 20 de juny
2019 a les 17:42:
>
> On Thu, Jun 20, 2019 at 03:47:08PM +0200, Enric Balletbo i Serra wrote:
> > According to the datasheet the max98357a also supports 32, 44.1 and
> > 88.2 kHz sample rate. This support was also introduced recently by
> > commit fdf34366d324 ("ASoC: max98357a: add missing supported rates").
> > This patch adds support for these rates also for the machine driver so
> > we get rid of the errors like the below and we are able to play files
> > using these sample rates.
>
> Does the machine actually need to validate this at all?  The component
> drivers can all apply whatever constraints are needed and do their own
> validation, the machine driver is just getting in the way here.

I think you have reason, I'll test by removing these validation and
respin the patch.

Thanks,
~ Enric

> _______________________________________________
> Linux-rockchip mailing list
> Linux-rockchip@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-rockchip

^ permalink raw reply

* [PATCH] mm: fix regression with deferred struct page init
From: Juergen Gross @ 2019-06-20 16:08 UTC (permalink / raw)
  To: xen-devel, linux-mm, linux-kernel; +Cc: Juergen Gross, Alexander Duyck

Commit 0e56acae4b4dd4a9 ("mm: initialize MAX_ORDER_NR_PAGES at a time
instead of doing larger sections") is causing a regression on some
systems when the kernel is booted as Xen dom0.

The system will just hang in early boot.

Reason is an endless loop in get_page_from_freelist() in case the first
zone looked at has no free memory. deferred_grow_zone() is always
returning true due to the following code snipplet:

  /* If the zone is empty somebody else may have cleared out the zone */
  if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
                                           first_deferred_pfn)) {
          pgdat->first_deferred_pfn = ULONG_MAX;
          pgdat_resize_unlock(pgdat, &flags);
          return true;
  }

This in turn results in the loop as get_page_from_freelist() is
assuming forward progress can be made by doing some more struct page
initialization.

Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Fixes: 0e56acae4b4dd4a9 ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections")
Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d66bc8abe0af..8e3bc949ebcc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1826,7 +1826,8 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
 						 first_deferred_pfn)) {
 		pgdat->first_deferred_pfn = ULONG_MAX;
 		pgdat_resize_unlock(pgdat, &flags);
-		return true;
+		/* Retry only once. */
+		return first_deferred_pfn != ULONG_MAX;
 	}
 
 	/*
-- 
2.16.4


^ permalink raw reply related

* [U-Boot] [PATCH] fastboot: Remove "bootloader-version" variable
From: Sam Protsenko @ 2019-06-20 16:08 UTC (permalink / raw)
  To: u-boot
In-Reply-To: <CAByghJZvbGKbz8m2-nzGNXoauOSbmBQ7X+zn2xYf_GqnPiZ9Eg@mail.gmail.com>

Hi Igor,

On Thu, Jun 20, 2019 at 5:55 PM Igor Opaniuk <igor.opaniuk@gmail.com> wrote:
>
> Hi Sam,
>
> On Thu, Jun 20, 2019 at 5:00 PM Sam Protsenko
> <semen.protsenko@linaro.org> wrote:
> >
> > As per [1], there is no such fastboot variable as "bootloader-version".
> > Only "version-bootloader" is supported. Let's reflect this and not
> > confuse users further.
> >
> > [1] https://android.googlesource.com/platform/system/core/+/master/fastboot/README.md#client-variables
> >
> > Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
> > ---
> >  doc/README.android-fastboot  | 4 ++--
> >  drivers/fastboot/fb_getvar.c | 9 +++------
> >  2 files changed, 5 insertions(+), 8 deletions(-)
> >
> > diff --git a/doc/README.android-fastboot b/doc/README.android-fastboot
> > index 431191c473..ce852a4fd1 100644
> > --- a/doc/README.android-fastboot
> > +++ b/doc/README.android-fastboot
> > @@ -169,8 +169,8 @@ On the client side you can fetch the bootloader version for instance:
> >
> >  ::
> >
> > -   $ fastboot getvar bootloader-version
> > -   bootloader-version: U-Boot 2014.04-00005-gd24cabc
> > +   $ fastboot getvar version-bootloader
> > +   version-bootloader: U-Boot 2014.04-00005-gd24cabc
> >     finished. total time: 0.000s
> >
> >  or initiate a reboot:
> > diff --git a/drivers/fastboot/fb_getvar.c b/drivers/fastboot/fb_getvar.c
> > index fd0823b2bf..ebe5c8a104 100644
> > --- a/drivers/fastboot/fb_getvar.c
> > +++ b/drivers/fastboot/fb_getvar.c
> > @@ -12,7 +12,7 @@
> >  #include <version.h>
> >
> >  static void getvar_version(char *var_parameter, char *response);
> > -static void getvar_bootloader_version(char *var_parameter, char *response);
> > +static void getvar_version_bootloader(char *var_parameter, char *response);
> >  static void getvar_downloadsize(char *var_parameter, char *response);
> >  static void getvar_serialno(char *var_parameter, char *response);
> >  static void getvar_version_baseband(char *var_parameter, char *response);
> > @@ -37,12 +37,9 @@ static const struct {
> >         {
> >                 .variable = "version",
> >                 .dispatch = getvar_version
> > -       }, {
> > -               .variable = "bootloader-version",
> > -               .dispatch = getvar_bootloader_version
> >         }, {
> >                 .variable = "version-bootloader",
> > -               .dispatch = getvar_bootloader_version
> > +               .dispatch = getvar_version_bootloader
> >         }, {
> >                 .variable = "downloadsize",
> >                 .dispatch = getvar_downloadsize
> > @@ -131,7 +128,7 @@ static void getvar_version(char *var_parameter, char *response)
> >         fastboot_okay(FASTBOOT_VERSION, response);
> >  }
> >
> > -static void getvar_bootloader_version(char *var_parameter, char *response)
> > +static void getvar_version_bootloader(char *var_parameter, char *response)
> >  {
> >         fastboot_okay(U_BOOT_VERSION, response);
> >  }
> > --
> > 2.20.1
> >
>
> My two cents here,
>
> Based on the commit messages from "git log --grep=bootloader-version"
> people prefer to use "bootloader-version" instead of
> "version-bootloader", and totally removing it will probably affect
> usual workflow with fastboot (probably someone will be suprised that
> "bootloader-version" doesn't work anymore), including some CI automate
> testing etc (if there is any);
>

We need to decide what has more value in this particular case:
  1. Keeping protocol clean, correct and up-to-date
  2. Supporting all erroneous choices we've done before

If we follow golden rule of kernel, (2) is proffered. But I don't
think in this particular case a lot of harm will be done. So from my
POV (1) is preferred. Otherwise we can clutter the protocol, causing
some confusion.

You can check fastboot project in AOSP [1]

    $ git log -S'bootloader-version' -- fastboot/

No occurrences, ever. AOSP is unaware we have 'bootloader-version'
variable in U-Boot, but AOSP defines 'version-bootloader' variable,
for the same stuff. I would really prefer we avoid using some weird
undocumented stuff, and I think in this particular case the
cleanliness of protocol overrules golden rule of kernel development,
as it seems relatively easy to fix that command (matter of one sed
execution).

[1] https://android.googlesource.com/platform/system/core/+/master/

> I think I does make sense to involve all of them to this discussion
> also (already added to CC).
>
> --
> Best regards - Freundliche Grüsse - Meilleures salutations
>
> Igor Opaniuk
>
> mailto: igor.opaniuk at gmail.com
> skype: igor.opanyuk
> +380 (93) 836 40 67
> http://ua.linkedin.com/in/iopaniuk

^ permalink raw reply


This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.