All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/28] Removing struct page from P2PDMA
From: Logan Gunthorpe @ 2019-06-20 16:12 UTC (permalink / raw)


For eons there has been a debate over whether or not to use
struct pages for peer-to-peer DMA transactions. Pro-pagers have
argued that struct pages are necessary for interacting with
existing code like scatterlists or the bio_vecs. Anti-pagers
assert that the tracking of the memory is unecessary and
allocating the pages is a waste of memory. Both viewpoints are
valid, however developers working on GPUs and RDMA tend to be
able to do away with struct pages relatively easily compared to
those wanting to work with NVMe devices through the block layer.
So it would be of great value to be able to universally do P2PDMA
transactions without the use of struct pages.

Previously, there have been multiple attempts[1][2] to replace
struct page usage with pfn_t but this has been unpopular seeing
it creates dangerous edge cases where unsuspecting code might
run accross pfn_t's they are not ready for.

Currently, we have P2PDMA using struct pages through the block layer
and the dangerous cases are avoided by using a queue flag that
indicates support for the special pages.

This RFC proposes a new solution: allow the block layer to take
DMA addresses directly for queues that indicate support. This will
provide a more general path for doing P2PDMA-like requests and will
allow us to remove the struct pages that back P2PDMA memory thus paving
the way to build a more uniform P2PDMA ecosystem.

This is a fairly long patch set but most of the patches are quite
small. Patches 1 through 18 introduce the concept of a dma_vec that
is similar to a bio_vec (except it takes dma_addr_t's instead of pages
and offsets) as well as a special dma-direct bio/request. Most of these
patches just prevent the new type of bio from being mis-used and
also support splitting and mapping them in the same way that struct
page bios can be operated on. Patches 19 through 22 modify the existing
P2PDMA support in nvme-pci, ib-core and nvmet to use DMA addresses
directly. Patches 23 through 25 remove the P2PDMA specific
code from the block layer and ib-core. Finally, patches 26 through 28
remove the struct pages from the PCI P2PDMA code.

This RFC is based on v5.2-rc5 and a git branch is available here:

https://github.com/sbates130272/linux-p2pmem.git dma_direct_rfc1

[1] https://lwn.net/Articles/647404/
[2] https://lore.kernel.org/lkml/1495662147-18277-1-git-send-email-logang at deltatee.com/

--

Logan Gunthorpe (28):
  block: Introduce DMA direct request type
  block: Add dma_vec structure
  block: Warn on mis-use of dma-direct bios
  block: Never bounce dma-direct bios
  block: Skip dma-direct bios in bio_integrity_prep()
  block: Support dma-direct bios in bio_advance_iter()
  block: Use dma_vec length in bio_cur_bytes() for dma-direct bios
  block: Introduce dmavec_phys_mergeable()
  block: Introduce vec_gap_to_prev()
  block: Create generic vec_split_segs() from bvec_split_segs()
  block: Create blk_segment_split_ctx
  block: Create helper for bvec_should_split()
  block: Generalize bvec_should_split()
  block: Support splitting dma-direct bios
  block: Support counting dma-direct bio segments
  block: Implement mapping dma-direct requests to SGs in blk_rq_map_sg()
  block: Introduce queue flag to indicate support for dma-direct bios
  block: Introduce bio_add_dma_addr()
  nvme-pci: Support dma-direct bios
  IB/core: Introduce API for initializing a RW ctx from a DMA address
  nvmet: Split nvmet_bdev_execute_rw() into a helper function
  nvmet: Use DMA addresses instead of struct pages for P2P
  nvme-pci: Remove support for PCI_P2PDMA requests
  block: Remove PCI_P2PDMA queue flag
  IB/core: Remove P2PDMA mapping support in rdma_rw_ctx
  PCI/P2PDMA: Remove SGL helpers
  PCI/P2PDMA: Remove struct pages that back P2PDMA memory
  memremap: Remove PCI P2PDMA page memory type

 Documentation/driver-api/pci/p2pdma.rst |   9 +-
 block/bio-integrity.c                   |   4 +
 block/bio.c                             |  71 +++++++
 block/blk-core.c                        |   3 +
 block/blk-merge.c                       | 256 ++++++++++++++++++------
 block/blk.h                             |  49 ++++-
 block/bounce.c                          |   8 +
 drivers/infiniband/core/rw.c            |  85 ++++++--
 drivers/nvme/host/core.c                |   4 +-
 drivers/nvme/host/nvme.h                |   2 +-
 drivers/nvme/host/pci.c                 |  29 ++-
 drivers/nvme/target/core.c              |  12 +-
 drivers/nvme/target/io-cmd-bdev.c       |  82 +++++---
 drivers/nvme/target/nvmet.h             |   5 +-
 drivers/nvme/target/rdma.c              |  43 +++-
 drivers/pci/p2pdma.c                    | 202 +++----------------
 include/linux/bio.h                     |  32 ++-
 include/linux/blk_types.h               |  14 +-
 include/linux/blkdev.h                  |  16 +-
 include/linux/bvec.h                    |  43 ++++
 include/linux/memremap.h                |   5 -
 include/linux/mm.h                      |  13 --
 include/linux/pci-p2pdma.h              |  19 --
 include/rdma/rw.h                       |   6 +
 24 files changed, 648 insertions(+), 364 deletions(-)

--
2.20.1

^ permalink raw reply

* Re: CKI hackfest @Plumbers invite
From: Veronika Kabatova @ 2019-06-20 16:11 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: kernelci, automated-testing, info, Tim Bird, syzkaller, lkp,
	stable, Laura Abbott, Eliska Slobodova, CKI Project
In-Reply-To: <CAH1_8nAx-1+uqOwAOCfGbqdWzgWD1-oikAfoVBqw4qPcu8v4fw@mail.gmail.com>



----- Original Message -----
> From: "Guillaume Tucker" <guillaume.tucker@gmail.com>
> To: kernelci@groups.io, vkabatov@redhat.com
> Cc: automated-testing@yoctoproject.org, info@kernelci.org, "Tim Bird" <Tim.Bird@sony.com>, khilamn@baylibre.org,
> syzkaller@googlegroups.com, lkp@lists.01.org, stable@vger.kernel.org, "Laura Abbott" <labbott@redhat.com>, "Eliska
> Slobodova" <eslobodo@redhat.com>, "CKI Project" <cki-project@redhat.com>
> Sent: Thursday, June 20, 2019 5:42:11 PM
> Subject: Re: CKI hackfest @Plumbers invite
> 
> Hi Veronika,
> 
> On Tue, May 21, 2019 at 3:55 PM Veronika Kabatova <vkabatov@redhat.com>
> wrote:
> 
> > Hi,
> >
> > as some of you have heard, CKI Project is planning hackfest CI meetings
> > after
> > Plumbers conference this year (Sept. 12-13). We would like to invite
> > everyone
> > who has interest in CI for kernel to come and join us.
> >
> > The early agenda with summary is at the end of the email. If you think
> > there's
> > something important missing let us know! Also let us know in case you'd
> > want to
> > lead any of the sessions, we'd be happy to delegate out some work :)
> >
> >
> > Please send us an email as soon as you decide to come and feel free to
> > invite
> > other people who should be present. We are not planning to cap the
> > attendance
> > right now but need to solve the logistics based on the interest. The event
> > is
> > free to attend, no additional registration except letting us know is
> > needed.
> >
> 
> Please do count me in as well!
> 

\o/

> One topic I would like to add to the agenda is:
> 
> - Open testing philosophy
>   - Connecting components from different origins: builders, test
>     labs, databases, dashboards...
>   - Interoperability: documented remote APIs to let components
>     talk to each other
>   - kernelci.org already does this with distributed builds and
>     test labs, it would be good to apply the same principles to
>     to other existing systems doing upstream kernel testing for
>     everyone's benefit
>   - Optimal utilisation of available resources
>   - Enable more high-level features by joining
>     forces (bisections, cross-referencing of results, bug
>     tracking...)
> 
> This does have some commonality with "Common hardware pools"
> and "Avoiding effort duplication" but I think it makes sense to
> keep it together as a general approach.
> 

I agree that this topic is important (and I believe some other CKI people
made that clear as well) so I added it to the agenda topics. The list of
those is getting long so we'd definitely need to curate it properly soon
but I'll make sure this stays there.


Thanks for the interest!
Veronika

> Thanks,
> Guillaume
> 
> Feel free to contact us if you have any questions,
> > Veronika
> > CKI Project
> >
> >
> > -----------------------------------------------------------
> > Here is an early agenda we put together:
> > - Introductions
> > - Common place for upstream results, result publishing in general
> >   - The discussion on the mailing list is going strong so we might be able
> > to
> >     substitute this session for a different one in case everything is
> > solved by
> >     September.
> > - Test result interpretation and bug detection
> >   - How to autodetect infrastructure failures, regressions/new bugs and
> > test
> >     bugs? How to handle continuous failures due to known bugs in both
> > tests and
> >     kernel? What's your solution? Can people always trust the results they
> >     receive?
> > - Getting results to developers/maintainers
> >   - Aimed at kernel developers and maintainers, share your feedback and
> >     expectations.
> >   - How much data should be sent in the initial communication vs. a click
> > away
> >     in a dashboard? Do you want incremental emails with new results as
> > they come
> >     in?
> >   - What about adding checks to tested patches in Patchwork when patch
> > series
> >     are being tested?
> >   - Providing enough data/script to reproduce the failure. What if special
> > HW
> >     is needed?
> > - Onboarding new kernel trees to test
> >   - Aimed at kernel developers and maintainers.
> >   - Which trees are most prone to bring in new problems? Which are the most
> >     critical ones? Do you want them to be tested? Which tests do you feel
> > are
> >     most beneficial for specific trees or in general?
> > - Security when testing untrusted patches
> >   - How do we merge, compile, and test patches that have untrusted code in
> > them
> >     and have not yet been reviewed? How do we avoid abuse of systems,
> >     information theft, or other damage?
> >   - Check out the original patch that sparked the discussion at
> >     https://patchwork.ozlabs.org/patch/862123/
> > - Avoiding effort duplication
> >   - Food for thought by GregKH
> >   - X different CI systems running ${TEST} on latest stable kernel on
> > x86_64
> >     might look useless on the first look but is it? AMD/Intel CPUs,
> > different
> >     network cards, different graphic drivers, compilers, kernel
> > configuration...
> >     How do we distribute the workload to avoid doing the same thing all
> > over
> >     again while still running in enough different environments to get the
> > most
> >     coverage?
> > - Common hardware pools
> >   - Is this something people are interested in? Would be helpful
> > especially for
> >     HW that's hard to access, eg. ppc64le or s390x systems. Companies
> > could also
> >     sing up to share their HW for testing to ensure kernel works with their
> >     products.
> >
> > -=-=-=-=-=-=-=-=-=-=-=-
> > Groups.io Links: You receive all messages sent to this group.
> >
> > View/Reply Online (#404): https://groups.io/g/kernelci/message/404
> > Mute This Topic: https://groups.io/mt/31697554/924702
> > Group Owner: kernelci+owner@groups.io
> > Unsubscribe: https://groups.io/g/kernelci/unsub  [
> > guillaume.tucker@gmail.com]
> > -=-=-=-=-=-=-=-=-=-=-=-
> >
> >
> 

^ permalink raw reply

* Re: CKI hackfest @Plumbers invite
From: Veronika Kabatova @ 2019-06-20 16:11 UTC (permalink / raw)
  To: lkp
In-Reply-To: <CAH1_8nAx-1+uqOwAOCfGbqdWzgWD1-oikAfoVBqw4qPcu8v4fw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6144 bytes --]



----- Original Message -----
> From: "Guillaume Tucker" <guillaume.tucker@gmail.com>
> To: kernelci(a)groups.io, vkabatov(a)redhat.com
> Cc: automated-testing(a)yoctoproject.org, info(a)kernelci.org, "Tim Bird" <Tim.Bird@sony.com>, khilamn(a)baylibre.org,
> syzkaller(a)googlegroups.com, lkp(a)lists.01.org, stable(a)vger.kernel.org, "Laura Abbott" <labbott@redhat.com>, "Eliska
> Slobodova" <eslobodo@redhat.com>, "CKI Project" <cki-project@redhat.com>
> Sent: Thursday, June 20, 2019 5:42:11 PM
> Subject: Re: CKI hackfest @Plumbers invite
> 
> Hi Veronika,
> 
> On Tue, May 21, 2019 at 3:55 PM Veronika Kabatova <vkabatov@redhat.com>
> wrote:
> 
> > Hi,
> >
> > as some of you have heard, CKI Project is planning hackfest CI meetings
> > after
> > Plumbers conference this year (Sept. 12-13). We would like to invite
> > everyone
> > who has interest in CI for kernel to come and join us.
> >
> > The early agenda with summary is at the end of the email. If you think
> > there's
> > something important missing let us know! Also let us know in case you'd
> > want to
> > lead any of the sessions, we'd be happy to delegate out some work :)
> >
> >
> > Please send us an email as soon as you decide to come and feel free to
> > invite
> > other people who should be present. We are not planning to cap the
> > attendance
> > right now but need to solve the logistics based on the interest. The event
> > is
> > free to attend, no additional registration except letting us know is
> > needed.
> >
> 
> Please do count me in as well!
> 

\o/

> One topic I would like to add to the agenda is:
> 
> - Open testing philosophy
>   - Connecting components from different origins: builders, test
>     labs, databases, dashboards...
>   - Interoperability: documented remote APIs to let components
>     talk to each other
>   - kernelci.org already does this with distributed builds and
>     test labs, it would be good to apply the same principles to
>     to other existing systems doing upstream kernel testing for
>     everyone's benefit
>   - Optimal utilisation of available resources
>   - Enable more high-level features by joining
>     forces (bisections, cross-referencing of results, bug
>     tracking...)
> 
> This does have some commonality with "Common hardware pools"
> and "Avoiding effort duplication" but I think it makes sense to
> keep it together as a general approach.
> 

I agree that this topic is important (and I believe some other CKI people
made that clear as well) so I added it to the agenda topics. The list of
those is getting long so we'd definitely need to curate it properly soon
but I'll make sure this stays there.


Thanks for the interest!
Veronika

> Thanks,
> Guillaume
> 
> Feel free to contact us if you have any questions,
> > Veronika
> > CKI Project
> >
> >
> > -----------------------------------------------------------
> > Here is an early agenda we put together:
> > - Introductions
> > - Common place for upstream results, result publishing in general
> >   - The discussion on the mailing list is going strong so we might be able
> > to
> >     substitute this session for a different one in case everything is
> > solved by
> >     September.
> > - Test result interpretation and bug detection
> >   - How to autodetect infrastructure failures, regressions/new bugs and
> > test
> >     bugs? How to handle continuous failures due to known bugs in both
> > tests and
> >     kernel? What's your solution? Can people always trust the results they
> >     receive?
> > - Getting results to developers/maintainers
> >   - Aimed at kernel developers and maintainers, share your feedback and
> >     expectations.
> >   - How much data should be sent in the initial communication vs. a click
> > away
> >     in a dashboard? Do you want incremental emails with new results as
> > they come
> >     in?
> >   - What about adding checks to tested patches in Patchwork when patch
> > series
> >     are being tested?
> >   - Providing enough data/script to reproduce the failure. What if special
> > HW
> >     is needed?
> > - Onboarding new kernel trees to test
> >   - Aimed at kernel developers and maintainers.
> >   - Which trees are most prone to bring in new problems? Which are the most
> >     critical ones? Do you want them to be tested? Which tests do you feel
> > are
> >     most beneficial for specific trees or in general?
> > - Security when testing untrusted patches
> >   - How do we merge, compile, and test patches that have untrusted code in
> > them
> >     and have not yet been reviewed? How do we avoid abuse of systems,
> >     information theft, or other damage?
> >   - Check out the original patch that sparked the discussion at
> >     https://patchwork.ozlabs.org/patch/862123/
> > - Avoiding effort duplication
> >   - Food for thought by GregKH
> >   - X different CI systems running ${TEST} on latest stable kernel on
> > x86_64
> >     might look useless on the first look but is it? AMD/Intel CPUs,
> > different
> >     network cards, different graphic drivers, compilers, kernel
> > configuration...
> >     How do we distribute the workload to avoid doing the same thing all
> > over
> >     again while still running in enough different environments to get the
> > most
> >     coverage?
> > - Common hardware pools
> >   - Is this something people are interested in? Would be helpful
> > especially for
> >     HW that's hard to access, eg. ppc64le or s390x systems. Companies
> > could also
> >     sing up to share their HW for testing to ensure kernel works with their
> >     products.
> >
> > -=-=-=-=-=-=-=-=-=-=-=-
> > Groups.io Links: You receive all messages sent to this group.
> >
> > View/Reply Online (#404): https://groups.io/g/kernelci/message/404
> > Mute This Topic: https://groups.io/mt/31697554/924702
> > Group Owner: kernelci+owner(a)groups.io
> > Unsubscribe: https://groups.io/g/kernelci/unsub  [
> > guillaume.tucker(a)gmail.com]
> > -=-=-=-=-=-=-=-=-=-=-=-
> >
> >
> 

^ permalink raw reply

* Re: CKI hackfest @Plumbers invite
From: Veronika Kabatova @ 2019-06-20 16:11 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: kernelci, automated-testing, info, Tim Bird, syzkaller, lkp,
	stable, Laura Abbott, Eliska Slobodova, CKI Project
In-Reply-To: <CAH1_8nAx-1+uqOwAOCfGbqdWzgWD1-oikAfoVBqw4qPcu8v4fw@mail.gmail.com>



----- Original Message -----
> From: "Guillaume Tucker" <guillaume.tucker@gmail.com>
> To: kernelci@groups.io, vkabatov@redhat.com
> Cc: automated-testing@yoctoproject.org, info@kernelci.org, "Tim Bird" <Tim.Bird@sony.com>, khilamn@baylibre.org,
> syzkaller@googlegroups.com, lkp@lists.01.org, stable@vger.kernel.org, "Laura Abbott" <labbott@redhat.com>, "Eliska
> Slobodova" <eslobodo@redhat.com>, "CKI Project" <cki-project@redhat.com>
> Sent: Thursday, June 20, 2019 5:42:11 PM
> Subject: Re: CKI hackfest @Plumbers invite
> 
> Hi Veronika,
> 
> On Tue, May 21, 2019 at 3:55 PM Veronika Kabatova <vkabatov@redhat.com>
> wrote:
> 
> > Hi,
> >
> > as some of you have heard, CKI Project is planning hackfest CI meetings
> > after
> > Plumbers conference this year (Sept. 12-13). We would like to invite
> > everyone
> > who has interest in CI for kernel to come and join us.
> >
> > The early agenda with summary is at the end of the email. If you think
> > there's
> > something important missing let us know! Also let us know in case you'd
> > want to
> > lead any of the sessions, we'd be happy to delegate out some work :)
> >
> >
> > Please send us an email as soon as you decide to come and feel free to
> > invite
> > other people who should be present. We are not planning to cap the
> > attendance
> > right now but need to solve the logistics based on the interest. The event
> > is
> > free to attend, no additional registration except letting us know is
> > needed.
> >
> 
> Please do count me in as well!
> 

\o/

> One topic I would like to add to the agenda is:
> 
> - Open testing philosophy
>   - Connecting components from different origins: builders, test
>     labs, databases, dashboards...
>   - Interoperability: documented remote APIs to let components
>     talk to each other
>   - kernelci.org already does this with distributed builds and
>     test labs, it would be good to apply the same principles to
>     to other existing systems doing upstream kernel testing for
>     everyone's benefit
>   - Optimal utilisation of available resources
>   - Enable more high-level features by joining
>     forces (bisections, cross-referencing of results, bug
>     tracking...)
> 
> This does have some commonality with "Common hardware pools"
> and "Avoiding effort duplication" but I think it makes sense to
> keep it together as a general approach.
> 

I agree that this topic is important (and I believe some other CKI people
made that clear as well) so I added it to the agenda topics. The list of
those is getting long so we'd definitely need to curate it properly soon
but I'll make sure this stays there.


Thanks for the interest!
Veronika

> Thanks,
> Guillaume
> 
> Feel free to contact us if you have any questions,
> > Veronika
> > CKI Project
> >
> >
> > -----------------------------------------------------------
> > Here is an early agenda we put together:
> > - Introductions
> > - Common place for upstream results, result publishing in general
> >   - The discussion on the mailing list is going strong so we might be able
> > to
> >     substitute this session for a different one in case everything is
> > solved by
> >     September.
> > - Test result interpretation and bug detection
> >   - How to autodetect infrastructure failures, regressions/new bugs and
> > test
> >     bugs? How to handle continuous failures due to known bugs in both
> > tests and
> >     kernel? What's your solution? Can people always trust the results they
> >     receive?
> > - Getting results to developers/maintainers
> >   - Aimed at kernel developers and maintainers, share your feedback and
> >     expectations.
> >   - How much data should be sent in the initial communication vs. a click
> > away
> >     in a dashboard? Do you want incremental emails with new results as
> > they come
> >     in?
> >   - What about adding checks to tested patches in Patchwork when patch
> > series
> >     are being tested?
> >   - Providing enough data/script to reproduce the failure. What if special
> > HW
> >     is needed?
> > - Onboarding new kernel trees to test
> >   - Aimed at kernel developers and maintainers.
> >   - Which trees are most prone to bring in new problems? Which are the most
> >     critical ones? Do you want them to be tested? Which tests do you feel
> > are
> >     most beneficial for specific trees or in general?
> > - Security when testing untrusted patches
> >   - How do we merge, compile, and test patches that have untrusted code in
> > them
> >     and have not yet been reviewed? How do we avoid abuse of systems,
> >     information theft, or other damage?
> >   - Check out the original patch that sparked the discussion at
> >     https://patchwork.ozlabs.org/patch/862123/
> > - Avoiding effort duplication
> >   - Food for thought by GregKH
> >   - X different CI systems running ${TEST} on latest stable kernel on
> > x86_64
> >     might look useless on the first look but is it? AMD/Intel CPUs,
> > different
> >     network cards, different graphic drivers, compilers, kernel
> > configuration...
> >     How do we distribute the workload to avoid doing the same thing all
> > over
> >     again while still running in enough different environments to get the
> > most
> >     coverage?
> > - Common hardware pools
> >   - Is this something people are interested in? Would be helpful
> > especially for
> >     HW that's hard to access, eg. ppc64le or s390x systems. Companies
> > could also
> >     sing up to share their HW for testing to ensure kernel works with their
> >     products.
> >
> > 
> >
> >
> 

^ permalink raw reply

* Re: [Xen-devel] [PATCH] mm: fix regression with deferred struct page init
From: Alexander Duyck @ 2019-06-20 16:10 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-mm, linux-kernel
In-Reply-To: <20190620160821.4210-1-jgross@suse.com>

On Thu, 2019-06-20 at 18:08 +0200, Juergen Gross wrote:
> Commit 0e56acae4b4dd4a9 ("mm: initialize MAX_ORDER_NR_PAGES at a time
> instead of doing larger sections") is causing a regression on some
> systems when the kernel is booted as Xen dom0.
> 
> The system will just hang in early boot.
> 
> Reason is an endless loop in get_page_from_freelist() in case the first
> zone looked at has no free memory. deferred_grow_zone() is always
> returning true due to the following code snipplet:
> 
>   /* If the zone is empty somebody else may have cleared out the zone */
>   if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
>                                            first_deferred_pfn)) {
>           pgdat->first_deferred_pfn = ULONG_MAX;
>           pgdat_resize_unlock(pgdat, &flags);
>           return true;
>   }
> 
> This in turn results in the loop as get_page_from_freelist() is
> assuming forward progress can be made by doing some more struct page
> initialization.
> 
> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Fixes: 0e56acae4b4dd4a9 ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections")
> Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>

> ---
>  mm/page_alloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d66bc8abe0af..8e3bc949ebcc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1826,7 +1826,8 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
>  						 first_deferred_pfn)) {
>  		pgdat->first_deferred_pfn = ULONG_MAX;
>  		pgdat_resize_unlock(pgdat, &flags);
> -		return true;
> +		/* Retry only once. */
> +		return first_deferred_pfn != ULONG_MAX;
>  	}
>  
>  	/*



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply

* Re: [PATCH] drm: Dump mode picture aspect ratio
From: Ville Syrjälä @ 2019-06-20 16:11 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: Intel Graphics Development, dri-devel
In-Reply-To: <CAKb7Uvi7krst3nO+CcQXJZu4fJK4zvBXh3-t-KYuz85iNK9f-Q@mail.gmail.com>

On Thu, Jun 20, 2019 at 11:59:37AM -0400, Ilia Mirkin wrote:
> On Thu, Jun 20, 2019 at 11:46 AM Ville Syrjala
> <ville.syrjala@linux.intel.com> wrote:
> >
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >
> > Currently the logs show nothing about the mode's aspect ratio.
> > Include that information in the normal mode dump.
> >
> > Cc: Ilia Mirkin <imirkin@alum.mit.edu>
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/video/hdmi.c    | 3 ++-
> >  include/drm/drm_modes.h | 4 ++--
> >  include/linux/hdmi.h    | 3 +++
> >  3 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/video/hdmi.c b/drivers/video/hdmi.c
> > index b939bc28d886..bc593fe1c268 100644
> > --- a/drivers/video/hdmi.c
> > +++ b/drivers/video/hdmi.c
> > @@ -1057,7 +1057,7 @@ static const char *hdmi_colorimetry_get_name(enum hdmi_colorimetry colorimetry)
> >         return "Invalid";
> >  }
> >
> > -static const char *
> > +const char *
> >  hdmi_picture_aspect_get_name(enum hdmi_picture_aspect picture_aspect)
> >  {
> >         switch (picture_aspect) {
> > @@ -1076,6 +1076,7 @@ hdmi_picture_aspect_get_name(enum hdmi_picture_aspect picture_aspect)
> >         }
> >         return "Invalid";
> >  }
> > +EXPORT_SYMBOL(hdmi_picture_aspect_get_name);
> 
> So this will return "No Data" most of the time (since the DRM_CAP
> won't be enabled)? This will look awkward, esp since the person seeing
> this print will have no idea what "No Data" is referring to.

I suppose we could do 
picture_aspet_ratio ? hdmi_picture_aspect_get_name(picture_aspet_ratio) : ""

> 
> >
> >  static const char *
> >  hdmi_active_aspect_get_name(enum hdmi_active_aspect active_aspect)
> > diff --git a/include/drm/drm_modes.h b/include/drm/drm_modes.h
> > index 083f16747369..2b1809c74fbe 100644
> > --- a/include/drm/drm_modes.h
> > +++ b/include/drm/drm_modes.h
> > @@ -431,7 +431,7 @@ struct drm_display_mode {
> >  /**
> >   * DRM_MODE_FMT - printf string for &struct drm_display_mode
> >   */
> > -#define DRM_MODE_FMT    "\"%s\": %d %d %d %d %d %d %d %d %d %d 0x%x 0x%x"
> > +#define DRM_MODE_FMT    "\"%s\": %d %d %d %d %d %d %d %d %d %d 0x%x 0x%x %s"
> >
> >  /**
> >   * DRM_MODE_ARG - printf arguments for &struct drm_display_mode
> > @@ -441,7 +441,7 @@ struct drm_display_mode {
> >         (m)->name, (m)->vrefresh, (m)->clock, \
> >         (m)->hdisplay, (m)->hsync_start, (m)->hsync_end, (m)->htotal, \
> >         (m)->vdisplay, (m)->vsync_start, (m)->vsync_end, (m)->vtotal, \
> > -       (m)->type, (m)->flags
> > +       (m)->type, (m)->flags, hdmi_picture_aspect_get_name((m)->picture_aspect_ratio)
> 
> Flags are printed as a literal integer value. Given that AR stuff is
> fairly esoteric, I might just print an integer here too.

I hate those thing. I can't even remember which is one the type
(absolutely useless) and which on is the flags. And I always end up
going through the headers to figure out which bit is what sync polarity.
So I'd rather like to decode those too, just been too lazy to do it.

> (Why was
> picture_aspect_ratio not stuffed into ->flags in the first place?)

It's also in there. I think the reason for having this odd duplication
was that we didn't have the flags initially, and just had the prop.
I've not looked how hard it would be to get rid of that.


> 
> >
> >  #define obj_to_mode(x) container_of(x, struct drm_display_mode, base)
> >
> > diff --git a/include/linux/hdmi.h b/include/linux/hdmi.h
> > index 9918a6c910c5..de7cbe385dba 100644
> > --- a/include/linux/hdmi.h
> > +++ b/include/linux/hdmi.h
> > @@ -434,4 +434,7 @@ int hdmi_infoframe_unpack(union hdmi_infoframe *frame,
> >  void hdmi_infoframe_log(const char *level, struct device *dev,
> >                         const union hdmi_infoframe *frame);
> >
> > +const char *
> > +hdmi_picture_aspect_get_name(enum hdmi_picture_aspect picture_aspect);
> > +
> >  #endif /* _DRM_HDMI_H */
> > --
> > 2.21.0
> >

-- 
Ville Syrjälä
Intel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply

* Re: [PATCH] mm: fix regression with deferred struct page init
From: Alexander Duyck @ 2019-06-20 16:10 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-mm, linux-kernel
In-Reply-To: <20190620160821.4210-1-jgross@suse.com>

On Thu, 2019-06-20 at 18:08 +0200, Juergen Gross wrote:
> Commit 0e56acae4b4dd4a9 ("mm: initialize MAX_ORDER_NR_PAGES at a time
> instead of doing larger sections") is causing a regression on some
> systems when the kernel is booted as Xen dom0.
> 
> The system will just hang in early boot.
> 
> Reason is an endless loop in get_page_from_freelist() in case the first
> zone looked at has no free memory. deferred_grow_zone() is always
> returning true due to the following code snipplet:
> 
>   /* If the zone is empty somebody else may have cleared out the zone */
>   if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
>                                            first_deferred_pfn)) {
>           pgdat->first_deferred_pfn = ULONG_MAX;
>           pgdat_resize_unlock(pgdat, &flags);
>           return true;
>   }
> 
> This in turn results in the loop as get_page_from_freelist() is
> assuming forward progress can be made by doing some more struct page
> initialization.
> 
> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Fixes: 0e56acae4b4dd4a9 ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections")
> Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>

> ---
>  mm/page_alloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d66bc8abe0af..8e3bc949ebcc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1826,7 +1826,8 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
>  						 first_deferred_pfn)) {
>  		pgdat->first_deferred_pfn = ULONG_MAX;
>  		pgdat_resize_unlock(pgdat, &flags);
> -		return true;
> +		/* Retry only once. */
> +		return first_deferred_pfn != ULONG_MAX;
>  	}
>  
>  	/*



^ permalink raw reply

* Re: [PATCH 2/2] coresight: Abort probe for missing CPU phandle
From: Sai Prakash Ranjan @ 2019-06-20 16:10 UTC (permalink / raw)
  To: Suzuki K Poulose, mathieu.poirier, leo.yan, alexander.shishkin,
	david.brown, mark.rutland
  Cc: rnayak, linux-arm-msm, linux-kernel, sibis, vivek.gautam,
	linux-arm-kernel
In-Reply-To: <0182216b-5495-bcf7-bb0e-869133b24830@arm.com>

On 6/20/2019 8:53 PM, Suzuki K Poulose wrote:
> 
> 
> 
> Please wait for Mathieu's thoughts on it. And in general I would wait
> for feedback from the people in a version, before posting another one,
> to reduce the number of respins.
> 

Mathieu already said he was OK in the other thread, but I will wait for
some more feedbacks.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v3 6/9] leds: multicolor: Introduce a multicolor class definition
From: Jacek Anaszewski @ 2019-06-20 16:10 UTC (permalink / raw)
  To: Dan Murphy, pavel, robh+dt; +Cc: devicetree, linux-leds, linux-kernel
In-Reply-To: <20190523190820.29375-7-dmurphy@ti.com>

Hi Dan,

Thank you for the v5.

I will confine myself to commenting only some parts since
the rest will undergo rework due to removal of sync API.

On 5/23/19 9:08 PM, Dan Murphy wrote:
> Introduce a multicolor class that groups colored LEDs
> within a LED node.
> 
> The framework allows for dynamically setting individual LEDs
> or setting brightness levels of LEDs and updating them virtually
> simultaneously.
> 
> Signed-off-by: Dan Murphy <dmurphy@ti.com>
> ---
>   drivers/leds/Kconfig                 |  10 +
>   drivers/leds/Makefile                |   1 +
>   drivers/leds/led-class-multicolor.c  | 421 +++++++++++++++++++++++++++
>   include/linux/led-class-multicolor.h |  95 ++++++
>   4 files changed, 527 insertions(+)
>   create mode 100644 drivers/leds/led-class-multicolor.c
>   create mode 100644 include/linux/led-class-multicolor.h
> 
> diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
> index 0414adebb177..0696a13c9527 100644
> --- a/drivers/leds/Kconfig
> +++ b/drivers/leds/Kconfig
> @@ -29,6 +29,16 @@ config LEDS_CLASS_FLASH
>   	  for the flash related features of a LED device. It can be built
>   	  as a module.
>   
> +config LEDS_CLASS_MULTI_COLOR
> +	tristate "LED Mulit Color LED Class Support"
> +	depends on LEDS_CLASS
> +	help
> +	  This option enables the multicolor LED sysfs class in /sys/class/leds.
> +	  It wraps LED Class and adds multicolor LED specific sysfs attributes
> +	  and kernel internal API to it. You'll need this to provide support
> +	  for multicolor LEDs that are grouped together. This class is not
> +	  intended for single color LEDs.  It can be built as a module.

extra whitespace:

s/ It can/It can/

[...]
> +
> +static int multicolor_set_brightness(struct led_classdev *led_cdev,
> +			     enum led_brightness brightness)
> +{
> +	struct led_classdev_mc *mcled_cdev = lcdev_to_mccdev(led_cdev);
> +	struct led_classdev_mc_data *data = mcled_cdev->data;
> +	struct led_multicolor_ops *ops = mcled_cdev->ops;
> +	struct led_classdev_mc_priv *priv;
> +	unsigned long state = brightness;
> +	int adj_value;
> +	ssize_t ret = -EINVAL;
> +
> +	mutex_lock(&led_cdev->led_access);
> +
> +	if (ops->set_module_brightness) {
> +		ret = ops->set_module_brightness(mcled_cdev, state);
> +		goto unlock;
> +	}
> +
> +	list_for_each_entry(priv, &data->color_list, list) {
> +		if (state && priv->brightness && priv->max_brightness) {
> +			adj_value = state * ((priv->brightness * 100) / priv->max_brightness);
> +			adj_value = adj_value / 100;

Why the multiplication an then division by 100? And priv->max_brightness
stays unaltered? This changes the proportions. My python script works
just fine without those.

> +		} else
> +			adj_value = LED_OFF;
> +
> +		ret = ops->set_color_brightness(priv->mcled_cdev,
> +						priv->color_id,	adj_value);
> +		if (ret < 0)
> +			goto unlock;
> +	}
> +
> +unlock:
> +	mutex_unlock(&led_cdev->led_access);
> +	return ret;
> +}
[...]
> +int led_classdev_multicolor_register_ext(struct device *parent,
> +				     struct led_classdev_mc *mcled_cdev,
> +				     struct led_init_data *init_data)
> +{
> +	struct led_classdev *led_cdev;
> +	struct led_multicolor_ops *ops;
> +	struct led_classdev_mc_data *data;
> +	int ret;
> +	int i;
> +
> +	if (!mcled_cdev)
> +		return -EINVAL;
> +
> +	ops = mcled_cdev->ops;
> +	if (!ops || !ops->set_color_brightness)
> +		return -EINVAL;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	mcled_cdev->data = data;
> +	led_cdev = &mcled_cdev->led_cdev;
> +
> +	if (led_cdev->brightness_set_blocking)
> +		led_cdev->brightness_set_blocking = multicolor_set_brightness;

This is weird. In leds-lp50xx.c you don't initialize
brightness_set_blocking and this still works?

I believe this is kind of omission.

And it is not reasonable to just override driver supplied op with
generic one just like that.

I propose to initialize brightness_set or brightness_set_blocking
op as we used to do it for monochrome LEDs. Those function(s) on
driver side will either use device's hardware support for setting
color lightness, or will call a generic function provided by
LED multi color class for calculating intensities of all colors
it comprises in the cluster.

I know this is different to what we've discussed on IRC, but now
it looks for me the most reasonable way to go.

> +	INIT_LIST_HEAD(&data->color_list);
> +
> +	/* Register led class device */
> +	ret = led_classdev_register_ext(parent, led_cdev, init_data);
> +	if (ret)
> +		return ret;
> +
> +	ret = led_multicolor_init_color_dir(data, mcled_cdev);
> +	if (ret)
> +		return ret;
> +
> +	/* Select the sysfs attributes to be created for the device */
> +	for (i = 0; i < mcled_cdev->num_leds; i++) {
> +		ret = led_multicolor_init_color(data, mcled_cdev,
> +						mcled_cdev->available_colors[i]);
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(led_classdev_multicolor_register_ext);
> +
> +void led_classdev_multicolor_unregister(struct led_classdev_mc *mcled_cdev)
> +{
> +	if (!mcled_cdev)
> +		return;
> +
> +	led_classdev_unregister(&mcled_cdev->led_cdev);
> +}
> +EXPORT_SYMBOL_GPL(led_classdev_multicolor_unregister);
> +
> +static void devm_led_classdev_multicolor_release(struct device *dev, void *res)
> +{
> +	led_classdev_multicolor_unregister(*(struct led_classdev_mc **)res);
> +}
> +
> +/**
> + * devm_of_led_classdev_register - resource managed led_classdev_register()
> + *
> + * @parent: parent of LED device
> + * @led_cdev: the led_classdev structure for this device.
> + */
> +int devm_led_classdev_multicolor_register(struct device *parent,
> +					  struct led_classdev_mc *mcled_cdev)
> +{
> +	struct led_classdev_mc **dr;
> +	int ret;
> +
> +	dr = devres_alloc(devm_led_classdev_multicolor_release,
> +			  sizeof(*dr), GFP_KERNEL);
> +	if (!dr)
> +		return -ENOMEM;
> +
> +	ret = led_classdev_multicolor_register(parent, mcled_cdev);
> +	if (ret) {
> +		devres_free(dr);
> +		return ret;
> +	}
> +
> +	*dr = mcled_cdev;
> +	devres_add(parent, dr);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(devm_led_classdev_multicolor_register);
> +
> +static int devm_led_classdev_multicolor_match(struct device *dev,
> +					      void *res, void *data)
> +{
> +	struct mcled_cdev **p = res;
> +
> +	if (WARN_ON(!p || !*p))
> +		return 0;
> +
> +	return *p == data;
> +}
> +
> +/**
> + * devm_led_classdev_multicolor_unregister() - resource managed
> + *					led_classdev_multicolor_unregister()
> + * @parent: The device to unregister.
> + * @mcled_cdev: the led_classdev_mc structure for this device.
> + */
> +void devm_led_classdev_multicolor_unregister(struct device *dev,
> +				  struct led_classdev_mc *mcled_cdev)
> +{
> +	WARN_ON(devres_release(dev,
> +			       devm_led_classdev_multicolor_release,
> +			       devm_led_classdev_multicolor_match, mcled_cdev));
> +}
> +EXPORT_SYMBOL_GPL(devm_led_classdev_multicolor_unregister);
> +
> +MODULE_AUTHOR("Dan Murphy <dmurphy@ti.com>");
> +MODULE_DESCRIPTION("Multi Color LED class interface");
> +MODULE_LICENSE("GPL v2");
> diff --git a/include/linux/led-class-multicolor.h b/include/linux/led-class-multicolor.h
> new file mode 100644
> index 000000000000..f9e71d984b03
> --- /dev/null
> +++ b/include/linux/led-class-multicolor.h
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/* LED Multicolor class interface
> + * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/
> + */

Let's have C++ comment style also here.

> +#ifndef __LINUX_MULTICOLOR_LEDS_H_INCLUDED
> +#define __LINUX_MULTICOLOR_LEDS_H_INCLUDED
> +
> +#include <linux/leds.h>
> +#include <dt-bindings/leds/common.h>
[...]


-- 
Best regards,
Jacek Anaszewski

^ permalink raw reply

* Re: [PATCH 2/2] coresight: Abort probe for missing CPU phandle
From: Sai Prakash Ranjan @ 2019-06-20 16:10 UTC (permalink / raw)
  To: Suzuki K Poulose, mathieu.poirier, leo.yan, alexander.shishkin,
	david.brown, mark.rutland
  Cc: rnayak, vivek.gautam, sibis, linux-arm-kernel, linux-kernel,
	linux-arm-msm
In-Reply-To: <0182216b-5495-bcf7-bb0e-869133b24830@arm.com>

On 6/20/2019 8:53 PM, Suzuki K Poulose wrote:
> 
> 
> 
> Please wait for Mathieu's thoughts on it. And in general I would wait
> for feedback from the people in a version, before posting another one,
> to reduce the number of respins.
> 

Mathieu already said he was OK in the other thread, but I will wait for
some more feedbacks.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply

* Re: [Qemu-devel] [PATCH 02/12] block/backup: Add mirror sync mode 'bitmap'
From: John Snow @ 2019-06-20 16:01 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, qemu-block
  Cc: Kevin Wolf, Fam Zheng, vsementsov, Wen Congyang, Xie Changlong,
	Markus Armbruster
In-Reply-To: <e7143bb8-afb6-8326-6e93-49a7470b9b98@redhat.com>



On 6/20/19 11:00 AM, Max Reitz wrote:
> On 20.06.19 03:03, John Snow wrote:
>> We don't need or want a new sync mode for simple differences in
>> semantics.  Create a new mode simply named "BITMAP" that is designed to
>> make use of the new Bitmap Sync Mode field.
>>
>> Because the only bitmap mode is 'conditional', this adds no new
>> functionality to the backup job (yet). The old incremental backup mode
>> is maintained as a syntactic sugar for sync=bitmap, mode=conditional.
>>
>> Add all of the plumbing necessary to support this new instruction.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  qapi/block-core.json      | 30 ++++++++++++++++++++++--------
>>  include/block/block_int.h |  6 +++++-
>>  block/backup.c            | 35 ++++++++++++++++++++++++++++-------
>>  block/mirror.c            |  6 ++++--
>>  block/replication.c       |  2 +-
>>  blockdev.c                |  8 ++++++--
>>  6 files changed, 66 insertions(+), 21 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index caf28a71a0..6d05ad8f47 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -1127,12 +1127,15 @@
>>  #
>>  # @none: only copy data written from now on
>>  #
>> -# @incremental: only copy data described by the dirty bitmap. Since: 2.4
>> +# @incremental: only copy data described by the dirty bitmap. (since: 2.4)
> 
> Why not deprecate this in the process and note that this is equal to
> sync=bitmap, bitmap-mode=conditional?
> 
> (I don’t think there is a rule that forces us to actually remove
> deprecated stuff after two releases if it doesn’t hurt to keep it.)
> 

Mostly I thought it would be fine to keep as sugar. In your replies so
far I gather that "incremental" and "differential" don't mean specific
backup paradigms to you, so maybe these seem like worthless words.

It was my general understanding that in terms of backup
paradigms/methodologies that "incremental" and "differential" mean very
specific things.

Incremental: Each backup contains only the delta from the last
incremental backup.
Differential: Each backup contains the delta from the last FULL backup.

You can search "incremental vs differential backup" on your search
engine of choice and find many relevant results. I took a Networking/IT
vocational degree in 2007 and these terms were taught in textbooks then.

So I will resist quite strongly changing them, and for this reason, felt
that it was strictly a good thing to keep incremental as sugar, because
I thought that people would know what it meant.

(More than "conditional", anyway, which is jargon I made up.)

>> +#
>> +# @bitmap: only copy data described by the dirty bitmap. (since: 4.1)
>> +#          Behavior on completion is determined by the BitmapSyncMode.
>>  #
>>  # Since: 1.3
>>  ##
>>  { 'enum': 'MirrorSyncMode',
>> -  'data': ['top', 'full', 'none', 'incremental'] }
>> +  'data': ['top', 'full', 'none', 'incremental', 'bitmap'] }
>>  
>>  ##
>>  # @BitmapSyncMode:
>> @@ -1352,10 +1355,14 @@
>>  #
>>  # @speed: the maximum speed, in bytes per second
>>  #
>> -# @bitmap: the name of dirty bitmap if sync is "incremental".
>> -#          Must be present if sync is "incremental", must NOT be present
>> +# @bitmap: the name of dirty bitmap if sync is "bitmap".
>> +#          Must be present if sync is "bitmap", must NOT be present
>>  #          otherwise. (Since 2.4)
> 
> Er, well, now this is too fast of a deprecation. :-)  It must still also
> be present if sync is “incremental”.
> 

OK; I will try to phrase it better. This is reflecting too much the
implementation -- I think I was trying to communicate that incremental
was just sugar for "bitmap", so I was trusting that was understood here.

...But, depending on the order in which you read the docs, this could be
confusing, so I guess I will change that.

>>  #
>> +# @bitmap-mode: Specifies the type of data the bitmap should contain after
>> +#               the operation concludes. Must be present if sync is "bitmap".
>> +#               Must NOT be present otherwise. (Since 4.1)
> 
> Do we have any rule that qemu must enforce “must not”s? :-)
> 
> (No, I don’t think so.  I think it’s very reasonable that you accept
> bitmap-mode=conditional for sync=incremental.)
> 

Right, I left this a secret wiggle room. If you specify the correct
bitmap sync mode for the incremental sugar, it will actually let it
slide. If you specify the wrong one, it will error out.

However, I think this is perfectly correct advice from the API: Please
use this mode with sync=bitmap and do not use it otherwise.

Would you like me to change it to be more technically correct and
document the little affordance I made?

>>  # @compress: true to compress data, if the target format supports it.
>>  #            (default: false) (since 2.8)
>>  #
>> @@ -1390,7 +1397,8 @@
>>    'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
>>              '*format': 'str', 'sync': 'MirrorSyncMode',
>>              '*mode': 'NewImageMode', '*speed': 'int',
>> -            '*bitmap': 'str', '*compress': 'bool',
>> +            '*bitmap': 'str', '*bitmap-mode': 'BitmapSyncMode',
>> +            '*compress': 'bool',
>>              '*on-source-error': 'BlockdevOnError',
>>              '*on-target-error': 'BlockdevOnError',
>>              '*auto-finalize': 'bool', '*auto-dismiss': 'bool' } }
>> @@ -1412,10 +1420,14 @@
>>  # @speed: the maximum speed, in bytes per second. The default is 0,
>>  #         for unlimited.
>>  #
>> -# @bitmap: the name of dirty bitmap if sync is "incremental".
>> -#          Must be present if sync is "incremental", must NOT be present
>> +# @bitmap: the name of dirty bitmap if sync is "bitmap".
>> +#          Must be present if sync is "bitmap", must NOT be present
>>  #          otherwise. (Since 3.1)
> 
> Same as above.
> 

OK

>> +# @bitmap-mode: Specifies the type of data the bitmap should contain after
>> +#               the operation concludes. Must be present if sync is "bitmap".
>> +#               Must NOT be present otherwise. (Since 4.1)
>> +#
>>  # @compress: true to compress data, if the target format supports it.
>>  #            (default: false) (since 2.8)
>>  #
>> @@ -1449,7 +1461,9 @@
>>  { 'struct': 'BlockdevBackup',
>>    'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
>>              'sync': 'MirrorSyncMode', '*speed': 'int',
>> -            '*bitmap': 'str', '*compress': 'bool',
>> +            '*bitmap': 'str',
>> +            '*bitmap-mode': 'BitmapSyncMode',
>> +            '*compress': 'bool',
>>              '*on-source-error': 'BlockdevOnError',
>>              '*on-target-error': 'BlockdevOnError',
>>              '*auto-finalize': 'bool', '*auto-dismiss': 'bool' } }
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index d6415b53c1..89370c1b9b 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -1132,7 +1132,9 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>>   * @target: Block device to write to.
>>   * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
>>   * @sync_mode: What parts of the disk image should be copied to the destination.
>> - * @sync_bitmap: The dirty bitmap if sync_mode is MIRROR_SYNC_MODE_INCREMENTAL.
>> + * @sync_bitmap: The dirty bitmap if sync_mode is 'bitmap' or 'incremental'
>> + * @has_bitmap_mode: true if @bitmap_sync carries a meaningful value.
> 
> Hmm...  If you moved the conversion of incremental/- =>
> bitmap/conditional into blockdev.c, you could get rid of this parameter
> because it would be equal to (sync_bitmap != NULL).
> 
> (It itches me to get rid of this parameter because there is no other
> has* parameter for this function yet.)
> 

Yeah, it annoyed me too, and I believe later you do correctly guess why
I did it -- it's so that the sugar conversion occurs all in one place
where the logic was easiest to condense.

I ran into the issue that there's no way to define a QAPI enum that has
a "default"/"unset" state without also allowing that value to be entered
by the user explicitly; so there was no way to pass along an "unset
enum" down this far.

So... (thought continued below)

>> + * @bitmap_mode: The bitmap synchronization policy to use.
>>   * @on_source_error: The action to take upon error reading from the source.
>>   * @on_target_error: The action to take upon error writing to the target.
>>   * @creation_flags: Flags that control the behavior of the Job lifetime.
>> @@ -1148,6 +1150,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>                              BlockDriverState *target, int64_t speed,
>>                              MirrorSyncMode sync_mode,
>>                              BdrvDirtyBitmap *sync_bitmap,
>> +                            bool has_bitmap_mode,
>> +                            BitmapSyncMode bitmap_mode,
>>                              bool compress,
>>                              BlockdevOnError on_source_error,
>>                              BlockdevOnError on_target_error,
>> diff --git a/block/backup.c b/block/backup.c
>> index 715e1d3be8..c4f83d4ef7 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
> 
> [...]
> 
>> @@ -584,9 +586,28 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>      }
>>  
>>      if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>> +        if (has_bitmap_mode &&
>> +            bitmap_mode != BITMAP_SYNC_MODE_CONDITIONAL) {
>> +            error_setg(errp, "Bitmap sync mode must be 'conditional' "
>> +                       "when using sync mode '%s'",
>> +                       MirrorSyncMode_str(sync_mode));
>> +            return NULL;
>> +        }
>> +        has_bitmap_mode = true;
>> +        bitmap_mode = BITMAP_SYNC_MODE_CONDITIONAL;
>> +        effective_mode = MIRROR_SYNC_MODE_BITMAP;
>> +    }
>> +
> 
> I also just don’t quite feel like this is the correct place to put this.
>  It’s a deprecated interface, so it should be translated in the
> interface code, i.e. in blockdev.c.
> 
> (Sure, this gives you a central place for the translation, but you can
> just as well add a function to the same effect to blockdev.c.)
> 
> Max
> 

... I can toy around with your idea of making a helper that can be
called in blockdev and see if I like it.

Thank you for taking a look!


^ permalink raw reply

* TODO list form for Arm platform
From: tommaso merciai @ 2019-06-20 16:08 UTC (permalink / raw)
  To: Kernelnewbies


[-- Attachment #1.1: Type: text/plain, Size: 155 bytes --]

Hello to everyone,
Where can I find a TODO list to contribute in Kernel development?
Is there a separate list form Arm platform?

Regards,
Tommaso Merciai

[-- Attachment #1.2: Type: text/html, Size: 287 bytes --]

[-- Attachment #2: Type: text/plain, Size: 170 bytes --]

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

^ permalink raw reply

* [PATCH v2 0/8] staging: erofs: decompression inplace approach
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)
  To: Chao Yu, Greg Kroah-Hartman
  Cc: devel, LKML, linux-fsdevel, linux-erofs, Chao Yu, Fang Wei,
	Miao Xie, Du Wei, Gao Xiang

This is patch v2 of erofs decompression inplace approach, no major
issues observed after v1 preliminarily applied to our products, and
changes from v1 are minor, so I drop "RFC" tag from this version.

See the bottom lines which are taken from RFC PATCH v1 and describe
the principle of these technologies.

The series is based on the latest staging-next since all dependencies
have already been merged.

changelog from v1:
 - keep Z_EROFS_NR_INLINE_PAGEVECS in unzip_vle.h after switching to
   new decompression backend;
 - add some DBG_BUGONs in new decompression backend to observe
   potential issues;
 - minor code cleanup.

8<--------

Hi,

After working on for more than half a year, the detail of erofs decompression
inplace is almost determined and ready for linux-next.

Currently, inplace IO is used if the whole compressed data is used
in order to reduce compressed pages extra memory overhead and an extra
memcpy (the only one memcpy) will be used for each inplace IO since
temporary buffer is needed to keep decompressing safe for inplace IO.

However, most of lz-based decompression algorithms support decompression
inplace by their algorithm designs, such as LZ4, LZO, etc.

If iend - oend margin is large enough, decompression inplace can be done
in the same buffer safely, as illustrated below:

         start of compressed logical extent
           |                          end of this logical extent
           |                           |
     ______v___________________________v________
... |  page 6  |  page 7  |  page 8  |  page 9  | ...
    |__________|__________|__________|__________|
           .                         ^ .        ^
           .                         |compressed|
           .                         |   data   |
           .                           .        .
           |<          dstsize        >|<margin>|
                                       oend     iend
           op                        ip

Fixed-size output compression can make the full use of this feature
to reduce memory overhead and avoid extra memcpy compared with fixed-size
input compression since iend is strictly not less than oend for fixed-size
output compression with inplace IO to last pages.

In addition, erofs compression indexes have been improved as well by
introducing compacted compression indexes.

These two techniques all benefit sequential read (on x86_64, 710.8MiB/s ->
755.4MiB/s; on Kirin980, 725MiB/s -> 812MiB/s) therefore erofs could have
similar sequential read performance against ext4 in a larger CR range
on high-spend SSD / NVMe devices as well.

However, note that it is _cpu vs storage device_ tradeoff, there is no
absolute performance conclusion for all on-market combinations.

Test images:
 name                       size                 CR
 enwik9                     1000000000           1.00
 enwik9_4k.squashfs.img      621211648           1.61
 enwik9_4k.erofs.img         558133248           1.79
 enwik9_8k.squashfs.img      556191744           1.80
 enwik9_16k.squashfs.img     502661120           1.99
 enwik9_128k.squashfs.img    398204928           2.51

Test Environment:
CPU: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz (4 cores, 8 threads)
DDR: 8G
SSD: INTEL SSDPEKKF360G7H
Kernel: Linux 5.2-rc3+ (with lz4-1.8.3 algorithm)

Test configuration:
squashfs:
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_FILE_DIRECT=y
CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y
CONFIG_SQUASHFS_LZ4=y
CONFIG_SQUASHFS_4K_DEVBLK_SIZE=y
erofs:
CONFIG_EROFS_FS_USE_VM_MAP_RAM=y
CONFIG_EROFS_FS_ZIP=y
CONFIG_EROFS_FS_CLUSTER_PAGE_LIMIT=1
CONFIG_EROFS_FS_ZIP_CACHE_BIPOLAR=y

with intel_pstate=disable,
     8 cpus on at 1801000 scaling_{min,max}_freq,
     userspace scaling_governor

Sequential read results (MiB/s):
                           1      2      3      4      5      avg
 enwik9_4k.ext4.img        767    770    738    726    724    745
 enwik9_4k.erofs.img       756    745    770    746    760    755.4
 enwik9_4k.squashfs.img    90.3   83.0   94.3   90.7   92.6   90.18
 enwik9_8k.squashfs.img    111    108    110    108    110    109.4
 enwik9_16k.squashfs.img   158    163    146    165    174    161.2
 enwik9_128k.squashfs.img  324    314    262    262    296    291.6


Thanks,
Gao Xiang

Gao Xiang (8):
  staging: erofs: add compacted ondisk compression indexes
  staging: erofs: add compacted compression indexes support
  staging: erofs: move per-CPU buffers implementation to utils.c
  staging: erofs: move stagingpage operations to compress.h
  staging: erofs: introduce generic decompression backend
  staging: erofs: introduce LZ4 decompression inplace
  staging: erofs: switch to new decompression backend
  staging: erofs: integrate decompression inplace

 drivers/staging/erofs/Makefile        |   2 +-
 drivers/staging/erofs/compress.h      |  62 ++++
 drivers/staging/erofs/data.c          |   4 +-
 drivers/staging/erofs/decompressor.c  | 322 ++++++++++++++++++
 drivers/staging/erofs/erofs_fs.h      |  60 +++-
 drivers/staging/erofs/inode.c         |  12 +-
 drivers/staging/erofs/internal.h      |  52 ++-
 drivers/staging/erofs/unzip_vle.c     | 368 ++------------------
 drivers/staging/erofs/unzip_vle.h     |  38 +--
 drivers/staging/erofs/unzip_vle_lz4.c | 229 -------------
 drivers/staging/erofs/utils.c         |  12 +
 drivers/staging/erofs/zmap.c          | 463 ++++++++++++++++++++++++++
 12 files changed, 993 insertions(+), 631 deletions(-)
 create mode 100644 drivers/staging/erofs/compress.h
 create mode 100644 drivers/staging/erofs/decompressor.c
 delete mode 100644 drivers/staging/erofs/unzip_vle_lz4.c
 create mode 100644 drivers/staging/erofs/zmap.c

-- 
2.17.1


^ permalink raw reply

* [PATCH v2 2/8] staging: erofs: add compacted compression indexes support
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)
  To: Chao Yu, Greg Kroah-Hartman
  Cc: devel, LKML, linux-fsdevel, linux-erofs, Chao Yu, Fang Wei,
	Miao Xie, Du Wei, Gao Xiang
In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

This patch aims at compacted compression indexes:
 1) cleanup z_erofs_map_blocks_iter and move into zmap.c;
 2) add compacted 4/2B decoding support.

On kirin980 platform, sequential read is increased about
6% (725MiB/s -> 770MiB/s) on enwik9 dataset if compacted 2B
feature is enabled.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/Makefile    |   2 +-
 drivers/staging/erofs/inode.c     |   7 +-
 drivers/staging/erofs/internal.h  |  18 +-
 drivers/staging/erofs/unzip_vle.c | 286 ------------------
 drivers/staging/erofs/zmap.c      | 462 ++++++++++++++++++++++++++++++
 5 files changed, 480 insertions(+), 295 deletions(-)
 create mode 100644 drivers/staging/erofs/zmap.c

diff --git a/drivers/staging/erofs/Makefile b/drivers/staging/erofs/Makefile
index a34248a2a16a..84b412c7a991 100644
--- a/drivers/staging/erofs/Makefile
+++ b/drivers/staging/erofs/Makefile
@@ -9,5 +9,5 @@ obj-$(CONFIG_EROFS_FS) += erofs.o
 ccflags-y += -I $(srctree)/$(src)/include
 erofs-objs := super.o inode.o data.o namei.o dir.o utils.o
 erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
-erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o unzip_vle_lz4.o
+erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o unzip_vle_lz4.o zmap.o
 
diff --git a/drivers/staging/erofs/inode.c b/drivers/staging/erofs/inode.c
index 3539290b8e45..1d467322bacf 100644
--- a/drivers/staging/erofs/inode.c
+++ b/drivers/staging/erofs/inode.c
@@ -210,12 +210,7 @@ static int fill_inode(struct inode *inode, int isdir)
 		}
 
 		if (is_inode_layout_compression(inode)) {
-#ifdef CONFIG_EROFS_FS_ZIP
-			inode->i_mapping->a_ops =
-				&z_erofs_vle_normalaccess_aops;
-#else
-			err = -ENOTSUPP;
-#endif
+			err = z_erofs_fill_inode(inode);
 			goto out_unlock;
 		}
 
diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h
index c851d0be6cf6..f3063b13c117 100644
--- a/drivers/staging/erofs/internal.h
+++ b/drivers/staging/erofs/internal.h
@@ -339,9 +339,11 @@ static inline erofs_off_t iloc(struct erofs_sb_info *sbi, erofs_nid_t nid)
 
 /* atomic flag definitions */
 #define EROFS_V_EA_INITED_BIT	0
+#define EROFS_V_Z_INITED_BIT	1
 
 /* bitlock definitions (arranged in reverse order) */
 #define EROFS_V_BL_XATTR_BIT	(BITS_PER_LONG - 1)
+#define EROFS_V_BL_Z_BIT	(BITS_PER_LONG - 2)
 
 struct erofs_vnode {
 	erofs_nid_t nid;
@@ -356,8 +358,17 @@ struct erofs_vnode {
 	unsigned xattr_shared_count;
 	unsigned *xattr_shared_xattrs;
 
-	erofs_blk_t raw_blkaddr;
-
+	union {
+		erofs_blk_t raw_blkaddr;
+#ifdef CONFIG_EROFS_FS_ZIP
+		struct {
+			unsigned short z_advise;
+			unsigned char  z_algorithmtype[2];
+			unsigned char  z_logical_clusterbits;
+			unsigned char  z_physical_clusterbits[2];
+		};
+#endif
+	};
 	/* the corresponding vfs inode */
 	struct inode vfs_inode;
 };
@@ -447,11 +458,14 @@ struct erofs_map_blocks {
 /* Flags used by erofs_map_blocks() */
 #define EROFS_GET_BLOCKS_RAW    0x0001
 
+/* zmap.c */
 #ifdef CONFIG_EROFS_FS_ZIP
+int z_erofs_fill_inode(struct inode *inode);
 int z_erofs_map_blocks_iter(struct inode *inode,
 			    struct erofs_map_blocks *map,
 			    int flags);
 #else
+static inline int z_erofs_fill_inode(struct inode *inode) { return -ENOTSUPP; }
 static inline int z_erofs_map_blocks_iter(struct inode *inode,
 					  struct erofs_map_blocks *map,
 					  int flags)
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index 0db9bc50f67c..8aea938172df 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -1600,289 +1600,3 @@ const struct address_space_operations z_erofs_vle_normalaccess_aops = {
 	.readpages = z_erofs_vle_normalaccess_readpages,
 };
 
-/*
- * Variable-sized Logical Extent (Fixed Physical Cluster) Compression Mode
- * ---
- * VLE compression mode attempts to compress a number of logical data into
- * a physical cluster with a fixed size.
- * VLE compression mode uses "struct z_erofs_vle_decompressed_index".
- */
-#define __vle_cluster_advise(x, bit, bits) \
-	((le16_to_cpu(x) >> (bit)) & ((1 << (bits)) - 1))
-
-#define __vle_cluster_type(advise) __vle_cluster_advise(advise, \
-	Z_EROFS_VLE_DI_CLUSTER_TYPE_BIT, Z_EROFS_VLE_DI_CLUSTER_TYPE_BITS)
-
-#define vle_cluster_type(di)	\
-	__vle_cluster_type((di)->di_advise)
-
-static int
-vle_decompressed_index_clusterofs(unsigned int *clusterofs,
-				  unsigned int clustersize,
-				  struct z_erofs_vle_decompressed_index *di)
-{
-	switch (vle_cluster_type(di)) {
-	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
-		*clusterofs = clustersize;
-		break;
-	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
-		*clusterofs = le16_to_cpu(di->di_clusterofs);
-		break;
-	default:
-		DBG_BUGON(1);
-		return -EIO;
-	}
-	return 0;
-}
-
-static inline erofs_blk_t
-vle_extent_blkaddr(struct inode *inode, pgoff_t index)
-{
-	struct erofs_sb_info *sbi = EROFS_I_SB(inode);
-	struct erofs_vnode *vi = EROFS_V(inode);
-
-	unsigned int ofs = Z_EROFS_VLE_EXTENT_ALIGN(vi->inode_isize +
-						    vi->xattr_isize) +
-		16 + index * sizeof(struct z_erofs_vle_decompressed_index);
-
-	return erofs_blknr(iloc(sbi, vi->nid) + ofs);
-}
-
-static inline unsigned int
-vle_extent_blkoff(struct inode *inode, pgoff_t index)
-{
-	struct erofs_sb_info *sbi = EROFS_I_SB(inode);
-	struct erofs_vnode *vi = EROFS_V(inode);
-
-	unsigned int ofs = Z_EROFS_VLE_EXTENT_ALIGN(vi->inode_isize +
-						    vi->xattr_isize) +
-		16 + index * sizeof(struct z_erofs_vle_decompressed_index);
-
-	return erofs_blkoff(iloc(sbi, vi->nid) + ofs);
-}
-
-struct vle_map_blocks_iter_ctx {
-	struct inode *inode;
-	struct super_block *sb;
-	unsigned int clusterbits;
-
-	struct page **mpage_ret;
-	void **kaddr_ret;
-};
-
-static int
-vle_get_logical_extent_head(const struct vle_map_blocks_iter_ctx *ctx,
-			    unsigned int lcn,	/* logical cluster number */
-			    unsigned long long *ofs,
-			    erofs_blk_t *pblk,
-			    unsigned int *flags)
-{
-	const unsigned int clustersize = 1 << ctx->clusterbits;
-	const erofs_blk_t mblk = vle_extent_blkaddr(ctx->inode, lcn);
-	struct page *mpage = *ctx->mpage_ret;	/* extent metapage */
-
-	struct z_erofs_vle_decompressed_index *di;
-	unsigned int cluster_type, delta0;
-
-	if (mpage->index != mblk) {
-		kunmap_atomic(*ctx->kaddr_ret);
-		unlock_page(mpage);
-		put_page(mpage);
-
-		mpage = erofs_get_meta_page(ctx->sb, mblk, false);
-		if (IS_ERR(mpage)) {
-			*ctx->mpage_ret = NULL;
-			return PTR_ERR(mpage);
-		}
-		*ctx->mpage_ret = mpage;
-		*ctx->kaddr_ret = kmap_atomic(mpage);
-	}
-
-	di = *ctx->kaddr_ret + vle_extent_blkoff(ctx->inode, lcn);
-
-	cluster_type = vle_cluster_type(di);
-	switch (cluster_type) {
-	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
-		delta0 = le16_to_cpu(di->di_u.delta[0]);
-		if (unlikely(!delta0 || delta0 > lcn)) {
-			errln("invalid NONHEAD dl0 %u at lcn %u of nid %llu",
-			      delta0, lcn, EROFS_V(ctx->inode)->nid);
-			DBG_BUGON(1);
-			return -EIO;
-		}
-		return vle_get_logical_extent_head(ctx,
-			lcn - delta0, ofs, pblk, flags);
-	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-		*flags ^= EROFS_MAP_ZIPPED;
-		/* fallthrough */
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
-		/* clustersize should be a power of two */
-		*ofs = ((u64)lcn << ctx->clusterbits) +
-			(le16_to_cpu(di->di_clusterofs) & (clustersize - 1));
-		*pblk = le32_to_cpu(di->di_u.blkaddr);
-		break;
-	default:
-		errln("unknown cluster type %u at lcn %u of nid %llu",
-		      cluster_type, lcn, EROFS_V(ctx->inode)->nid);
-		DBG_BUGON(1);
-		return -EIO;
-	}
-	return 0;
-}
-
-int z_erofs_map_blocks_iter(struct inode *inode,
-			    struct erofs_map_blocks *map,
-			    int flags)
-{
-	void *kaddr;
-	const struct vle_map_blocks_iter_ctx ctx = {
-		.inode = inode,
-		.sb = inode->i_sb,
-		.clusterbits = EROFS_I_SB(inode)->clusterbits,
-		.mpage_ret = &map->mpage,
-		.kaddr_ret = &kaddr
-	};
-	const unsigned int clustersize = 1 << ctx.clusterbits;
-	/* if both m_(l,p)len are 0, regularize l_lblk, l_lofs, etc... */
-	const bool initial = !map->m_llen;
-
-	/* logicial extent (start, end) offset */
-	unsigned long long ofs, end;
-	unsigned int lcn;
-	u32 ofs_rem;
-
-	/* initialize `pblk' to keep gcc from printing foolish warnings */
-	erofs_blk_t mblk, pblk = 0;
-	struct page *mpage = map->mpage;
-	struct z_erofs_vle_decompressed_index *di;
-	unsigned int cluster_type, logical_cluster_ofs;
-	int err = 0;
-
-	trace_z_erofs_map_blocks_iter_enter(inode, map, flags);
-
-	/* when trying to read beyond EOF, leave it unmapped */
-	if (unlikely(map->m_la >= inode->i_size)) {
-		DBG_BUGON(!initial);
-		map->m_llen = map->m_la + 1 - inode->i_size;
-		map->m_la = inode->i_size;
-		map->m_flags = 0;
-		goto out;
-	}
-
-	debugln("%s, m_la %llu m_llen %llu --- start", __func__,
-		map->m_la, map->m_llen);
-
-	ofs = map->m_la + map->m_llen;
-
-	/* clustersize should be power of two */
-	lcn = ofs >> ctx.clusterbits;
-	ofs_rem = ofs & (clustersize - 1);
-
-	mblk = vle_extent_blkaddr(inode, lcn);
-
-	if (!mpage || mpage->index != mblk) {
-		if (mpage)
-			put_page(mpage);
-
-		mpage = erofs_get_meta_page(ctx.sb, mblk, false);
-		if (IS_ERR(mpage)) {
-			err = PTR_ERR(mpage);
-			goto out;
-		}
-		map->mpage = mpage;
-	} else {
-		lock_page(mpage);
-		DBG_BUGON(!PageUptodate(mpage));
-	}
-
-	kaddr = kmap_atomic(mpage);
-	di = kaddr + vle_extent_blkoff(inode, lcn);
-
-	debugln("%s, lcn %u mblk %u e_blkoff %u", __func__, lcn,
-		mblk, vle_extent_blkoff(inode, lcn));
-
-	err = vle_decompressed_index_clusterofs(&logical_cluster_ofs,
-						clustersize, di);
-	if (unlikely(err))
-		goto unmap_out;
-
-	if (!initial) {
-		/* [walking mode] 'map' has been already initialized */
-		map->m_llen += logical_cluster_ofs;
-		goto unmap_out;
-	}
-
-	/* by default, compressed */
-	map->m_flags |= EROFS_MAP_ZIPPED;
-
-	end = ((u64)lcn + 1) * clustersize;
-
-	cluster_type = vle_cluster_type(di);
-
-	switch (cluster_type) {
-	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
-		if (ofs_rem >= logical_cluster_ofs)
-			map->m_flags ^= EROFS_MAP_ZIPPED;
-		/* fallthrough */
-	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
-		if (ofs_rem == logical_cluster_ofs) {
-			pblk = le32_to_cpu(di->di_u.blkaddr);
-			goto exact_hitted;
-		}
-
-		if (ofs_rem > logical_cluster_ofs) {
-			ofs = (u64)lcn * clustersize | logical_cluster_ofs;
-			pblk = le32_to_cpu(di->di_u.blkaddr);
-			break;
-		}
-
-		/* logical cluster number should be >= 1 */
-		if (unlikely(!lcn)) {
-			errln("invalid logical cluster 0 at nid %llu",
-			      EROFS_V(inode)->nid);
-			err = -EIO;
-			goto unmap_out;
-		}
-		end = ((u64)lcn-- * clustersize) | logical_cluster_ofs;
-		/* fallthrough */
-	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
-		/* get the correspoinding first chunk */
-		err = vle_get_logical_extent_head(&ctx, lcn, &ofs,
-						  &pblk, &map->m_flags);
-		mpage = map->mpage;
-
-		if (unlikely(err)) {
-			if (mpage)
-				goto unmap_out;
-			goto out;
-		}
-		break;
-	default:
-		errln("unknown cluster type %u at offset %llu of nid %llu",
-		      cluster_type, ofs, EROFS_V(inode)->nid);
-		err = -EIO;
-		goto unmap_out;
-	}
-
-	map->m_la = ofs;
-exact_hitted:
-	map->m_llen = end - ofs;
-	map->m_plen = clustersize;
-	map->m_pa = blknr_to_addr(pblk);
-	map->m_flags |= EROFS_MAP_MAPPED;
-unmap_out:
-	kunmap_atomic(kaddr);
-	unlock_page(mpage);
-out:
-	debugln("%s, m_la %llu m_pa %llu m_llen %llu m_plen %llu m_flags 0%o",
-		__func__, map->m_la, map->m_pa,
-		map->m_llen, map->m_plen, map->m_flags);
-
-	trace_z_erofs_map_blocks_iter_exit(inode, map, flags, err);
-
-	/* aggressively BUG_ON iff CONFIG_EROFS_FS_DEBUG is on */
-	DBG_BUGON(err < 0 && err != -ENOMEM);
-	return err;
-}
-
diff --git a/drivers/staging/erofs/zmap.c b/drivers/staging/erofs/zmap.c
new file mode 100644
index 000000000000..77bc49c07846
--- /dev/null
+++ b/drivers/staging/erofs/zmap.c
@@ -0,0 +1,462 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * linux/drivers/staging/erofs/zmap.c
+ *
+ * Copyright (C) 2018-2019 HUAWEI, Inc.
+ *             http://www.huawei.com/
+ * Created by Gao Xiang <gaoxiang25@huawei.com>
+ */
+#include "internal.h"
+#include <asm/unaligned.h>
+#include <trace/events/erofs.h>
+
+int z_erofs_fill_inode(struct inode *inode)
+{
+	struct erofs_vnode *const vi = EROFS_V(inode);
+	struct super_block *const sb = inode->i_sb;
+
+	if (vi->datamode == EROFS_INODE_FLAT_COMPRESSION_LEGACY) {
+		vi->z_advise = 0;
+		vi->z_algorithmtype[0] = 0;
+		vi->z_algorithmtype[1] = 0;
+		vi->z_logical_clusterbits = EROFS_SB(sb)->clusterbits;
+		vi->z_physical_clusterbits[0] = vi->z_logical_clusterbits;
+		vi->z_physical_clusterbits[1] = vi->z_logical_clusterbits;
+		set_bit(EROFS_V_Z_INITED_BIT, &vi->flags);
+	}
+
+	inode->i_mapping->a_ops = &z_erofs_vle_normalaccess_aops;
+	return 0;
+}
+
+static int fill_inode_lazy(struct inode *inode)
+{
+	struct erofs_vnode *const vi = EROFS_V(inode);
+	struct super_block *const sb = inode->i_sb;
+	int err;
+	erofs_off_t pos;
+	struct page *page;
+	void *kaddr;
+	struct z_erofs_map_header *h;
+
+	if (test_bit(EROFS_V_Z_INITED_BIT, &vi->flags))
+		return 0;
+
+	if (wait_on_bit_lock(&vi->flags, EROFS_V_BL_Z_BIT, TASK_KILLABLE))
+		return -ERESTARTSYS;
+
+	err = 0;
+	if (test_bit(EROFS_V_Z_INITED_BIT, &vi->flags))
+		goto out_unlock;
+
+	DBG_BUGON(vi->datamode == EROFS_INODE_FLAT_COMPRESSION_LEGACY);
+
+	pos = ALIGN(iloc(EROFS_SB(sb), vi->nid) + vi->inode_isize +
+		    vi->xattr_isize, 8);
+	page = erofs_get_meta_page(sb, erofs_blknr(pos), false);
+	if (IS_ERR(page)) {
+		err = PTR_ERR(page);
+		goto out_unlock;
+	}
+
+	kaddr = kmap_atomic(page);
+
+	h = kaddr + erofs_blkoff(pos);
+	vi->z_advise = le16_to_cpu(h->h_advise);
+	vi->z_algorithmtype[0] = h->h_algorithmtype & 15;
+	vi->z_algorithmtype[1] = h->h_algorithmtype >> 4;
+
+	if (vi->z_algorithmtype[0] >= Z_EROFS_COMPRESSION_MAX) {
+		errln("unknown compression format %u for nid %llu, please upgrade kernel",
+		      vi->z_algorithmtype[0], vi->nid);
+		err = -ENOTSUPP;
+		goto unmap_done;
+	}
+
+	vi->z_logical_clusterbits = LOG_BLOCK_SIZE + (h->h_clusterbits & 7);
+	vi->z_physical_clusterbits[0] = vi->z_logical_clusterbits +
+					((h->h_clusterbits >> 3) & 3);
+
+	if (vi->z_physical_clusterbits[0] != LOG_BLOCK_SIZE) {
+		errln("unsupported physical clusterbits %u for nid %llu, please upgrade kernel",
+		      vi->z_physical_clusterbits[0], vi->nid);
+		err = -ENOTSUPP;
+		goto unmap_done;
+	}
+
+	vi->z_physical_clusterbits[1] = vi->z_logical_clusterbits +
+					((h->h_clusterbits >> 5) & 7);
+unmap_done:
+	kunmap_atomic(kaddr);
+	unlock_page(page);
+	put_page(page);
+
+	set_bit(EROFS_V_Z_INITED_BIT, &vi->flags);
+out_unlock:
+	clear_and_wake_up_bit(EROFS_V_BL_Z_BIT, &vi->flags);
+	return err;
+}
+
+struct z_erofs_maprecorder {
+	struct inode *inode;
+	struct erofs_map_blocks *map;
+	void *kaddr;
+
+	unsigned long lcn;
+	/* compression extent information gathered */
+	u8  type;
+	u16 clusterofs;
+	u16 delta[2];
+	erofs_blk_t pblk;
+};
+
+static int z_erofs_reload_indexes(struct z_erofs_maprecorder *m,
+				  erofs_blk_t eblk)
+{
+	struct super_block *const sb = m->inode->i_sb;
+	struct erofs_map_blocks *const map = m->map;
+	struct page *mpage = map->mpage;
+
+	if (mpage) {
+		if (mpage->index == eblk) {
+			if (!m->kaddr)
+				m->kaddr = kmap_atomic(mpage);
+			return 0;
+		}
+
+		if (m->kaddr) {
+			kunmap_atomic(m->kaddr);
+			m->kaddr = NULL;
+		}
+		put_page(mpage);
+	}
+
+	mpage = erofs_get_meta_page(sb, eblk, false);
+	if (IS_ERR(mpage)) {
+		map->mpage = NULL;
+		return PTR_ERR(mpage);
+	}
+	m->kaddr = kmap_atomic(mpage);
+	unlock_page(mpage);
+	map->mpage = mpage;
+	return 0;
+}
+
+static int vle_legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
+					     unsigned long lcn)
+{
+	struct inode *const inode = m->inode;
+	struct erofs_vnode *const vi = EROFS_V(inode);
+	const erofs_off_t ibase = iloc(EROFS_I_SB(inode), vi->nid);
+	const erofs_off_t pos = Z_EROFS_VLE_EXTENT_ALIGN(ibase +
+							 vi->inode_isize +
+							 vi->xattr_isize) +
+		16 + lcn * sizeof(struct z_erofs_vle_decompressed_index);
+	struct z_erofs_vle_decompressed_index *di;
+	unsigned int advise, type;
+	int err;
+
+	err = z_erofs_reload_indexes(m, erofs_blknr(pos));
+	if (err)
+		return err;
+
+	m->lcn = lcn;
+	di = m->kaddr + erofs_blkoff(pos);
+
+	advise = le16_to_cpu(di->di_advise);
+	type = (advise >> Z_EROFS_VLE_DI_CLUSTER_TYPE_BIT) &
+		((1 << Z_EROFS_VLE_DI_CLUSTER_TYPE_BITS) - 1);
+	switch (type) {
+	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
+		m->clusterofs = 1 << vi->z_logical_clusterbits;
+		m->delta[0] = le16_to_cpu(di->di_u.delta[0]);
+		m->delta[1] = le16_to_cpu(di->di_u.delta[1]);
+		break;
+	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+		m->clusterofs = le16_to_cpu(di->di_clusterofs);
+		m->pblk = le32_to_cpu(di->di_u.blkaddr);
+		break;
+	default:
+		DBG_BUGON(1);
+		return -EIO;
+	}
+	m->type = type;
+	return 0;
+}
+
+static unsigned int decode_compactedbits(unsigned int lobits,
+					 unsigned int lomask,
+					 u8 *in, unsigned int pos, u8 *type)
+{
+	const unsigned int v = get_unaligned_le32(in + pos / 8) >> (pos & 7);
+	const unsigned int lo = v & lomask;
+
+	*type = (v >> lobits) & 3;
+	return lo;
+}
+
+static int unpack_compacted_index(struct z_erofs_maprecorder *m,
+				  unsigned int amortizedshift,
+				  unsigned int eofs)
+{
+	struct erofs_vnode *const vi = EROFS_V(m->inode);
+	const unsigned int lclusterbits = vi->z_logical_clusterbits;
+	const unsigned int lomask = (1 << lclusterbits) - 1;
+	unsigned int vcnt, base, lo, encodebits, nblk;
+	int i;
+	u8 *in, type;
+
+	if (1 << amortizedshift == 4)
+		vcnt = 2;
+	else if (1 << amortizedshift == 2 && lclusterbits == 12)
+		vcnt = 16;
+	else
+		return -ENOTSUPP;
+
+	encodebits = ((vcnt << amortizedshift) - sizeof(__le32)) * 8 / vcnt;
+	base = round_down(eofs, vcnt << amortizedshift);
+	in = m->kaddr + base;
+
+	i = (eofs - base) >> amortizedshift;
+
+	lo = decode_compactedbits(lclusterbits, lomask,
+				  in, encodebits * i, &type);
+	m->type = type;
+	if (type == Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD) {
+		m->clusterofs = 1 << lclusterbits;
+		if (i + 1 != vcnt) {
+			m->delta[0] = lo;
+			return 0;
+		}
+		/*
+		 * since the last lcluster in the pack is special,
+		 * of which lo saves delta[1] rather than delta[0].
+		 * Hence, get delta[0] by the previous lcluster indirectly.
+		 */
+		lo = decode_compactedbits(lclusterbits, lomask,
+					  in, encodebits * (i - 1), &type);
+		if (type != Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD)
+			lo = 0;
+		m->delta[0] = lo + 1;
+		return 0;
+	}
+	m->clusterofs = lo;
+	m->delta[0] = 0;
+	/* figout out blkaddr (pblk) for HEAD lclusters */
+	nblk = 1;
+	while (i > 0) {
+		--i;
+		lo = decode_compactedbits(lclusterbits, lomask,
+					  in, encodebits * i, &type);
+		if (type == Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD)
+			i -= lo;
+
+		if (i >= 0)
+			++nblk;
+	}
+	in += (vcnt << amortizedshift) - sizeof(__le32);
+	m->pblk = le32_to_cpu(*(__le32 *)in) + nblk;
+	return 0;
+}
+
+static int compacted_load_cluster_from_disk(struct z_erofs_maprecorder *m,
+					    unsigned long lcn)
+{
+	struct inode *const inode = m->inode;
+	struct erofs_vnode *const vi = EROFS_V(inode);
+	const unsigned int lclusterbits = vi->z_logical_clusterbits;
+	const erofs_off_t ebase = ALIGN(iloc(EROFS_I_SB(inode), vi->nid) +
+					vi->inode_isize + vi->xattr_isize, 8) +
+		sizeof(struct z_erofs_map_header);
+	const unsigned int totalidx = DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ);
+	unsigned int compacted_4b_initial, compacted_2b;
+	unsigned int amortizedshift;
+	erofs_off_t pos;
+	int err;
+
+	if (lclusterbits != 12)
+		return -ENOTSUPP;
+
+	if (lcn >= totalidx)
+		return -EINVAL;
+
+	m->lcn = lcn;
+	/* used to align to 32-byte (compacted_2b) alignment */
+	compacted_4b_initial = (32 - ebase % 32) / 4;
+	if (compacted_4b_initial == 32 / 4)
+		compacted_4b_initial = 0;
+
+	if (vi->z_advise & Z_EROFS_ADVISE_COMPACTED_2B)
+		compacted_2b = rounddown(totalidx - compacted_4b_initial, 16);
+	else
+		compacted_2b = 0;
+
+	pos = ebase;
+	if (lcn < compacted_4b_initial) {
+		amortizedshift = 2;
+		goto out;
+	}
+	pos += compacted_4b_initial * 4;
+	lcn -= compacted_4b_initial;
+
+	if (lcn < compacted_2b) {
+		amortizedshift = 1;
+		goto out;
+	}
+	pos += compacted_2b * 2;
+	lcn -= compacted_2b;
+	amortizedshift = 2;
+out:
+	pos += lcn * (1 << amortizedshift);
+	err = z_erofs_reload_indexes(m, erofs_blknr(pos));
+	if (err)
+		return err;
+	return unpack_compacted_index(m, amortizedshift, erofs_blkoff(pos));
+}
+
+static int vle_load_cluster_from_disk(struct z_erofs_maprecorder *m,
+				      unsigned int lcn)
+{
+	const unsigned int datamode = EROFS_V(m->inode)->datamode;
+
+	if (datamode == EROFS_INODE_FLAT_COMPRESSION_LEGACY)
+		return vle_legacy_load_cluster_from_disk(m, lcn);
+
+	if (datamode == EROFS_INODE_FLAT_COMPRESSION)
+		return compacted_load_cluster_from_disk(m, lcn);
+
+	return -EINVAL;
+}
+
+static int vle_extent_lookback(struct z_erofs_maprecorder *m,
+			       unsigned int lookback_distance)
+{
+	struct erofs_vnode *const vi = EROFS_V(m->inode);
+	struct erofs_map_blocks *const map = m->map;
+	const unsigned int lclusterbits = vi->z_logical_clusterbits;
+	unsigned long lcn = m->lcn;
+	int err;
+
+	if (lcn < lookback_distance) {
+		DBG_BUGON(1);
+		return -EIO;
+	}
+
+	/* load extent head logical cluster if needed */
+	lcn -= lookback_distance;
+	err = vle_load_cluster_from_disk(m, lcn);
+	if (err)
+		return err;
+
+	switch (m->type) {
+	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
+		return vle_extent_lookback(m, m->delta[0]);
+	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
+		map->m_flags &= ~EROFS_MAP_ZIPPED;
+		/* fallthrough */
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+		map->m_la = (lcn << lclusterbits) | m->clusterofs;
+		break;
+	default:
+		errln("unknown type %u at lcn %lu of nid %llu",
+		      m->type, lcn, vi->nid);
+		DBG_BUGON(1);
+		return -EIO;
+	}
+	return 0;
+}
+
+int z_erofs_map_blocks_iter(struct inode *inode,
+			    struct erofs_map_blocks *map,
+			    int flags)
+{
+	struct erofs_vnode *const vi = EROFS_V(inode);
+	struct z_erofs_maprecorder m = {
+		.inode = inode,
+		.map = map,
+	};
+	int err = 0;
+	unsigned int lclusterbits, endoff;
+	unsigned long long ofs, end;
+
+	trace_z_erofs_map_blocks_iter_enter(inode, map, flags);
+
+	/* when trying to read beyond EOF, leave it unmapped */
+	if (unlikely(map->m_la >= inode->i_size)) {
+		map->m_llen = map->m_la + 1 - inode->i_size;
+		map->m_la = inode->i_size;
+		map->m_flags = 0;
+		goto out;
+	}
+
+	err = fill_inode_lazy(inode);
+	if (err)
+		goto out;
+
+	lclusterbits = vi->z_logical_clusterbits;
+	ofs = map->m_la;
+	m.lcn = ofs >> lclusterbits;
+	endoff = ofs & ((1 << lclusterbits) - 1);
+
+	err = vle_load_cluster_from_disk(&m, m.lcn);
+	if (err)
+		goto unmap_out;
+
+	map->m_flags = EROFS_MAP_ZIPPED;	/* by default, compressed */
+	end = (m.lcn + 1ULL) << lclusterbits;
+
+	switch (m.type) {
+	case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
+		if (endoff >= m.clusterofs)
+			map->m_flags &= ~EROFS_MAP_ZIPPED;
+		/* fallthrough */
+	case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+		if (endoff >= m.clusterofs) {
+			map->m_la = (m.lcn << lclusterbits) | m.clusterofs;
+			break;
+		}
+		/* m.lcn should be >= 1 if endoff < m.clusterofs */
+		if (unlikely(!m.lcn)) {
+			errln("invalid logical cluster 0 at nid %llu",
+			      vi->nid);
+			err = -EIO;
+			goto unmap_out;
+		}
+		end = (m.lcn << lclusterbits) | m.clusterofs;
+		m.delta[0] = 1;
+		/* fallthrough */
+	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
+		/* get the correspoinding first chunk */
+		err = vle_extent_lookback(&m, m.delta[0]);
+		if (unlikely(err))
+			goto unmap_out;
+		break;
+	default:
+		errln("unknown type %u at offset %llu of nid %llu",
+		      m.type, ofs, vi->nid);
+		err = -EIO;
+		goto unmap_out;
+	}
+
+	map->m_llen = end - map->m_la;
+	map->m_plen = 1 << lclusterbits;
+	map->m_pa = blknr_to_addr(m.pblk);
+	map->m_flags |= EROFS_MAP_MAPPED;
+
+unmap_out:
+	if (m.kaddr)
+		kunmap_atomic(m.kaddr);
+
+out:
+	debugln("%s, m_la %llu m_pa %llu m_llen %llu m_plen %llu m_flags 0%o",
+		__func__, map->m_la, map->m_pa,
+		map->m_llen, map->m_plen, map->m_flags);
+
+	trace_z_erofs_map_blocks_iter_exit(inode, map, flags, err);
+
+	/* aggressively BUG_ON iff CONFIG_EROFS_FS_DEBUG is on */
+	DBG_BUGON(err < 0 && err != -ENOMEM);
+	return err;
+}
+
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 1/8] staging: erofs: add compacted ondisk compression indexes
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)
  To: Chao Yu, Greg Kroah-Hartman
  Cc: devel, LKML, linux-fsdevel, linux-erofs, Chao Yu, Fang Wei,
	Miao Xie, Du Wei, Gao Xiang
In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

This patch introduces new compacted compression indexes.

In contract to legacy compression indexes that
   each 4k logical cluster has an 8-byte index,
compacted ondisk compression indexes will have
   amortized 2 bytes for each 4k logical cluster (compacted 2B)
   amortized 4 bytes for each 4k logical cluster (compacted 4B)

In detail, several continuous clusters will be encoded in
a compacted pack with cluster types, offsets, and one blkaddr
at the end of the pack to leave 4-byte margin for better
decoding performance, as illustrated below:
   _____________________________________________
  |___@_____ encoded bits __________|_ blkaddr _|
  0       .                                     amortized * vcnt
  .          .
  .             .                   amortized * vcnt - 4
  .                .
  .___________________.
  |_type_|_clusterofs_|

Note that compacted 2 / 4B should be aligned with 32 / 8 bytes
in order to avoid each pack crossing page boundary.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/data.c      |  4 +--
 drivers/staging/erofs/erofs_fs.h  | 57 +++++++++++++++++++++++++------
 drivers/staging/erofs/inode.c     |  5 +--
 drivers/staging/erofs/internal.h  | 11 ++----
 drivers/staging/erofs/unzip_vle.c |  8 ++---
 5 files changed, 56 insertions(+), 29 deletions(-)

diff --git a/drivers/staging/erofs/data.c b/drivers/staging/erofs/data.c
index 746685f90564..cc31c3e5984c 100644
--- a/drivers/staging/erofs/data.c
+++ b/drivers/staging/erofs/data.c
@@ -124,7 +124,7 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
 	trace_erofs_map_blocks_flatmode_enter(inode, map, flags);
 
 	nblocks = DIV_ROUND_UP(inode->i_size, PAGE_SIZE);
-	lastblk = nblocks - is_inode_layout_inline(inode);
+	lastblk = nblocks - is_inode_flat_inline(inode);
 
 	if (unlikely(offset >= inode->i_size)) {
 		/* leave out-of-bound access unmapped */
@@ -139,7 +139,7 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
 	if (offset < blknr_to_addr(lastblk)) {
 		map->m_pa = blknr_to_addr(vi->raw_blkaddr) + map->m_la;
 		map->m_plen = blknr_to_addr(lastblk) - offset;
-	} else if (is_inode_layout_inline(inode)) {
+	} else if (is_inode_flat_inline(inode)) {
 		/* 2 - inode inline B: inode, [xattrs], inline last blk... */
 		struct erofs_sb_info *sbi = EROFS_SB(inode->i_sb);
 
diff --git a/drivers/staging/erofs/erofs_fs.h b/drivers/staging/erofs/erofs_fs.h
index 8ddb2b3e7d39..a05139f1df60 100644
--- a/drivers/staging/erofs/erofs_fs.h
+++ b/drivers/staging/erofs/erofs_fs.h
@@ -49,19 +49,29 @@ struct erofs_super_block {
  * erofs inode data mapping:
  * 0 - inode plain without inline data A:
  * inode, [xattrs], ... | ... | no-holed data
- * 1 - inode VLE compression B:
+ * 1 - inode VLE compression B (legacy):
  * inode, [xattrs], extents ... | ...
  * 2 - inode plain with inline data C:
  * inode, [xattrs], last_inline_data, ... | ... | no-holed data
- * 3~7 - reserved
+ * 3 - inode compression D:
+ * inode, [xattrs], map_header, extents ... | ...
+ * 4~7 - reserved
  */
 enum {
-	EROFS_INODE_LAYOUT_PLAIN,
-	EROFS_INODE_LAYOUT_COMPRESSION,
-	EROFS_INODE_LAYOUT_INLINE,
+	EROFS_INODE_FLAT_PLAIN,
+	EROFS_INODE_FLAT_COMPRESSION_LEGACY,
+	EROFS_INODE_FLAT_INLINE,
+	EROFS_INODE_FLAT_COMPRESSION,
 	EROFS_INODE_LAYOUT_MAX
 };
 
+static bool erofs_inode_is_data_compressed(unsigned int datamode)
+{
+	if (datamode == EROFS_INODE_FLAT_COMPRESSION)
+		return true;
+	return datamode == EROFS_INODE_FLAT_COMPRESSION_LEGACY;
+}
+
 /* bit definitions of inode i_advise */
 #define EROFS_I_VERSION_BITS            1
 #define EROFS_I_DATA_MAPPING_BITS       3
@@ -176,11 +186,37 @@ struct erofs_xattr_entry {
 	sizeof(struct erofs_xattr_entry) + \
 	(entry)->e_name_len + le16_to_cpu((entry)->e_value_size))
 
-/* have to be aligned with 8 bytes on disk */
-struct erofs_extent_header {
-	__le32 eh_checksum;
-	__le32 eh_reserved[3];
-} __packed;
+/* available compression algorithm types */
+enum {
+	Z_EROFS_COMPRESSION_LZ4,
+	Z_EROFS_COMPRESSION_MAX
+};
+
+/*
+ * bit 0 : COMPACTED_2B indexes (0 - off; 1 - on)
+ *  e.g. for 4k logical cluster size,      4B        if compacted 2B is off;
+ *                                  (4B) + 2B + (4B) if compacted 2B is on.
+ */
+#define Z_EROFS_ADVISE_COMPACTED_2B_BIT         0
+
+#define Z_EROFS_ADVISE_COMPACTED_2B     (1 << Z_EROFS_ADVISE_COMPACTED_2B_BIT)
+
+struct z_erofs_map_header {
+	__le32	h_reserved1;
+	__le16	h_advise;
+	/*
+	 * bit 0-3 : algorithm type of head 1 (logical cluster type 01);
+	 * bit 4-7 : algorithm type of head 2 (logical cluster type 11).
+	 */
+	__u8	h_algorithmtype;
+	/*
+	 * bit 0-2 : logical cluster bits - 12, e.g. 0 for 4096;
+	 * bit 3-4 : (physical - logical) cluster bits of head 1:
+	 *       For example, if logical clustersize = 4096, 1 for 8192.
+	 * bit 5-7 : (physical - logical) cluster bits of head 2.
+	 */
+	__u8	h_clusterbits;
+};
 
 /*
  * Z_EROFS Variable-sized Logical Extent cluster type:
@@ -270,7 +306,6 @@ static inline void erofs_check_ondisk_layout_definitions(void)
 	BUILD_BUG_ON(sizeof(struct erofs_inode_v2) != 64);
 	BUILD_BUG_ON(sizeof(struct erofs_xattr_ibody_header) != 12);
 	BUILD_BUG_ON(sizeof(struct erofs_xattr_entry) != 4);
-	BUILD_BUG_ON(sizeof(struct erofs_extent_header) != 16);
 	BUILD_BUG_ON(sizeof(struct z_erofs_vle_decompressed_index) != 8);
 	BUILD_BUG_ON(sizeof(struct erofs_dirent) != 12);
 
diff --git a/drivers/staging/erofs/inode.c b/drivers/staging/erofs/inode.c
index e51348f7e838..3539290b8e45 100644
--- a/drivers/staging/erofs/inode.c
+++ b/drivers/staging/erofs/inode.c
@@ -127,12 +127,9 @@ static int fill_inline_data(struct inode *inode, void *data,
 {
 	struct erofs_vnode *vi = EROFS_V(inode);
 	struct erofs_sb_info *sbi = EROFS_I_SB(inode);
-	const int mode = vi->datamode;
-
-	DBG_BUGON(mode >= EROFS_INODE_LAYOUT_MAX);
 
 	/* should be inode inline C */
-	if (mode != EROFS_INODE_LAYOUT_INLINE)
+	if (!is_inode_flat_inline(inode))
 		return 0;
 
 	/* fast symlink (following ext4) */
diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h
index 1666cceecb3c..c851d0be6cf6 100644
--- a/drivers/staging/erofs/internal.h
+++ b/drivers/staging/erofs/internal.h
@@ -382,19 +382,14 @@ static inline unsigned long inode_datablocks(struct inode *inode)
 	return DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ);
 }
 
-static inline bool is_inode_layout_plain(struct inode *inode)
-{
-	return EROFS_V(inode)->datamode == EROFS_INODE_LAYOUT_PLAIN;
-}
-
 static inline bool is_inode_layout_compression(struct inode *inode)
 {
-	return EROFS_V(inode)->datamode == EROFS_INODE_LAYOUT_COMPRESSION;
+	return erofs_inode_is_data_compressed(EROFS_V(inode)->datamode);
 }
 
-static inline bool is_inode_layout_inline(struct inode *inode)
+static inline bool is_inode_flat_inline(struct inode *inode)
 {
-	return EROFS_V(inode)->datamode == EROFS_INODE_LAYOUT_INLINE;
+	return EROFS_V(inode)->datamode == EROFS_INODE_FLAT_INLINE;
 }
 
 extern const struct super_operations erofs_sops;
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index f3d0d2c03939..0db9bc50f67c 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -1643,8 +1643,8 @@ vle_extent_blkaddr(struct inode *inode, pgoff_t index)
 	struct erofs_vnode *vi = EROFS_V(inode);
 
 	unsigned int ofs = Z_EROFS_VLE_EXTENT_ALIGN(vi->inode_isize +
-		vi->xattr_isize) + sizeof(struct erofs_extent_header) +
-		index * sizeof(struct z_erofs_vle_decompressed_index);
+						    vi->xattr_isize) +
+		16 + index * sizeof(struct z_erofs_vle_decompressed_index);
 
 	return erofs_blknr(iloc(sbi, vi->nid) + ofs);
 }
@@ -1656,8 +1656,8 @@ vle_extent_blkoff(struct inode *inode, pgoff_t index)
 	struct erofs_vnode *vi = EROFS_V(inode);
 
 	unsigned int ofs = Z_EROFS_VLE_EXTENT_ALIGN(vi->inode_isize +
-		vi->xattr_isize) + sizeof(struct erofs_extent_header) +
-		index * sizeof(struct z_erofs_vle_decompressed_index);
+						    vi->xattr_isize) +
+		16 + index * sizeof(struct z_erofs_vle_decompressed_index);
 
 	return erofs_blkoff(iloc(sbi, vi->nid) + ofs);
 }
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 4/8] staging: erofs: move stagingpage operations to compress.h
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)
  To: Chao Yu, Greg Kroah-Hartman
  Cc: devel, LKML, linux-fsdevel, linux-erofs, Chao Yu, Fang Wei,
	Miao Xie, Du Wei, Gao Xiang
In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

stagingpages are behaved as bounce pages for temporary use.
Move to compress.h since the upcoming decompressor will
allocate stagingpages as well.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/compress.h  | 40 +++++++++++++++++++++++++++++++
 drivers/staging/erofs/unzip_vle.c | 11 +++++----
 drivers/staging/erofs/unzip_vle.h | 20 ----------------
 3 files changed, 46 insertions(+), 25 deletions(-)
 create mode 100644 drivers/staging/erofs/compress.h

diff --git a/drivers/staging/erofs/compress.h b/drivers/staging/erofs/compress.h
new file mode 100644
index 000000000000..1dcfc3b35118
--- /dev/null
+++ b/drivers/staging/erofs/compress.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * linux/drivers/staging/erofs/compress.h
+ *
+ * Copyright (C) 2019 HUAWEI, Inc.
+ *             http://www.huawei.com/
+ * Created by Gao Xiang <gaoxiang25@huawei.com>
+ */
+#ifndef __EROFS_FS_COMPRESS_H
+#define __EROFS_FS_COMPRESS_H
+
+/*
+ * - 0x5A110C8D ('sallocated', Z_EROFS_MAPPING_STAGING) -
+ * used to mark temporary allocated pages from other
+ * file/cached pages and NULL mapping pages.
+ */
+#define Z_EROFS_MAPPING_STAGING         ((void *)0x5A110C8D)
+
+/* check if a page is marked as staging */
+static inline bool z_erofs_page_is_staging(struct page *page)
+{
+	return page->mapping == Z_EROFS_MAPPING_STAGING;
+}
+
+static inline bool z_erofs_put_stagingpage(struct list_head *pagepool,
+					   struct page *page)
+{
+	if (!z_erofs_page_is_staging(page))
+		return false;
+
+	/* staging pages should not be used by others at the same time */
+	if (page_ref_count(page) > 1)
+		put_page(page);
+	else
+		list_add(&page->lru, pagepool);
+	return true;
+}
+
+#endif
+
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index 08f2d4302ecb..d95f985936b6 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -11,6 +11,7 @@
  * distribution for more details.
  */
 #include "unzip_vle.h"
+#include "compress.h"
 #include <linux/prefetch.h>
 
 #include <trace/events/erofs.h>
@@ -855,7 +856,7 @@ static inline void z_erofs_vle_read_endio(struct bio *bio)
 		DBG_BUGON(PageUptodate(page));
 		DBG_BUGON(!page->mapping);
 
-		if (unlikely(!sbi && !z_erofs_is_stagingpage(page))) {
+		if (unlikely(!sbi && !z_erofs_page_is_staging(page))) {
 			sbi = EROFS_SB(page->mapping->host->i_sb);
 
 			if (time_to_inject(sbi, FAULT_READ_IO)) {
@@ -947,7 +948,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 		DBG_BUGON(!page);
 		DBG_BUGON(!page->mapping);
 
-		if (z_erofs_gather_if_stagingpage(page_pool, page))
+		if (z_erofs_put_stagingpage(page_pool, page))
 			continue;
 
 		if (page_type == Z_EROFS_VLE_PAGE_TYPE_HEAD)
@@ -977,7 +978,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 		DBG_BUGON(!page);
 		DBG_BUGON(!page->mapping);
 
-		if (!z_erofs_is_stagingpage(page)) {
+		if (!z_erofs_page_is_staging(page)) {
 			if (erofs_page_is_managed(sbi, page)) {
 				if (unlikely(!PageUptodate(page)))
 					err = -EIO;
@@ -1055,7 +1056,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 			continue;
 
 		/* recycle all individual staging pages */
-		(void)z_erofs_gather_if_stagingpage(page_pool, page);
+		(void)z_erofs_put_stagingpage(page_pool, page);
 
 		WRITE_ONCE(compressed_pages[i], NULL);
 	}
@@ -1068,7 +1069,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 		DBG_BUGON(!page->mapping);
 
 		/* recycle all individual staging pages */
-		if (z_erofs_gather_if_stagingpage(page_pool, page))
+		if (z_erofs_put_stagingpage(page_pool, page))
 			continue;
 
 		if (unlikely(err < 0))
diff --git a/drivers/staging/erofs/unzip_vle.h b/drivers/staging/erofs/unzip_vle.h
index 9c53009700cf..6c3e0deb63e7 100644
--- a/drivers/staging/erofs/unzip_vle.h
+++ b/drivers/staging/erofs/unzip_vle.h
@@ -16,26 +16,6 @@
 #include "internal.h"
 #include "unzip_pagevec.h"
 
-/*
- *  - 0x5A110C8D ('sallocated', Z_EROFS_MAPPING_STAGING) -
- * used for temporary allocated pages (via erofs_allocpage),
- * in order to seperate those from NULL mapping (eg. truncated pages)
- */
-#define Z_EROFS_MAPPING_STAGING		((void *)0x5A110C8D)
-
-#define z_erofs_is_stagingpage(page)	\
-	((page)->mapping == Z_EROFS_MAPPING_STAGING)
-
-static inline bool z_erofs_gather_if_stagingpage(struct list_head *page_pool,
-						 struct page *page)
-{
-	if (z_erofs_is_stagingpage(page)) {
-		list_add(&page->lru, page_pool);
-		return true;
-	}
-	return false;
-}
-
 /*
  * Structure fields follow one of the following exclusion rules.
  *
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 3/8] staging: erofs: move per-CPU buffers implementation to utils.c
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)
  To: Chao Yu, Greg Kroah-Hartman
  Cc: devel, LKML, linux-fsdevel, linux-erofs, Chao Yu, Fang Wei,
	Miao Xie, Du Wei, Gao Xiang
In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

This patch moves per-CPU buffers to utils.c in order for
the upcoming generic decompression framework to use it.

Note that I tried to use generic per-CPU buffer or
per-CPU page approaches to clean up further, but obvious
performanace regression (about 2% for sequential read) was
observed.

Therefore let's leave it as it is instead, just move
to utils.c and I'll try to dig into the root cause later.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/internal.h      | 26 ++++++++++++++++++++++
 drivers/staging/erofs/unzip_vle.c     |  5 ++---
 drivers/staging/erofs/unzip_vle.h     |  4 +---
 drivers/staging/erofs/unzip_vle_lz4.c | 31 ++++++++-------------------
 drivers/staging/erofs/utils.c         | 12 +++++++++++
 5 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h
index f3063b13c117..dcbe6f7f5dae 100644
--- a/drivers/staging/erofs/internal.h
+++ b/drivers/staging/erofs/internal.h
@@ -321,6 +321,16 @@ static inline void z_erofs_exit_zip_subsystem(void) {}
 
 /* page count of a compressed cluster */
 #define erofs_clusterpages(sbi)         ((1 << (sbi)->clusterbits) / PAGE_SIZE)
+#define Z_EROFS_NR_INLINE_PAGEVECS      3
+
+#if (Z_EROFS_CLUSTER_MAX_PAGES > Z_EROFS_NR_INLINE_PAGEVECS)
+#define EROFS_PCPUBUF_NR_PAGES          Z_EROFS_CLUSTER_MAX_PAGES
+#else
+#define EROFS_PCPUBUF_NR_PAGES          Z_EROFS_NR_INLINE_PAGEVECS
+#endif
+
+#else
+#define EROFS_PCPUBUF_NR_PAGES          0
 #endif
 
 typedef u64 erofs_off_t;
@@ -608,6 +618,22 @@ static inline void erofs_vunmap(const void *mem, unsigned int count)
 extern struct shrinker erofs_shrinker_info;
 
 struct page *erofs_allocpage(struct list_head *pool, gfp_t gfp);
+
+#if (EROFS_PCPUBUF_NR_PAGES > 0)
+void *erofs_get_pcpubuf(unsigned int pagenr);
+#define erofs_put_pcpubuf(buf) do { \
+	(void)&(buf);	\
+	preempt_enable();	\
+} while (0)
+#else
+static inline void *erofs_get_pcpubuf(unsigned int pagenr)
+{
+	return ERR_PTR(-ENOTSUPP);
+}
+
+#define erofs_put_pcpubuf(buf) do {} while (0)
+#endif
+
 void erofs_register_super(struct super_block *sb);
 void erofs_unregister_super(struct super_block *sb);
 
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index 8aea938172df..08f2d4302ecb 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -552,8 +552,7 @@ static int z_erofs_vle_work_iter_begin(struct z_erofs_vle_work_builder *builder,
 	if (IS_ERR(work))
 		return PTR_ERR(work);
 got_it:
-	z_erofs_pagevec_ctor_init(&builder->vector,
-				  Z_EROFS_VLE_INLINE_PAGEVECS,
+	z_erofs_pagevec_ctor_init(&builder->vector, Z_EROFS_NR_INLINE_PAGEVECS,
 				  work->pagevec, work->vcnt);
 
 	if (builder->role >= Z_EROFS_VLE_WORK_PRIMARY) {
@@ -936,7 +935,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 	for (i = 0; i < nr_pages; ++i)
 		pages[i] = NULL;
 
-	z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_VLE_INLINE_PAGEVECS,
+	z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
 				  work->pagevec, 0);
 
 	for (i = 0; i < work->vcnt; ++i) {
diff --git a/drivers/staging/erofs/unzip_vle.h b/drivers/staging/erofs/unzip_vle.h
index 902e67d04029..9c53009700cf 100644
--- a/drivers/staging/erofs/unzip_vle.h
+++ b/drivers/staging/erofs/unzip_vle.h
@@ -44,8 +44,6 @@ static inline bool z_erofs_gather_if_stagingpage(struct list_head *page_pool,
  *
  */
 
-#define Z_EROFS_VLE_INLINE_PAGEVECS     3
-
 struct z_erofs_vle_work {
 	struct mutex lock;
 
@@ -58,7 +56,7 @@ struct z_erofs_vle_work {
 
 	union {
 		/* L: pagevec */
-		erofs_vtptr_t pagevec[Z_EROFS_VLE_INLINE_PAGEVECS];
+		erofs_vtptr_t pagevec[Z_EROFS_NR_INLINE_PAGEVECS];
 		struct rcu_head rcu;
 	};
 };
diff --git a/drivers/staging/erofs/unzip_vle_lz4.c b/drivers/staging/erofs/unzip_vle_lz4.c
index 0daac9b984a8..399c3e3a3ff3 100644
--- a/drivers/staging/erofs/unzip_vle_lz4.c
+++ b/drivers/staging/erofs/unzip_vle_lz4.c
@@ -34,16 +34,6 @@ static int z_erofs_unzip_lz4(void *in, void *out, size_t inlen, size_t outlen)
 	return -EIO;
 }
 
-#if Z_EROFS_CLUSTER_MAX_PAGES > Z_EROFS_VLE_INLINE_PAGEVECS
-#define EROFS_PERCPU_NR_PAGES   Z_EROFS_CLUSTER_MAX_PAGES
-#else
-#define EROFS_PERCPU_NR_PAGES   Z_EROFS_VLE_INLINE_PAGEVECS
-#endif
-
-static struct {
-	char data[PAGE_SIZE * EROFS_PERCPU_NR_PAGES];
-} erofs_pcpubuf[NR_CPUS];
-
 int z_erofs_vle_plain_copy(struct page **compressed_pages,
 			   unsigned int clusterpages,
 			   struct page **pages,
@@ -56,8 +46,7 @@ int z_erofs_vle_plain_copy(struct page **compressed_pages,
 	char *percpu_data;
 	bool mirrored[Z_EROFS_CLUSTER_MAX_PAGES] = { 0 };
 
-	preempt_disable();
-	percpu_data = erofs_pcpubuf[smp_processor_id()].data;
+	percpu_data = erofs_get_pcpubuf(0);
 
 	j = 0;
 	for (i = 0; i < nr_pages; j = i++) {
@@ -117,7 +106,7 @@ int z_erofs_vle_plain_copy(struct page **compressed_pages,
 	if (src && !mirrored[j])
 		kunmap_atomic(src);
 
-	preempt_enable();
+	erofs_put_pcpubuf(percpu_data);
 	return 0;
 }
 
@@ -131,7 +120,7 @@ int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages,
 	unsigned int nr_pages, i, j;
 	int ret;
 
-	if (outlen + pageofs > EROFS_PERCPU_NR_PAGES * PAGE_SIZE)
+	if (outlen + pageofs > EROFS_PCPUBUF_NR_PAGES * PAGE_SIZE)
 		return -ENOTSUPP;
 
 	nr_pages = DIV_ROUND_UP(outlen + pageofs, PAGE_SIZE);
@@ -144,8 +133,7 @@ int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages,
 			return -ENOMEM;
 	}
 
-	preempt_disable();
-	vout = erofs_pcpubuf[smp_processor_id()].data;
+	vout = erofs_get_pcpubuf(0);
 
 	ret = z_erofs_unzip_lz4(vin, vout + pageofs,
 				clusterpages * PAGE_SIZE, outlen);
@@ -174,7 +162,7 @@ int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages,
 	}
 
 out:
-	preempt_enable();
+	erofs_put_pcpubuf(vout);
 
 	if (clusterpages == 1)
 		kunmap_atomic(vin);
@@ -196,8 +184,7 @@ int z_erofs_vle_unzip_vmap(struct page **compressed_pages,
 	int ret;
 
 	if (overlapped) {
-		preempt_disable();
-		vin = erofs_pcpubuf[smp_processor_id()].data;
+		vin = erofs_get_pcpubuf(0);
 
 		for (i = 0; i < clusterpages; ++i) {
 			void *t = kmap_atomic(compressed_pages[i]);
@@ -216,13 +203,13 @@ int z_erofs_vle_unzip_vmap(struct page **compressed_pages,
 	if (ret > 0)
 		ret = 0;
 
-	if (!overlapped) {
+	if (overlapped) {
+		erofs_put_pcpubuf(vin);
+	} else {
 		if (clusterpages == 1)
 			kunmap_atomic(vin);
 		else
 			erofs_vunmap(vin, clusterpages);
-	} else {
-		preempt_enable();
 	}
 	return ret;
 }
diff --git a/drivers/staging/erofs/utils.c b/drivers/staging/erofs/utils.c
index 3e7d30b6de1d..4bbd3bf34acd 100644
--- a/drivers/staging/erofs/utils.c
+++ b/drivers/staging/erofs/utils.c
@@ -27,6 +27,18 @@ struct page *erofs_allocpage(struct list_head *pool, gfp_t gfp)
 	return page;
 }
 
+#if (EROFS_PCPUBUF_NR_PAGES > 0)
+static struct {
+	u8 data[PAGE_SIZE * EROFS_PCPUBUF_NR_PAGES];
+} ____cacheline_aligned_in_smp erofs_pcpubuf[NR_CPUS];
+
+void *erofs_get_pcpubuf(unsigned int pagenr)
+{
+	preempt_disable();
+	return &erofs_pcpubuf[smp_processor_id()].data[pagenr * PAGE_SIZE];
+}
+#endif
+
 /* global shrink count (for all mounted EROFS instances) */
 static atomic_long_t erofs_global_shrink_cnt;
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 8/8] staging: erofs: integrate decompression inplace
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)
  To: Chao Yu, Greg Kroah-Hartman
  Cc: devel, LKML, linux-fsdevel, linux-erofs, Chao Yu, Fang Wei,
	Miao Xie, Du Wei, Gao Xiang
In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

Decompressor needs to know whether it's a partial
or full decompression since only full decompression
can be decompressed in-place.

On kirin980 platform, sequential read is finally
increased to 812MiB/s after decompression inplace
is enabled.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/internal.h  |  3 +++
 drivers/staging/erofs/unzip_vle.c | 15 +++++++++++----
 drivers/staging/erofs/unzip_vle.h |  1 +
 drivers/staging/erofs/zmap.c      |  1 +
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h
index 6c8767d4a1d5..963cc1b8b896 100644
--- a/drivers/staging/erofs/internal.h
+++ b/drivers/staging/erofs/internal.h
@@ -441,6 +441,7 @@ extern const struct address_space_operations z_erofs_vle_normalaccess_aops;
  */
 enum {
 	BH_Zipped = BH_PrivateStart,
+	BH_FullMapped,
 };
 
 /* Has a disk mapping */
@@ -449,6 +450,8 @@ enum {
 #define EROFS_MAP_META		(1 << BH_Meta)
 /* The extent has been compressed */
 #define EROFS_MAP_ZIPPED	(1 << BH_Zipped)
+/* The length of extent is full */
+#define EROFS_MAP_FULL_MAPPED	(1 << BH_FullMapped)
 
 struct erofs_map_blocks {
 	erofs_off_t m_pa, m_la;
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index cb870b83f3c8..316382d33783 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -469,6 +469,9 @@ z_erofs_vle_work_register(const struct z_erofs_vle_work_finder *f,
 				    Z_EROFS_VLE_WORKGRP_FMT_LZ4 :
 				    Z_EROFS_VLE_WORKGRP_FMT_PLAIN);
 
+	if (map->m_flags & EROFS_MAP_FULL_MAPPED)
+		grp->flags |= Z_EROFS_VLE_WORKGRP_FULL_LENGTH;
+
 	/* new workgrps have been claimed as type 1 */
 	WRITE_ONCE(grp->next, *f->owned_head);
 	/* primary and followed work for all new workgrps */
@@ -901,7 +904,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 	unsigned int i, outputsize;
 
 	enum z_erofs_page_type page_type;
-	bool overlapped;
+	bool overlapped, partial;
 	struct z_erofs_vle_work *work;
 	int err;
 
@@ -1009,10 +1012,13 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 	if (unlikely(err))
 		goto out;
 
-	if (nr_pages << PAGE_SHIFT >= work->pageofs + grp->llen)
+	if (nr_pages << PAGE_SHIFT >= work->pageofs + grp->llen) {
 		outputsize = grp->llen;
-	else
+		partial = !(grp->flags & Z_EROFS_VLE_WORKGRP_FULL_LENGTH);
+	} else {
 		outputsize = (nr_pages << PAGE_SHIFT) - work->pageofs;
+		partial = true;
+	}
 
 	if (z_erofs_vle_workgrp_fmt(grp) == Z_EROFS_VLE_WORKGRP_FMT_PLAIN)
 		algorithm = Z_EROFS_COMPRESSION_SHIFTED;
@@ -1028,7 +1034,8 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 					.outputsize = outputsize,
 					.alg = algorithm,
 					.inplace_io = overlapped,
-					.partial_decoding = true }, page_pool);
+					.partial_decoding = partial
+				 }, page_pool);
 
 out:
 	/* must handle all compressed pages before endding pages */
diff --git a/drivers/staging/erofs/unzip_vle.h b/drivers/staging/erofs/unzip_vle.h
index a2d9b60beebd..ab509d75aefd 100644
--- a/drivers/staging/erofs/unzip_vle.h
+++ b/drivers/staging/erofs/unzip_vle.h
@@ -46,6 +46,7 @@ struct z_erofs_vle_work {
 #define Z_EROFS_VLE_WORKGRP_FMT_PLAIN        0
 #define Z_EROFS_VLE_WORKGRP_FMT_LZ4          1
 #define Z_EROFS_VLE_WORKGRP_FMT_MASK         1
+#define Z_EROFS_VLE_WORKGRP_FULL_LENGTH      2
 
 typedef void *z_erofs_vle_owned_workgrp_t;
 
diff --git a/drivers/staging/erofs/zmap.c b/drivers/staging/erofs/zmap.c
index 77bc49c07846..ce6d955f0112 100644
--- a/drivers/staging/erofs/zmap.c
+++ b/drivers/staging/erofs/zmap.c
@@ -424,6 +424,7 @@ int z_erofs_map_blocks_iter(struct inode *inode,
 			goto unmap_out;
 		}
 		end = (m.lcn << lclusterbits) | m.clusterofs;
+		map->m_flags |= EROFS_MAP_FULL_MAPPED;
 		m.delta[0] = 1;
 		/* fallthrough */
 	case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 5/8] staging: erofs: introduce generic decompression backend
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)
  To: Chao Yu, Greg Kroah-Hartman
  Cc: devel, LKML, linux-fsdevel, linux-erofs, Chao Yu, Fang Wei,
	Miao Xie, Du Wei, Gao Xiang
In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

This patch adds a new generic decompression framework
in order to replace the old LZ4-specific decompression code.

Even though LZ4 is still the only supported algorithm, yet
it is more cleaner and easy to integrate new algorithm than
the old almost hard-coded decompression backend.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/Makefile       |   2 +-
 drivers/staging/erofs/compress.h     |  21 ++
 drivers/staging/erofs/decompressor.c | 307 +++++++++++++++++++++++++++
 3 files changed, 329 insertions(+), 1 deletion(-)
 create mode 100644 drivers/staging/erofs/decompressor.c

diff --git a/drivers/staging/erofs/Makefile b/drivers/staging/erofs/Makefile
index 84b412c7a991..adeb5d6e2668 100644
--- a/drivers/staging/erofs/Makefile
+++ b/drivers/staging/erofs/Makefile
@@ -9,5 +9,5 @@ obj-$(CONFIG_EROFS_FS) += erofs.o
 ccflags-y += -I $(srctree)/$(src)/include
 erofs-objs := super.o inode.o data.o namei.o dir.o utils.o
 erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
-erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o unzip_vle_lz4.o zmap.o
+erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o unzip_vle_lz4.o zmap.o decompressor.o
 
diff --git a/drivers/staging/erofs/compress.h b/drivers/staging/erofs/compress.h
index 1dcfc3b35118..ebeccb1f4eae 100644
--- a/drivers/staging/erofs/compress.h
+++ b/drivers/staging/erofs/compress.h
@@ -9,6 +9,24 @@
 #ifndef __EROFS_FS_COMPRESS_H
 #define __EROFS_FS_COMPRESS_H
 
+#include "internal.h"
+
+enum {
+	Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
+	Z_EROFS_COMPRESSION_RUNTIME_MAX
+};
+
+struct z_erofs_decompress_req {
+	struct page **in, **out;
+
+	unsigned short pageofs_out;
+	unsigned int inputsize, outputsize;
+
+	/* indicate the algorithm will be used for decompression */
+	unsigned int alg;
+	bool inplace_io, partial_decoding;
+};
+
 /*
  * - 0x5A110C8D ('sallocated', Z_EROFS_MAPPING_STAGING) -
  * used to mark temporary allocated pages from other
@@ -36,5 +54,8 @@ static inline bool z_erofs_put_stagingpage(struct list_head *pagepool,
 	return true;
 }
 
+int z_erofs_decompress(struct z_erofs_decompress_req *rq,
+		       struct list_head *pagepool);
+
 #endif
 
diff --git a/drivers/staging/erofs/decompressor.c b/drivers/staging/erofs/decompressor.c
new file mode 100644
index 000000000000..c68d17b579e0
--- /dev/null
+++ b/drivers/staging/erofs/decompressor.c
@@ -0,0 +1,307 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * linux/drivers/staging/erofs/decompressor.c
+ *
+ * Copyright (C) 2019 HUAWEI, Inc.
+ *             http://www.huawei.com/
+ * Created by Gao Xiang <gaoxiang25@huawei.com>
+ */
+#include "compress.h"
+#include <linux/lz4.h>
+
+#ifndef LZ4_DISTANCE_MAX	/* history window size */
+#define LZ4_DISTANCE_MAX 65535	/* set to maximum value by default */
+#endif
+
+#define LZ4_MAX_DISTANCE_PAGES	DIV_ROUND_UP(LZ4_DISTANCE_MAX, PAGE_SIZE)
+
+struct z_erofs_decompressor {
+	/*
+	 * if destpages have sparsed pages, fill them with bounce pages.
+	 * it also check whether destpages indicate continuous physical memory.
+	 */
+	int (*prepare_destpages)(struct z_erofs_decompress_req *rq,
+				 struct list_head *pagepool);
+	int (*decompress)(struct z_erofs_decompress_req *rq, u8 *out);
+	char *name;
+};
+
+static int lz4_prepare_destpages(struct z_erofs_decompress_req *rq,
+				 struct list_head *pagepool)
+{
+	const unsigned int nr =
+		PAGE_ALIGN(rq->pageofs_out + rq->outputsize) >> PAGE_SHIFT;
+	struct page *availables[LZ4_MAX_DISTANCE_PAGES] = { NULL };
+	unsigned long unused[DIV_ROUND_UP(LZ4_MAX_DISTANCE_PAGES,
+					  BITS_PER_LONG)] = { 0 };
+	void *kaddr = NULL;
+	unsigned int i, j, k;
+
+	for (i = 0; i < nr; ++i) {
+		struct page *const page = rq->out[i];
+
+		j = i & (LZ4_MAX_DISTANCE_PAGES - 1);
+		if (availables[j])
+			__set_bit(j, unused);
+
+		if (page) {
+			if (kaddr) {
+				if (kaddr + PAGE_SIZE == page_address(page))
+					kaddr += PAGE_SIZE;
+				else
+					kaddr = NULL;
+			} else if (!i) {
+				kaddr = page_address(page);
+			}
+			continue;
+		}
+		kaddr = NULL;
+
+		k = find_first_bit(unused, LZ4_MAX_DISTANCE_PAGES);
+		if (k < LZ4_MAX_DISTANCE_PAGES) {
+			j = k;
+			get_page(availables[j]);
+		} else {
+			DBG_BUGON(availables[j]);
+
+			if (!list_empty(pagepool)) {
+				availables[j] = lru_to_page(pagepool);
+				list_del(&availables[j]->lru);
+				DBG_BUGON(page_ref_count(availables[j]) != 1);
+			} else {
+				availables[j] = alloc_pages(GFP_KERNEL, 0);
+				if (!availables[j])
+					return -ENOMEM;
+			}
+			availables[j]->mapping = Z_EROFS_MAPPING_STAGING;
+		}
+		rq->out[i] = availables[j];
+		__clear_bit(j, unused);
+	}
+	return kaddr ? 1 : 0;
+}
+
+static void *generic_copy_inplace_data(struct z_erofs_decompress_req *rq,
+				       u8 *src, unsigned int pageofs_in)
+{
+	/*
+	 * if in-place decompression is ongoing, those decompressed
+	 * pages should be copied in order to avoid being overlapped.
+	 */
+	struct page **in = rq->in;
+	u8 *const tmp = erofs_get_pcpubuf(0);
+	u8 *tmpp = tmp;
+	unsigned int inlen = rq->inputsize - pageofs_in;
+	unsigned int count = min_t(uint, inlen, PAGE_SIZE - pageofs_in);
+
+	while (tmpp < tmp + inlen) {
+		if (!src)
+			src = kmap_atomic(*in);
+		memcpy(tmpp, src + pageofs_in, count);
+		kunmap_atomic(src);
+		src = NULL;
+		tmpp += count;
+		pageofs_in = 0;
+		count = PAGE_SIZE;
+		++in;
+	}
+	return tmp;
+}
+
+static int lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out)
+{
+	unsigned int inputmargin, inlen;
+	u8 *src;
+	bool copied;
+	int ret;
+
+	if (rq->inputsize > PAGE_SIZE)
+		return -ENOTSUPP;
+
+	src = kmap_atomic(*rq->in);
+	inputmargin = 0;
+	while (!src[inputmargin & ~PAGE_MASK])
+		if (!(++inputmargin & ~PAGE_MASK))
+			break;
+
+	if (inputmargin >= rq->inputsize) {
+		kunmap_atomic(src);
+		return -EIO;
+	}
+
+	copied = false;
+	inlen = rq->inputsize - inputmargin;
+	if (rq->inplace_io) {
+		src = generic_copy_inplace_data(rq, src, inputmargin);
+		inputmargin = 0;
+		copied = true;
+	}
+
+	ret = LZ4_decompress_safe_partial(src + inputmargin, out,
+					  inlen, rq->outputsize,
+					  rq->outputsize);
+	if (ret < 0) {
+		errln("%s, failed to decompress, in[%p, %u, %u] out[%p, %u]",
+		      __func__, src + inputmargin, inlen, inputmargin,
+		      out, rq->outputsize);
+		WARN_ON(1);
+		print_hex_dump(KERN_DEBUG, "[ in]: ", DUMP_PREFIX_OFFSET,
+			       16, 1, src + inputmargin, inlen, true);
+		print_hex_dump(KERN_DEBUG, "[out]: ", DUMP_PREFIX_OFFSET,
+			       16, 1, out, rq->outputsize, true);
+		ret = -EIO;
+	}
+
+	if (copied)
+		erofs_put_pcpubuf(src);
+	else
+		kunmap_atomic(src);
+	return ret;
+}
+
+static struct z_erofs_decompressor decompressors[] = {
+	[Z_EROFS_COMPRESSION_SHIFTED] = {
+		.name = "shifted"
+	},
+	[Z_EROFS_COMPRESSION_LZ4] = {
+		.prepare_destpages = lz4_prepare_destpages,
+		.decompress = lz4_decompress,
+		.name = "lz4"
+	},
+};
+
+static void copy_from_pcpubuf(struct page **out, const char *dst,
+			      unsigned short pageofs_out,
+			      unsigned int outputsize)
+{
+	const char *end = dst + outputsize;
+	const unsigned int righthalf = PAGE_SIZE - pageofs_out;
+	const char *cur = dst - pageofs_out;
+
+	while (cur < end) {
+		struct page *const page = *out++;
+
+		if (page) {
+			char *buf = kmap_atomic(page);
+
+			if (cur >= dst) {
+				memcpy(buf, cur, min_t(uint, PAGE_SIZE,
+						       end - cur));
+			} else {
+				memcpy(buf + pageofs_out, cur + pageofs_out,
+				       min_t(uint, righthalf, end - cur));
+			}
+			kunmap_atomic(buf);
+		}
+		cur += PAGE_SIZE;
+	}
+}
+
+static int decompress_generic(struct z_erofs_decompress_req *rq,
+			      struct list_head *pagepool)
+{
+	const unsigned int nrpages_out =
+		PAGE_ALIGN(rq->pageofs_out + rq->outputsize) >> PAGE_SHIFT;
+	const struct z_erofs_decompressor *alg = decompressors + rq->alg;
+	unsigned int dst_maptype;
+	void *dst;
+	int ret;
+
+	if (nrpages_out == 1 && !rq->inplace_io) {
+		DBG_BUGON(!*rq->out);
+		dst = kmap_atomic(*rq->out);
+		dst_maptype = 0;
+		goto dstmap_out;
+	}
+
+	/*
+	 * For the case of small output size (especially much less
+	 * than PAGE_SIZE), memcpy the decompressed data rather than
+	 * compressed data is preferred.
+	 */
+	if (rq->outputsize <= PAGE_SIZE * 7 / 8) {
+		dst = erofs_get_pcpubuf(0);
+
+		rq->inplace_io = false;
+		ret = alg->decompress(rq, dst);
+		if (!ret)
+			copy_from_pcpubuf(rq->out, dst, rq->pageofs_out,
+					  rq->outputsize);
+
+		erofs_put_pcpubuf(dst);
+		return ret;
+	}
+
+	ret = alg->prepare_destpages(rq, pagepool);
+	if (ret < 0) {
+		return ret;
+	} else if (ret) {
+		dst = page_address(*rq->out);
+		dst_maptype = 1;
+		goto dstmap_out;
+	}
+
+	dst = erofs_vmap(rq->out, nrpages_out);
+	if (!dst)
+		return -ENOMEM;
+	dst_maptype = 2;
+
+dstmap_out:
+	ret = alg->decompress(rq, dst + rq->pageofs_out);
+
+	if (!dst_maptype)
+		kunmap_atomic(dst);
+	else if (dst_maptype == 2)
+		erofs_vunmap(dst, nrpages_out);
+	return ret;
+}
+
+static int shifted_decompress(const struct z_erofs_decompress_req *rq,
+			      struct list_head *pagepool)
+{
+	const unsigned int nrpages_out =
+		PAGE_ALIGN(rq->pageofs_out + rq->outputsize) >> PAGE_SHIFT;
+	const unsigned int righthalf = PAGE_SIZE - rq->pageofs_out;
+	unsigned char *src, *dst;
+
+	if (nrpages_out > 2) {
+		DBG_BUGON(1);
+		return -EIO;
+	}
+
+	if (rq->out[0] == *rq->in) {
+		DBG_BUGON(nrpages_out != 1);
+		return 0;
+	}
+
+	src = kmap_atomic(*rq->in);
+	if (!rq->out[0]) {
+		dst = NULL;
+	} else {
+		dst = kmap_atomic(rq->out[0]);
+		memcpy(dst + rq->pageofs_out, src, righthalf);
+	}
+
+	if (rq->out[1] == *rq->in) {
+		memmove(src, src + righthalf, rq->pageofs_out);
+	} else if (nrpages_out == 2) {
+		if (dst)
+			kunmap_atomic(dst);
+		DBG_BUGON(!rq->out[1]);
+		dst = kmap_atomic(rq->out[1]);
+		memcpy(dst, src + righthalf, rq->pageofs_out);
+	}
+	if (dst)
+		kunmap_atomic(dst);
+	kunmap_atomic(src);
+	return 0;
+}
+
+int z_erofs_decompress(struct z_erofs_decompress_req *rq,
+		       struct list_head *pagepool)
+{
+	if (rq->alg == Z_EROFS_COMPRESSION_SHIFTED)
+		return shifted_decompress(rq, pagepool);
+	return decompress_generic(rq, pagepool);
+}
+
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 7/8] staging: erofs: switch to new decompression backend
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)
  To: Chao Yu, Greg Kroah-Hartman
  Cc: devel, LKML, linux-fsdevel, linux-erofs, Chao Yu, Fang Wei,
	Miao Xie, Du Wei, Gao Xiang
In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

This patch integrates new decompression framework to
erofs decompression path, and remove the old
decompression implementation as well.

On kirin980 platform, sequential read is slightly
improved to 778MiB/s after the new decompression
backend is used.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/Makefile        |   2 +-
 drivers/staging/erofs/internal.h      |   6 -
 drivers/staging/erofs/unzip_vle.c     |  59 +++----
 drivers/staging/erofs/unzip_vle.h     |  15 +-
 drivers/staging/erofs/unzip_vle_lz4.c | 216 --------------------------
 5 files changed, 24 insertions(+), 274 deletions(-)
 delete mode 100644 drivers/staging/erofs/unzip_vle_lz4.c

diff --git a/drivers/staging/erofs/Makefile b/drivers/staging/erofs/Makefile
index adeb5d6e2668..e704d9e51514 100644
--- a/drivers/staging/erofs/Makefile
+++ b/drivers/staging/erofs/Makefile
@@ -9,5 +9,5 @@ obj-$(CONFIG_EROFS_FS) += erofs.o
 ccflags-y += -I $(srctree)/$(src)/include
 erofs-objs := super.o inode.o data.o namei.o dir.o utils.o
 erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
-erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o unzip_vle_lz4.o zmap.o decompressor.o
+erofs-$(CONFIG_EROFS_FS_ZIP) += unzip_vle.o zmap.o decompressor.o
 
diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h
index dcbe6f7f5dae..6c8767d4a1d5 100644
--- a/drivers/staging/erofs/internal.h
+++ b/drivers/staging/erofs/internal.h
@@ -321,14 +321,8 @@ static inline void z_erofs_exit_zip_subsystem(void) {}
 
 /* page count of a compressed cluster */
 #define erofs_clusterpages(sbi)         ((1 << (sbi)->clusterbits) / PAGE_SIZE)
-#define Z_EROFS_NR_INLINE_PAGEVECS      3
 
-#if (Z_EROFS_CLUSTER_MAX_PAGES > Z_EROFS_NR_INLINE_PAGEVECS)
 #define EROFS_PCPUBUF_NR_PAGES          Z_EROFS_CLUSTER_MAX_PAGES
-#else
-#define EROFS_PCPUBUF_NR_PAGES          Z_EROFS_NR_INLINE_PAGEVECS
-#endif
-
 #else
 #define EROFS_PCPUBUF_NR_PAGES          0
 #endif
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index d95f985936b6..cb870b83f3c8 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -897,12 +897,12 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 	unsigned int sparsemem_pages = 0;
 	struct page *pages_onstack[Z_EROFS_VLE_VMAP_ONSTACK_PAGES];
 	struct page **pages, **compressed_pages, *page;
-	unsigned int i, llen;
+	unsigned int algorithm;
+	unsigned int i, outputsize;
 
 	enum z_erofs_page_type page_type;
 	bool overlapped;
 	struct z_erofs_vle_work *work;
-	void *vout;
 	int err;
 
 	might_sleep();
@@ -1009,43 +1009,26 @@ static int z_erofs_vle_unzip(struct super_block *sb,
 	if (unlikely(err))
 		goto out;
 
-	llen = (nr_pages << PAGE_SHIFT) - work->pageofs;
-
-	if (z_erofs_vle_workgrp_fmt(grp) == Z_EROFS_VLE_WORKGRP_FMT_PLAIN) {
-		err = z_erofs_vle_plain_copy(compressed_pages, clusterpages,
-					     pages, nr_pages, work->pageofs);
-		goto out;
-	}
-
-	if (llen > grp->llen)
-		llen = grp->llen;
-
-	err = z_erofs_vle_unzip_fast_percpu(compressed_pages, clusterpages,
-					    pages, llen, work->pageofs);
-	if (err != -ENOTSUPP)
-		goto out;
-
-	if (sparsemem_pages >= nr_pages)
-		goto skip_allocpage;
-
-	for (i = 0; i < nr_pages; ++i) {
-		if (pages[i])
-			continue;
-
-		pages[i] = __stagingpage_alloc(page_pool, GFP_NOFS);
-	}
-
-skip_allocpage:
-	vout = erofs_vmap(pages, nr_pages);
-	if (!vout) {
-		err = -ENOMEM;
-		goto out;
-	}
-
-	err = z_erofs_vle_unzip_vmap(compressed_pages, clusterpages, vout,
-				     llen, work->pageofs, overlapped);
+	if (nr_pages << PAGE_SHIFT >= work->pageofs + grp->llen)
+		outputsize = grp->llen;
+	else
+		outputsize = (nr_pages << PAGE_SHIFT) - work->pageofs;
 
-	erofs_vunmap(vout, nr_pages);
+	if (z_erofs_vle_workgrp_fmt(grp) == Z_EROFS_VLE_WORKGRP_FMT_PLAIN)
+		algorithm = Z_EROFS_COMPRESSION_SHIFTED;
+	else
+		algorithm = Z_EROFS_COMPRESSION_LZ4;
+
+	err = z_erofs_decompress(&(struct z_erofs_decompress_req) {
+					.sb = sb,
+					.in = compressed_pages,
+					.out = pages,
+					.pageofs_out = work->pageofs,
+					.inputsize = PAGE_SIZE,
+					.outputsize = outputsize,
+					.alg = algorithm,
+					.inplace_io = overlapped,
+					.partial_decoding = true }, page_pool);
 
 out:
 	/* must handle all compressed pages before endding pages */
diff --git a/drivers/staging/erofs/unzip_vle.h b/drivers/staging/erofs/unzip_vle.h
index 6c3e0deb63e7..a2d9b60beebd 100644
--- a/drivers/staging/erofs/unzip_vle.h
+++ b/drivers/staging/erofs/unzip_vle.h
@@ -16,6 +16,8 @@
 #include "internal.h"
 #include "unzip_pagevec.h"
 
+#define Z_EROFS_NR_INLINE_PAGEVECS      3
+
 /*
  * Structure fields follow one of the following exclusion rules.
  *
@@ -189,18 +191,5 @@ static inline void z_erofs_onlinepage_endio(struct page *page)
 	min_t(unsigned int, THREAD_SIZE / 8 / sizeof(struct page *), 96U)
 #define Z_EROFS_VLE_VMAP_GLOBAL_PAGES	2048
 
-/* unzip_vle_lz4.c */
-int z_erofs_vle_plain_copy(struct page **compressed_pages,
-			   unsigned int clusterpages, struct page **pages,
-			   unsigned int nr_pages, unsigned short pageofs);
-int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages,
-				  unsigned int clusterpages,
-				  struct page **pages, unsigned int outlen,
-				  unsigned short pageofs);
-int z_erofs_vle_unzip_vmap(struct page **compressed_pages,
-			   unsigned int clusterpages,
-			   void *vaddr, unsigned int llen,
-			   unsigned short pageofs, bool overlapped);
-
 #endif
 
diff --git a/drivers/staging/erofs/unzip_vle_lz4.c b/drivers/staging/erofs/unzip_vle_lz4.c
deleted file mode 100644
index 399c3e3a3ff3..000000000000
--- a/drivers/staging/erofs/unzip_vle_lz4.c
+++ /dev/null
@@ -1,216 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * linux/drivers/staging/erofs/unzip_vle_lz4.c
- *
- * Copyright (C) 2018 HUAWEI, Inc.
- *             http://www.huawei.com/
- * Created by Gao Xiang <gaoxiang25@huawei.com>
- *
- * This file is subject to the terms and conditions of the GNU General Public
- * License.  See the file COPYING in the main directory of the Linux
- * distribution for more details.
- */
-#include "unzip_vle.h"
-#include <linux/lz4.h>
-
-static int z_erofs_unzip_lz4(void *in, void *out, size_t inlen, size_t outlen)
-{
-	int ret = LZ4_decompress_safe_partial(in, out, inlen, outlen, outlen);
-
-	if (ret >= 0)
-		return ret;
-
-	/*
-	 * LZ4_decompress_safe_partial will return an error code
-	 * (< 0) if decompression failed
-	 */
-	errln("%s, failed to decompress, in[%p, %zu] outlen[%p, %zu]",
-	      __func__, in, inlen, out, outlen);
-	WARN_ON(1);
-	print_hex_dump(KERN_DEBUG, "raw data [in]: ", DUMP_PREFIX_OFFSET,
-		       16, 1, in, inlen, true);
-	print_hex_dump(KERN_DEBUG, "raw data [out]: ", DUMP_PREFIX_OFFSET,
-		       16, 1, out, outlen, true);
-	return -EIO;
-}
-
-int z_erofs_vle_plain_copy(struct page **compressed_pages,
-			   unsigned int clusterpages,
-			   struct page **pages,
-			   unsigned int nr_pages,
-			   unsigned short pageofs)
-{
-	unsigned int i, j;
-	void *src = NULL;
-	const unsigned int righthalf = PAGE_SIZE - pageofs;
-	char *percpu_data;
-	bool mirrored[Z_EROFS_CLUSTER_MAX_PAGES] = { 0 };
-
-	percpu_data = erofs_get_pcpubuf(0);
-
-	j = 0;
-	for (i = 0; i < nr_pages; j = i++) {
-		struct page *page = pages[i];
-		void *dst;
-
-		if (!page) {
-			if (src) {
-				if (!mirrored[j])
-					kunmap_atomic(src);
-				src = NULL;
-			}
-			continue;
-		}
-
-		dst = kmap_atomic(page);
-
-		for (; j < clusterpages; ++j) {
-			if (compressed_pages[j] != page)
-				continue;
-
-			DBG_BUGON(mirrored[j]);
-			memcpy(percpu_data + j * PAGE_SIZE, dst, PAGE_SIZE);
-			mirrored[j] = true;
-			break;
-		}
-
-		if (i) {
-			if (!src)
-				src = mirrored[i - 1] ?
-					percpu_data + (i - 1) * PAGE_SIZE :
-					kmap_atomic(compressed_pages[i - 1]);
-
-			memcpy(dst, src + righthalf, pageofs);
-
-			if (!mirrored[i - 1])
-				kunmap_atomic(src);
-
-			if (unlikely(i >= clusterpages)) {
-				kunmap_atomic(dst);
-				break;
-			}
-		}
-
-		if (!righthalf) {
-			src = NULL;
-		} else {
-			src = mirrored[i] ? percpu_data + i * PAGE_SIZE :
-				kmap_atomic(compressed_pages[i]);
-
-			memcpy(dst + pageofs, src, righthalf);
-		}
-
-		kunmap_atomic(dst);
-	}
-
-	if (src && !mirrored[j])
-		kunmap_atomic(src);
-
-	erofs_put_pcpubuf(percpu_data);
-	return 0;
-}
-
-int z_erofs_vle_unzip_fast_percpu(struct page **compressed_pages,
-				  unsigned int clusterpages,
-				  struct page **pages,
-				  unsigned int outlen,
-				  unsigned short pageofs)
-{
-	void *vin, *vout;
-	unsigned int nr_pages, i, j;
-	int ret;
-
-	if (outlen + pageofs > EROFS_PCPUBUF_NR_PAGES * PAGE_SIZE)
-		return -ENOTSUPP;
-
-	nr_pages = DIV_ROUND_UP(outlen + pageofs, PAGE_SIZE);
-
-	if (clusterpages == 1) {
-		vin = kmap_atomic(compressed_pages[0]);
-	} else {
-		vin = erofs_vmap(compressed_pages, clusterpages);
-		if (!vin)
-			return -ENOMEM;
-	}
-
-	vout = erofs_get_pcpubuf(0);
-
-	ret = z_erofs_unzip_lz4(vin, vout + pageofs,
-				clusterpages * PAGE_SIZE, outlen);
-
-	if (ret < 0)
-		goto out;
-	ret = 0;
-
-	for (i = 0; i < nr_pages; ++i) {
-		j = min((unsigned int)PAGE_SIZE - pageofs, outlen);
-
-		if (pages[i]) {
-			if (clusterpages == 1 &&
-			    pages[i] == compressed_pages[0]) {
-				memcpy(vin + pageofs, vout + pageofs, j);
-			} else {
-				void *dst = kmap_atomic(pages[i]);
-
-				memcpy(dst + pageofs, vout + pageofs, j);
-				kunmap_atomic(dst);
-			}
-		}
-		vout += PAGE_SIZE;
-		outlen -= j;
-		pageofs = 0;
-	}
-
-out:
-	erofs_put_pcpubuf(vout);
-
-	if (clusterpages == 1)
-		kunmap_atomic(vin);
-	else
-		erofs_vunmap(vin, clusterpages);
-
-	return ret;
-}
-
-int z_erofs_vle_unzip_vmap(struct page **compressed_pages,
-			   unsigned int clusterpages,
-			   void *vout,
-			   unsigned int llen,
-			   unsigned short pageofs,
-			   bool overlapped)
-{
-	void *vin;
-	unsigned int i;
-	int ret;
-
-	if (overlapped) {
-		vin = erofs_get_pcpubuf(0);
-
-		for (i = 0; i < clusterpages; ++i) {
-			void *t = kmap_atomic(compressed_pages[i]);
-
-			memcpy(vin + PAGE_SIZE * i, t, PAGE_SIZE);
-			kunmap_atomic(t);
-		}
-	} else if (clusterpages == 1) {
-		vin = kmap_atomic(compressed_pages[0]);
-	} else {
-		vin = erofs_vmap(compressed_pages, clusterpages);
-	}
-
-	ret = z_erofs_unzip_lz4(vin, vout + pageofs,
-				clusterpages * PAGE_SIZE, llen);
-	if (ret > 0)
-		ret = 0;
-
-	if (overlapped) {
-		erofs_put_pcpubuf(vin);
-	} else {
-		if (clusterpages == 1)
-			kunmap_atomic(vin);
-		else
-			erofs_vunmap(vin, clusterpages);
-	}
-	return ret;
-}
-
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 6/8] staging: erofs: introduce LZ4 decompression inplace
From: Gao Xiang @ 2019-06-20 16:07 UTC (permalink / raw)
  To: Chao Yu, Greg Kroah-Hartman
  Cc: devel, LKML, linux-fsdevel, linux-erofs, Chao Yu, Fang Wei,
	Miao Xie, Du Wei, Gao Xiang
In-Reply-To: <20190620160719.240682-1-gaoxiang25@huawei.com>

compressed data will be usually loaded into last pages of
the extent (the last page for 4k) for in-place decompression
(more specifically, in-place IO), as ilustration below,

         start of compressed logical extent
           |                          end of this logical extent
           |                           |
     ______v___________________________v________
... |  page 6  |  page 7  |  page 8  |  page 9  | ...
    |__________|__________|__________|__________|
           .                         ^ .        ^
           .                         |compressed|
           .                         |   data   |
           .                           .        .
           |<          dstsize        >|<margin>|
                                       oend     iend
           op                        ip

Therefore, it's possible to do decompression inplace (thus no
memcpy at all) if the margin is sufficient and safe enough [1],
and it can be implemented only for fixed-size output compression
compared with fixed-size input compression.

No memcpy for most of in-place IO (about 99% of enwik9) after
decompression inplace is implemented and sequential read will
be improved of course (see the following patches for test results).

[1] https://github.com/lz4/lz4/commit/b17f578a919b7e6b078cede2d52be29dd48c8e8c
    https://github.com/lz4/lz4/commit/5997e139f53169fa3a1c1b4418d2452a90b01602

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/compress.h     |  1 +
 drivers/staging/erofs/decompressor.c | 21 ++++++++++++++++++---
 drivers/staging/erofs/erofs_fs.h     |  3 ++-
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/erofs/compress.h b/drivers/staging/erofs/compress.h
index ebeccb1f4eae..c43aa3374d28 100644
--- a/drivers/staging/erofs/compress.h
+++ b/drivers/staging/erofs/compress.h
@@ -17,6 +17,7 @@ enum {
 };
 
 struct z_erofs_decompress_req {
+	struct super_block *sb;
 	struct page **in, **out;
 
 	unsigned short pageofs_out;
diff --git a/drivers/staging/erofs/decompressor.c b/drivers/staging/erofs/decompressor.c
index c68d17b579e0..689fb8ec7032 100644
--- a/drivers/staging/erofs/decompressor.c
+++ b/drivers/staging/erofs/decompressor.c
@@ -14,6 +14,9 @@
 #endif
 
 #define LZ4_MAX_DISTANCE_PAGES	DIV_ROUND_UP(LZ4_DISTANCE_MAX, PAGE_SIZE)
+#ifndef LZ4_DECOMPRESS_INPLACE_MARGIN
+#define LZ4_DECOMPRESS_INPLACE_MARGIN(srcsize)  (((srcsize) >> 8) + 32)
+#endif
 
 struct z_erofs_decompressor {
 	/*
@@ -132,9 +135,21 @@ static int lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out)
 	copied = false;
 	inlen = rq->inputsize - inputmargin;
 	if (rq->inplace_io) {
-		src = generic_copy_inplace_data(rq, src, inputmargin);
-		inputmargin = 0;
-		copied = true;
+		const uint oend = (rq->pageofs_out +
+				   rq->outputsize) & ~PAGE_MASK;
+		const uint nr = PAGE_ALIGN(rq->pageofs_out +
+					   rq->outputsize) >> PAGE_SHIFT;
+
+		if (rq->partial_decoding ||
+		    !(EROFS_SB(rq->sb)->requirements &
+		      EROFS_REQUIREMENT_LZ4_0PADDING) ||
+		    rq->out[nr - 1] != rq->in[0] ||
+		    rq->inputsize - oend <
+		      LZ4_DECOMPRESS_INPLACE_MARGIN(inlen)) {
+			src = generic_copy_inplace_data(rq, src, inputmargin);
+			inputmargin = 0;
+			copied = true;
+		}
 	}
 
 	ret = LZ4_decompress_safe_partial(src + inputmargin, out,
diff --git a/drivers/staging/erofs/erofs_fs.h b/drivers/staging/erofs/erofs_fs.h
index a05139f1df60..353322a3206c 100644
--- a/drivers/staging/erofs/erofs_fs.h
+++ b/drivers/staging/erofs/erofs_fs.h
@@ -21,7 +21,8 @@
  * Any bits that aren't in EROFS_ALL_REQUIREMENTS should be
  * incompatible with this kernel version.
  */
-#define EROFS_ALL_REQUIREMENTS  0
+#define EROFS_REQUIREMENT_LZ4_0PADDING	0x00000001
+#define EROFS_ALL_REQUIREMENTS		EROFS_REQUIREMENT_LZ4_0PADDING
 
 struct erofs_super_block {
 /*  0 */__le32 magic;           /* in the little endian */
-- 
2.17.1


^ permalink raw reply related

* [Xen-devel] [PATCH] mm: fix regression with deferred struct page init
From: Juergen Gross @ 2019-06-20 16:08 UTC (permalink / raw)
  To: xen-devel, linux-mm, linux-kernel; +Cc: Juergen Gross, Alexander Duyck

Commit 0e56acae4b4dd4a9 ("mm: initialize MAX_ORDER_NR_PAGES at a time
instead of doing larger sections") is causing a regression on some
systems when the kernel is booted as Xen dom0.

The system will just hang in early boot.

Reason is an endless loop in get_page_from_freelist() in case the first
zone looked at has no free memory. deferred_grow_zone() is always
returning true due to the following code snipplet:

  /* If the zone is empty somebody else may have cleared out the zone */
  if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
                                           first_deferred_pfn)) {
          pgdat->first_deferred_pfn = ULONG_MAX;
          pgdat_resize_unlock(pgdat, &flags);
          return true;
  }

This in turn results in the loop as get_page_from_freelist() is
assuming forward progress can be made by doing some more struct page
initialization.

Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Fixes: 0e56acae4b4dd4a9 ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections")
Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d66bc8abe0af..8e3bc949ebcc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1826,7 +1826,8 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
 						 first_deferred_pfn)) {
 		pgdat->first_deferred_pfn = ULONG_MAX;
 		pgdat_resize_unlock(pgdat, &flags);
-		return true;
+		/* Retry only once. */
+		return first_deferred_pfn != ULONG_MAX;
 	}
 
 	/*
-- 
2.16.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related

* Re: [PATCH] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind
From: Yang Shi @ 2019-06-20 16:08 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: akpm, mgorman, linux-mm, linux-kernel, Eric Dumazet,
	David S. Miller, netdev
In-Reply-To: <d81b36bb-876e-917a-6115-cedf496b4923@suse.cz>



On 6/20/19 12:18 AM, Vlastimil Babka wrote:
> On 6/19/19 8:19 PM, Yang Shi wrote:
>>>>> This is getting even more muddy TBH. Is there any reason that we
>>>>> have to
>>>>> handle this problem during the isolation phase rather the migration?
>>>> I think it was already said that if pages can't be isolated, then
>>>> migration phase won't process them, so they're just ignored.
>>> Yes,exactly.
>>>
>>>> However I think the patch is wrong to abort immediately when
>>>> encountering such page that cannot be isolated (AFAICS). IMHO it should
>>>> still try to migrate everything it can, and only then return -EIO.
>>> It is fine too. I don't see mbind semantics define how to handle such
>>> case other than returning -EIO.
> I think it does. There's:
> If MPOL_MF_MOVE is specified in flags, then the kernel *will attempt to
> move all the existing pages* ... If MPOL_MF_STRICT is also specified,
> then the call fails with the error *EIO if some pages could not be moved*
>
> Aborting immediately would be against the attempt to move all.
>
>> By looking into the code, it looks not that easy as what I thought.
>> do_mbind() would check the return value of queue_pages_range(), it just
>> applies the policy and manipulates vmas as long as the return value is 0
>> (success), then migrate pages on the list. We could put the movable
>> pages on the list by not breaking immediately, but they will be ignored.
>> If we migrate the pages regardless of the return value, it may break the
>> policy since the policy will *not* be applied at all.
> I think we just need to remember if there was at least one page that
> failed isolation or migration, but keep working, and in the end return
> EIO if there was such page(s). I don't think it breaks the policy. Once
> pages are allocated in a mapping, changing the policy is a best effort
> thing anyway.

The current behavior is:
If queue_pages_range() return -EIO (vma is not migratable, ignore other 
conditions since we just focus on page migration), the policy won't be 
set and no page will be migrated.

However, the problem here is the vma might look migratable, but some or 
all the underlying pages are unmovable. So, my patch assumes the vma is 
*not* migratable if at least one page is unmovable. I'm not sure if it 
is possible to have both movable and unmovable pages for the same 
mapping or not, I'm supposed the vma would be split much earlier.

If we don't abort immediately, then we record if there is unmovable 
page, then we could do:
#1. Still follows the current behavior (then why not abort immediately?)
#2. Set mempolicy then migrate all the migratable pages. But, we may end 
up with the pages on node A, but the policy says node B. Doesn't it 
break the policy?

>
>>>


^ permalink raw reply

* Re: [PATCH] ASoC: rk3399_gru_sound: Support 32, 44.1 and 88.2 kHz sample rates
From: Enric Balletbo Serra @ 2019-06-20 16:08 UTC (permalink / raw)
  To: Mark Brown
  Cc: alsa-devel, Collabora Kernel ML, Heiko Stuebner, Xing Zheng,
	linux-kernel, Takashi Iwai, Liam Girdwood,
	open list:ARM/Rockchip SoC..., Enric Balletbo i Serra,
	Jaroslav Kysela, Benson Leung, Linux ARM
In-Reply-To: <20190620154150.GE5316@sirena.org.uk>

Hi Mark,

Missatge de Mark Brown <broonie@kernel.org> del dia dj., 20 de juny
2019 a les 17:42:
>
> On Thu, Jun 20, 2019 at 03:47:08PM +0200, Enric Balletbo i Serra wrote:
> > According to the datasheet the max98357a also supports 32, 44.1 and
> > 88.2 kHz sample rate. This support was also introduced recently by
> > commit fdf34366d324 ("ASoC: max98357a: add missing supported rates").
> > This patch adds support for these rates also for the machine driver so
> > we get rid of the errors like the below and we are able to play files
> > using these sample rates.
>
> Does the machine actually need to validate this at all?  The component
> drivers can all apply whatever constraints are needed and do their own
> validation, the machine driver is just getting in the way here.

I think you have reason, I'll test by removing these validation and
respin the patch.

Thanks,
~ Enric

> _______________________________________________
> Linux-rockchip mailing list
> Linux-rockchip@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-rockchip

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* [U-Boot] [PATCH] fastboot: Remove "bootloader-version" variable
From: Sam Protsenko @ 2019-06-20 16:08 UTC (permalink / raw)
  To: u-boot
In-Reply-To: <CAByghJZvbGKbz8m2-nzGNXoauOSbmBQ7X+zn2xYf_GqnPiZ9Eg@mail.gmail.com>

Hi Igor,

On Thu, Jun 20, 2019 at 5:55 PM Igor Opaniuk <igor.opaniuk@gmail.com> wrote:
>
> Hi Sam,
>
> On Thu, Jun 20, 2019 at 5:00 PM Sam Protsenko
> <semen.protsenko@linaro.org> wrote:
> >
> > As per [1], there is no such fastboot variable as "bootloader-version".
> > Only "version-bootloader" is supported. Let's reflect this and not
> > confuse users further.
> >
> > [1] https://android.googlesource.com/platform/system/core/+/master/fastboot/README.md#client-variables
> >
> > Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
> > ---
> >  doc/README.android-fastboot  | 4 ++--
> >  drivers/fastboot/fb_getvar.c | 9 +++------
> >  2 files changed, 5 insertions(+), 8 deletions(-)
> >
> > diff --git a/doc/README.android-fastboot b/doc/README.android-fastboot
> > index 431191c473..ce852a4fd1 100644
> > --- a/doc/README.android-fastboot
> > +++ b/doc/README.android-fastboot
> > @@ -169,8 +169,8 @@ On the client side you can fetch the bootloader version for instance:
> >
> >  ::
> >
> > -   $ fastboot getvar bootloader-version
> > -   bootloader-version: U-Boot 2014.04-00005-gd24cabc
> > +   $ fastboot getvar version-bootloader
> > +   version-bootloader: U-Boot 2014.04-00005-gd24cabc
> >     finished. total time: 0.000s
> >
> >  or initiate a reboot:
> > diff --git a/drivers/fastboot/fb_getvar.c b/drivers/fastboot/fb_getvar.c
> > index fd0823b2bf..ebe5c8a104 100644
> > --- a/drivers/fastboot/fb_getvar.c
> > +++ b/drivers/fastboot/fb_getvar.c
> > @@ -12,7 +12,7 @@
> >  #include <version.h>
> >
> >  static void getvar_version(char *var_parameter, char *response);
> > -static void getvar_bootloader_version(char *var_parameter, char *response);
> > +static void getvar_version_bootloader(char *var_parameter, char *response);
> >  static void getvar_downloadsize(char *var_parameter, char *response);
> >  static void getvar_serialno(char *var_parameter, char *response);
> >  static void getvar_version_baseband(char *var_parameter, char *response);
> > @@ -37,12 +37,9 @@ static const struct {
> >         {
> >                 .variable = "version",
> >                 .dispatch = getvar_version
> > -       }, {
> > -               .variable = "bootloader-version",
> > -               .dispatch = getvar_bootloader_version
> >         }, {
> >                 .variable = "version-bootloader",
> > -               .dispatch = getvar_bootloader_version
> > +               .dispatch = getvar_version_bootloader
> >         }, {
> >                 .variable = "downloadsize",
> >                 .dispatch = getvar_downloadsize
> > @@ -131,7 +128,7 @@ static void getvar_version(char *var_parameter, char *response)
> >         fastboot_okay(FASTBOOT_VERSION, response);
> >  }
> >
> > -static void getvar_bootloader_version(char *var_parameter, char *response)
> > +static void getvar_version_bootloader(char *var_parameter, char *response)
> >  {
> >         fastboot_okay(U_BOOT_VERSION, response);
> >  }
> > --
> > 2.20.1
> >
>
> My two cents here,
>
> Based on the commit messages from "git log --grep=bootloader-version"
> people prefer to use "bootloader-version" instead of
> "version-bootloader", and totally removing it will probably affect
> usual workflow with fastboot (probably someone will be suprised that
> "bootloader-version" doesn't work anymore), including some CI automate
> testing etc (if there is any);
>

We need to decide what has more value in this particular case:
  1. Keeping protocol clean, correct and up-to-date
  2. Supporting all erroneous choices we've done before

If we follow golden rule of kernel, (2) is proffered. But I don't
think in this particular case a lot of harm will be done. So from my
POV (1) is preferred. Otherwise we can clutter the protocol, causing
some confusion.

You can check fastboot project in AOSP [1]

    $ git log -S'bootloader-version' -- fastboot/

No occurrences, ever. AOSP is unaware we have 'bootloader-version'
variable in U-Boot, but AOSP defines 'version-bootloader' variable,
for the same stuff. I would really prefer we avoid using some weird
undocumented stuff, and I think in this particular case the
cleanliness of protocol overrules golden rule of kernel development,
as it seems relatively easy to fix that command (matter of one sed
execution).

[1] https://android.googlesource.com/platform/system/core/+/master/

> I think I does make sense to involve all of them to this discussion
> also (already added to CC).
>
> --
> Best regards - Freundliche Grüsse - Meilleures salutations
>
> Igor Opaniuk
>
> mailto: igor.opaniuk at gmail.com
> skype: igor.opanyuk
> +380 (93) 836 40 67
> http://ua.linkedin.com/in/iopaniuk

^ permalink raw reply


This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.