[PATCH v9 00/15] Enable build of full Xen for RISC-V

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v9 00/15] Enable build of full Xen for RISC-V
@ 2024-05-06 10:15 Oleksii Kurochko
  2024-05-06 10:15 ` [PATCH v9 01/15] xen/riscv: disable unnecessary configs Oleksii Kurochko
                   ` (14 more replies)
  0 siblings, 15 replies; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Doug Goldstein, Stefano Stabellini,
	Alistair Francis, Bob Eshleman, Connor Davis, Daniel P. Smith,
	Roger Pau Monné, Ross Lagerwall, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Shawn Anastasio, Rahul Singh,
	Tamas K Lengyel, Alexandru Isaila, Petre Pircalabu

This patch series performs all of the additions necessary to drop the
build overrides for RISCV and enable the full Xen build. Except in cases
where compatibile implementations already exist (e.g. atomic.h and
bitops.h), the newly added definitions are simple.

The patch series is based on the following patch series:
- [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() [1]
- [PATCH] move __read_mostly to xen/cache.h  [2]

Right now, the patch series doesn't have a direct dependency on [2] and it
provides __read_mostly in the patch:
    [PATCH v3 26/34] xen/riscv: add definition of __read_mostly
However, it will be dropped as soon as [2] is merged or at least when the
final version of the patch [2] is provided.
As an option, it can be considered to merge arch-specific patch and then just
rebase [2] and drop arch-specific changes.

[1] https://lore.kernel.org/xen-devel/20240313172716.2325427-1-andrew.cooper3@citrix.com/T/#t
[2] https://lore.kernel.org/xen-devel/f25eb5c9-7c14-6e23-8535-2c66772b333e@suse.com/

---
Changes in V9:
 - Patch was merged to staging:
    - [PATCH v8 07/17] xen/riscv: introduce io.h
  	- [PATCH v7 14/19] xen/riscv: add minimal stuff to page.h to build full Xen
 - Other changes are specific to specific patches. Please look at changes for
   specific patch.
---
Changes in V8:
 - Patch was merged to staging:
    - [PATCH v7 01/19] automation: introduce fixed randconfig for RISC-V
    - [PATCH v7 03/19] xen/riscv: introduce extenstion support check by compiler
 - Other changes are specific to specific patches. Please look at changes for
   specific patch.
 - Update the commit message:
     - drop the dependency from STATIC_ASSERT_UNREACHABLE() implementation.
     - Add suggestion to merge arch-specific changes related to __read_mostly.
---
Changes in V7:
 - Patch was merged to staging:
   [PATCH v6 15/20] xen/riscv: add minimal stuff to processor.h to build full Xen.
 - Other changes are specific to specific patches. Please look at changes for
   specific patch.
---
Changes in V6:
 - Update the cover letter message: drop already merged dependecies and add
   a new one.
 - Patches were merged to staging:
   - [PATCH v5 02/23] xen/riscv: use some asm-generic headers ( even v4 was
     merged to staging branch, I just wasn't apply changes on top of the latest staging branch )
   - [PATCH v5 03/23] xen/riscv: introduce nospec.h
   - [PATCH v5 10/23] xen/riscv: introduces acrquire, release and full barriers
 - Introduce new patches:
   - xen/riscv: introduce extenstion support check by compiler
   - xen/bitops: put __ffs() and ffz() into linux compatible header
   - xen/bitops: implement fls{l}() in common logic
 - The following patches were dropped:
   - drop some patches related to bitops operations as they were introduced in another
     patch series [...]
   - introduce new version for generic __ffs(), ffz() and fls{l}().
 - Merge patch from patch series "[PATCH v9 0/7]  Introduce generic headers" to this patch
   series as only one patch left in the generic headers patch series and it is more about
   RISC-V.
 - Other changes are specific to specific patches. please look at specific patch.
---
Changes in V5:
 - Update the cover letter as one of the dependencies were merged to staging.
 - Was introduced asm-generic for atomic ops and separate patches for asm-generic bit ops
 - Moved fence.h to separate patch to deal with some patches dependecies on fence.h
 - Patches were dropped as they were merged to staging:
   * [PATCH v4 03/30] xen: add support in public/hvm/save.h for PPC and RISC-V
   * [PATCH v4 04/30] xen/riscv: introduce cpufeature.h
   * [PATCH v4 05/30] xen/riscv: introduce guest_atomics.h
   * [PATCH v4 06/30] xen: avoid generation of empty asm/iommu.h
   * [PATCH v4 08/30] xen/riscv: introduce setup.h
   * [PATCH v4 10/30] xen/riscv: introduce flushtlb.h
   * [PATCH v4 11/30] xen/riscv: introduce smp.h
   * [PATCH v4 15/30] xen/riscv: introduce irq.h
   * [PATCH v4 16/30] xen/riscv: introduce p2m.h
   * [PATCH v4 17/30] xen/riscv: introduce regs.h
   * [PATCH v4 18/30] xen/riscv: introduce time.h
   * [PATCH v4 19/30] xen/riscv: introduce event.h
   * [PATCH v4 22/30] xen/riscv: define an address of frame table
 - Other changes are specific to specific patches. please look at specific patch
---
Changes in V4:
 - Update the cover letter message: new patch series dependencies.
 - Some patches were merged to staging, so they were dropped in this patch series:
     [PATCH v3 09/34] xen/riscv: introduce system.h
     [PATCH v3 18/34] xen/riscv: introduce domain.h
     [PATCH v3 19/34] xen/riscv: introduce guest_access.h
 - Was sent out of this patch series:
     [PATCH v3 16/34] xen/lib: introduce generic find next bit operations
 - [PATCH v3 17/34] xen/riscv: add compilation of generic find-next-bit.c was
   droped as CONFIG_GENERIC_FIND_NEXT_BIT was dropped.
 - All other changes are specific to a specific patch.
---
Changes in V3:
 - Update the cover letter message
 - The following patches were dropped as they were merged to staging:
    [PATCH v2 03/39] xen/riscv:introduce asm/byteorder.h
    [PATCH v2 04/39] xen/riscv: add public arch-riscv.h
    [PATCH v2 05/39] xen/riscv: introduce spinlock.h
    [PATCH v2 20/39] xen/riscv: define bug frame tables in xen.lds.S
    [PATCH v2 34/39] xen: add RISCV support for pmu.h
    [PATCH v2 35/39] xen: add necessary headers to common to build full Xen for RISC-V
 - Instead of the following patches were introduced new:
    [PATCH v2 10/39] xen/riscv: introduce asm/iommu.h
    [PATCH v2 11/39] xen/riscv: introduce asm/nospec.h
 - remove "asm/"  for commit messages which start with "xen/riscv:"
 - code style updates.
 - add emulation of {cmp}xchg_* for 1 and 2 bytes types.
 - code style fixes.
 - add SPDX and footer for the newly added headers.
 - introduce generic find-next-bit.c.
 - some other mionor changes. ( details please find in a patch )
---
Changes in V2:
  - Drop the following patches as they are the part of [2]:
      [PATCH v1 06/57] xen/riscv: introduce paging.h
      [PATCH v1 08/57] xen/riscv: introduce asm/device.h
      [PATCH v1 10/57] xen/riscv: introduce asm/grant_table.h
      [PATCH v1 12/57] xen/riscv: introduce asm/hypercall.h
      [PATCH v1 13/57] xen/riscv: introduce asm/iocap.h
      [PATCH v1 15/57] xen/riscv: introduce asm/mem_access.h
      [PATCH v1 18/57] xen/riscv: introduce asm/random.h
      [PATCH v1 21/57] xen/riscv: introduce asm/xenoprof.h
      [PATCH v1 24/57] xen/riscv: introduce asm/percpu.h
      [PATCH v1 29/57] xen/riscv: introduce asm/hardirq.h
      [PATCH v1 33/57] xen/riscv: introduce asm/altp2m.h
      [PATCH v1 38/57] xen/riscv: introduce asm/monitor.h
      [PATCH v1 39/57] xen/riscv: introduce asm/numa.h
      [PATCH v1 42/57] xen/riscv: introduce asm/softirq.h
  - xen/lib.h in most of the cases were changed to xen/bug.h as
    mostly functionilty of bug.h is used.
  - align arch-riscv.h with Arm's version of it.
  - change the Author of commit with introduction of asm/atomic.h.
  - update some definition from spinlock.h.
  - code style changes.
---

Oleksii Kurochko (15):
  xen/riscv: disable unnecessary configs
  xen: introduce generic non-atomic test_*bit()
  xen/bitops: implement fls{l}() in common logic
  xen/bitops: put __ffs() into linux compatible header
  xen/riscv: introduce bitops.h
  xen/riscv: introduce cmpxchg.h
  xen/riscv: introduce atomic.h
  xen/riscv: introduce monitor.h
  xen/riscv: add definition of __read_mostly
  xen/riscv: add required things to current.h
  xen/riscv: add minimal stuff to mm.h to build full Xen
  xen/riscv: introduce vm_event_*() functions
  xen/riscv: add minimal amount of stubs to build full Xen
  xen/riscv: enable full Xen build
  xen/README: add compiler and binutils versions for RISC-V64

 README                                  |   4 +
 automation/gitlab-ci/build.yaml         |   4 +
 xen/arch/arm/arm64/livepatch.c          |   1 -
 xen/arch/arm/include/asm/arm32/bitops.h |   2 +-
 xen/arch/arm/include/asm/arm64/bitops.h |  27 +-
 xen/arch/arm/include/asm/bitops.h       |  74 +----
 xen/arch/ppc/include/asm/bitops.h       |  76 -----
 xen/arch/ppc/include/asm/page.h         |   2 +-
 xen/arch/ppc/mm-radix.c                 |   2 +-
 xen/arch/riscv/Makefile                 |  18 +-
 xen/arch/riscv/arch.mk                  |   4 -
 xen/arch/riscv/configs/tiny64_defconfig |  12 +-
 xen/arch/riscv/early_printk.c           | 168 ----------
 xen/arch/riscv/include/asm/atomic.h     | 280 ++++++++++++++++
 xen/arch/riscv/include/asm/bitops.h     | 137 ++++++++
 xen/arch/riscv/include/asm/cache.h      |   2 +
 xen/arch/riscv/include/asm/cmpxchg.h    | 239 ++++++++++++++
 xen/arch/riscv/include/asm/config.h     |   2 +
 xen/arch/riscv/include/asm/current.h    |  19 ++
 xen/arch/riscv/include/asm/domain.h     |   2 +
 xen/arch/riscv/include/asm/mm.h         | 240 ++++++++++++++
 xen/arch/riscv/include/asm/monitor.h    |  26 ++
 xen/arch/riscv/include/asm/p2m.h        |   2 +
 xen/arch/riscv/mm.c                     |  52 ++-
 xen/arch/riscv/setup.c                  |  10 +-
 xen/arch/riscv/stubs.c                  | 415 ++++++++++++++++++++++++
 xen/arch/riscv/traps.c                  |  25 ++
 xen/arch/riscv/vm_event.c               |  19 ++
 xen/arch/x86/include/asm/bitops.h       |  37 +--
 xen/common/bitops.c                     |  22 ++
 xen/common/page_alloc.c                 |   4 +-
 xen/drivers/passthrough/arm/smmu-v3.c   |   2 +
 xen/include/asm-generic/atomic-ops.h    |  97 ++++++
 xen/include/xen/bitops.h                | 158 +++++++++
 xen/include/xen/linux-compat.h          |   2 +
 xen/include/xen/types.h                 |   6 +
 xen/lib/find-next-bit.c                 |   3 +
 37 files changed, 1810 insertions(+), 385 deletions(-)
 create mode 100644 xen/arch/riscv/include/asm/atomic.h
 create mode 100644 xen/arch/riscv/include/asm/bitops.h
 create mode 100644 xen/arch/riscv/include/asm/cmpxchg.h
 create mode 100644 xen/arch/riscv/include/asm/monitor.h
 create mode 100644 xen/arch/riscv/stubs.c
 create mode 100644 xen/arch/riscv/vm_event.c
 create mode 100644 xen/include/asm-generic/atomic-ops.h

-- 
2.45.0



^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v9 01/15] xen/riscv: disable unnecessary configs
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-06 10:15 ` [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit() Oleksii Kurochko
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Doug Goldstein, Stefano Stabellini,
	Alistair Francis, Bob Eshleman, Connor Davis, Daniel P. Smith

Disables unnecessary configs for two cases:
1. By utilizing EXTRA_FIXED_RANDCONFIG for randconfig builds (GitLab CI jobs).
2. By using tiny64_defconfig for non-randconfig builds.

Only configs which lead to compilation issues were disabled.

Remove lines related to disablement of configs which aren't affected
compilation:
 -# CONFIG_SCHED_CREDIT is not set
 -# CONFIG_SCHED_RTDS is not set
 -# CONFIG_SCHED_NULL is not set
 -# CONFIG_SCHED_ARINC653 is not set
 -# CONFIG_TRACEBUFFER is not set
 -# CONFIG_HYPFS is not set
 -# CONFIG_SPECULATIVE_HARDEN_ARRAY is not set

To allow CONFIG_ARGO build happy it was included <asm/p2m.h> to <asm/domain.h>
as ARGO requires p2m_type_t ( p2m_ram_rw ) and declaration of
check_get_page_from_gfn() from xen/p2m-common.h.

Also, it was included <xen/errno.h> to asm/p2m.h as after the latter was
included to <asm/domain.h> the compilation error that EINVAL, EOPNOTSUPP
aren't declared started to occur.

CONFIG_XSM=n as it requires an introduction of:
* boot_module_find_by_kind()
* BOOTMOD_XSM
* struct bootmodule
* copy_from_paddr()
The mentioned things aren't introduced now.

CPU_BOOT_TIME_CPUPOOLS requires an introduction of cpu_physical_id() and
acpi_disabled, so it is disabled for now.

PERF_COUNTERS requires asm/perf.h and asm/perfc-defn.h, so it is
also disabled for now, as RISC-V hasn't introduced this headers yet.

LIVEPATCH isn't ready for RISC-V too and it can be overriden by randconfig,
so to avoid compilation errors for randconfig it is disabled for now.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V9:
 - update the commit message: add info about LIVEPATCH and PERF_COUNTERS.
---
Changes in V8:
 - disabled CPU_BOOT_TIME_CPUPOOLS as it requires an introduction of cpu_physical_id() and acpi_disabled.
 - leave XSM disabled, add explanation in the commit message.
 - drop HYPFS as the patch was provided to resolve compilation issue when this condif is enabled for RISC-V.
 - include asm/p2m.h to asm/domain.h, and xen/errno.h to asm/p2m.h to drop ARGO config from
   tiny64_defconfing and build.yaml.
 - update the commit message.
---
Changes in V7:
 - Disable only configs which cause compilation issues.
 - Update the commit message.
---
Changes in V6:
 - Nothing changed. Only rebase.
---
Changes in V5:
 - Rebase and drop duplicated configs in EXTRA_FIXED_RANDCONFIG list
 - Update the commit message
---
Changes in V4:
 - Nothing changed. Only rebase
---
Changes in V3:
 - Remove EXTRA_FIXED_RANDCONFIG for non-randconfig jobs.
   For non-randconfig jobs, it is sufficient to disable configs by using the defconfig.
 - Remove double blank lines in build.yaml file before archlinux-current-gcc-riscv64-debug
---
Changes in V2:
 - update the commit message.
 - remove xen/arch/riscv/Kconfig changes.
---
 automation/gitlab-ci/build.yaml         |  4 ++++
 xen/arch/riscv/configs/tiny64_defconfig | 12 +++++-------
 xen/arch/riscv/include/asm/domain.h     |  2 ++
 xen/arch/riscv/include/asm/p2m.h        |  2 ++
 4 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
index 49d6265ad5..ff5c9055d1 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -494,10 +494,14 @@ alpine-3.18-gcc-debug-arm64-earlyprintk:
 .riscv-fixed-randconfig:
   variables: &riscv-fixed-randconfig
     EXTRA_FIXED_RANDCONFIG: |
+      CONFIG_BOOT_TIME_CPUPOOLS=n
       CONFIG_COVERAGE=n
       CONFIG_EXPERT=y
       CONFIG_GRANT_TABLE=n
       CONFIG_MEM_ACCESS=n
+      CONFIG_PERF_COUNTERS=n
+      CONFIG_LIVEPATCH=n
+      CONFIG_XSM=n
 
 archlinux-current-gcc-riscv64:
   extends: .gcc-riscv64-cross-build
diff --git a/xen/arch/riscv/configs/tiny64_defconfig b/xen/arch/riscv/configs/tiny64_defconfig
index 09defe236b..fc7a04872f 100644
--- a/xen/arch/riscv/configs/tiny64_defconfig
+++ b/xen/arch/riscv/configs/tiny64_defconfig
@@ -1,12 +1,10 @@
-# CONFIG_SCHED_CREDIT is not set
-# CONFIG_SCHED_RTDS is not set
-# CONFIG_SCHED_NULL is not set
-# CONFIG_SCHED_ARINC653 is not set
-# CONFIG_TRACEBUFFER is not set
-# CONFIG_HYPFS is not set
+# CONFIG_BOOT_TIME_CPUPOOLS is not set
 # CONFIG_GRANT_TABLE is not set
-# CONFIG_SPECULATIVE_HARDEN_ARRAY is not set
 # CONFIG_MEM_ACCESS is not set
+# CONFIG_PERF_COUNTERS is not set
+# CONFIG_COVERAGE is not set
+# CONFIG_LIVEPATCH is not set
+# CONFIG_XSM is not set
 
 CONFIG_RISCV_64=y
 CONFIG_DEBUG=y
diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h
index 027bfa8a93..16a9dd57aa 100644
--- a/xen/arch/riscv/include/asm/domain.h
+++ b/xen/arch/riscv/include/asm/domain.h
@@ -5,6 +5,8 @@
 #include <xen/xmalloc.h>
 #include <public/hvm/params.h>
 
+#include <asm/p2m.h>
+
 struct hvm_domain
 {
     uint64_t              params[HVM_NR_PARAMS];
diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index 87b13f8979..aa86fa10a7 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_RISCV_P2M_H__
 #define __ASM_RISCV_P2M_H__
 
+#include <xen/errno.h>
+
 #include <asm/page-bits.h>
 
 #define paddr_bits PADDR_BITS
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit()
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
  2024-05-06 10:15 ` [PATCH v9 01/15] xen/riscv: disable unnecessary configs Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-15  8:52   ` Jan Beulich
  2024-05-06 10:15 ` [PATCH v9 03/15] xen/bitops: implement fls{l}() in common logic Oleksii Kurochko
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Roger Pau Monné, Ross Lagerwall,
	Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Shawn Anastasio

The following generic functions were introduced:
* test_bit
* generic__test_and_set_bit
* generic__test_and_clear_bit
* generic__test_and_change_bit

Also, the patch introduces the following generics which are
used by the functions mentioned above:
* BITOP_BITS_PER_WORD
* BITOP_MASK
* BITOP_WORD
* BITOP_TYPE

These functions and macros can be useful for architectures
that don't have corresponding arch-specific instructions.

Because of that x86 has the following check in the macros test_bit(),
__test_and_set_bit(), __test_and_clear_bit(), __test_and_change_bit():
    if ( bitop_bad_size(addr) ) __bitop_bad_size();
It was necessary to make bitop bad size check generic too, so
arch_check_bitop_size() was introduced.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
   The context ("* Find First Set bit.  Bits are labelled from 1." in xen/bitops.h )
   suggests there's a dependency on an uncommitted patch. It happens becuase the current patch
   series is based on Andrew's patch series ( https://lore.kernel.org/xen-devel/20240313172716.2325427-1-andrew.cooper3@citrix.com/T/#t ),
   but if everything is okay with the current one patch it can be merged without Andrew's patch series being merged.
---
Changes in V9:
  - move up xen/bitops.h in ppc/asm/page.h.
  - update defintion of arch_check_bitop_size.
    And drop correspondent macros from x86/asm/bitops.h
  - drop parentheses in generic__test_and_set_bit() for definition of
    local variable p.
  - fix indentation inside #ifndef BITOP_TYPE...#endif
  - update the commit message.
---
 Changes in V8:
  - drop __pure for function which uses volatile.
  - drop unnessary () in generic__test_and_change_bit() for addr casting.
  - update prototype of generic_test_bit() and test_bit(): now it returns bool
    instead of int.
  - update generic_test_bit() to use BITOP_MASK().
  - Deal with fls{l} changes: it should be in the patch with introduced generic fls{l}.
  - add a footer with explanation of dependency on an uncommitted patch after Signed-off.
  - abstract bitop_size().
  - move BITOP_TYPE define to <xen/types.h>.
---
 Changes in V7:
  - move everything to xen/bitops.h to follow the same approach for all generic
    bit ops.
  - put together BITOP_BITS_PER_WORD and bitops_uint_t.
  - make BITOP_MASK more generic.
  - drop #ifdef ... #endif around BITOP_MASK, BITOP_WORD as they are generic
    enough.
  - drop "_" for generic__{test_and_set_bit,...}().
  - drop " != 0" for functions which return bool.
  - add volatile during the cast for generic__{...}().
  - update the commit message.
  - update arch related code to follow the proposed generic approach.
---
 Changes in V6:
  - Nothing changed ( only rebase )
---
 Changes in V5:
   - new patch
---
 xen/arch/arm/arm64/livepatch.c    |   1 -
 xen/arch/arm/include/asm/bitops.h |  67 ---------------
 xen/arch/ppc/include/asm/bitops.h |  52 ------------
 xen/arch/ppc/include/asm/page.h   |   2 +-
 xen/arch/ppc/mm-radix.c           |   2 +-
 xen/arch/x86/include/asm/bitops.h |  31 ++-----
 xen/include/xen/bitops.h          | 134 ++++++++++++++++++++++++++++++
 xen/include/xen/types.h           |   6 ++
 8 files changed, 151 insertions(+), 144 deletions(-)

diff --git a/xen/arch/arm/arm64/livepatch.c b/xen/arch/arm/arm64/livepatch.c
index df2cebedde..4bc8ed9be5 100644
--- a/xen/arch/arm/arm64/livepatch.c
+++ b/xen/arch/arm/arm64/livepatch.c
@@ -10,7 +10,6 @@
 #include <xen/mm.h>
 #include <xen/vmap.h>
 
-#include <asm/bitops.h>
 #include <asm/byteorder.h>
 #include <asm/insn.h>
 #include <asm/livepatch.h>
diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h
index 5104334e48..8e16335e76 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -22,9 +22,6 @@
 #define __set_bit(n,p)            set_bit(n,p)
 #define __clear_bit(n,p)          clear_bit(n,p)
 
-#define BITOP_BITS_PER_WORD     32
-#define BITOP_MASK(nr)          (1UL << ((nr) % BITOP_BITS_PER_WORD))
-#define BITOP_WORD(nr)          ((nr) / BITOP_BITS_PER_WORD)
 #define BITS_PER_BYTE           8
 
 #define ADDR (*(volatile int *) addr)
@@ -76,70 +73,6 @@ bool test_and_change_bit_timeout(int nr, volatile void *p,
 bool clear_mask16_timeout(uint16_t mask, volatile void *p,
                           unsigned int max_try);
 
-/**
- * __test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static inline int __test_and_set_bit(int nr, volatile void *addr)
-{
-        unsigned int mask = BITOP_MASK(nr);
-        volatile unsigned int *p =
-                ((volatile unsigned int *)addr) + BITOP_WORD(nr);
-        unsigned int old = *p;
-
-        *p = old | mask;
-        return (old & mask) != 0;
-}
-
-/**
- * __test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static inline int __test_and_clear_bit(int nr, volatile void *addr)
-{
-        unsigned int mask = BITOP_MASK(nr);
-        volatile unsigned int *p =
-                ((volatile unsigned int *)addr) + BITOP_WORD(nr);
-        unsigned int old = *p;
-
-        *p = old & ~mask;
-        return (old & mask) != 0;
-}
-
-/* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
-                                            volatile void *addr)
-{
-        unsigned int mask = BITOP_MASK(nr);
-        volatile unsigned int *p =
-                ((volatile unsigned int *)addr) + BITOP_WORD(nr);
-        unsigned int old = *p;
-
-        *p = old ^ mask;
-        return (old & mask) != 0;
-}
-
-/**
- * test_bit - Determine whether a bit is set
- * @nr: bit number to test
- * @addr: Address to start counting from
- */
-static inline int test_bit(int nr, const volatile void *addr)
-{
-        const volatile unsigned int *p = (const volatile unsigned int *)addr;
-        return 1UL & (p[BITOP_WORD(nr)] >> (nr & (BITOP_BITS_PER_WORD-1)));
-}
-
 /*
  * On ARMv5 and above those functions can be implemented around
  * the clz instruction for much better code efficiency.
diff --git a/xen/arch/ppc/include/asm/bitops.h b/xen/arch/ppc/include/asm/bitops.h
index 989d341a44..e2b6473c8c 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -15,9 +15,6 @@
 #define __set_bit(n, p)         set_bit(n, p)
 #define __clear_bit(n, p)       clear_bit(n, p)
 
-#define BITOP_BITS_PER_WORD     32
-#define BITOP_MASK(nr)          (1U << ((nr) % BITOP_BITS_PER_WORD))
-#define BITOP_WORD(nr)          ((nr) / BITOP_BITS_PER_WORD)
 #define BITS_PER_BYTE           8
 
 /* PPC bit number conversion */
@@ -69,17 +66,6 @@ static inline void clear_bit(int nr, volatile void *addr)
     clear_bits(BITOP_MASK(nr), (volatile unsigned int *)addr + BITOP_WORD(nr));
 }
 
-/**
- * test_bit - Determine whether a bit is set
- * @nr: bit number to test
- * @addr: Address to start counting from
- */
-static inline int test_bit(int nr, const volatile void *addr)
-{
-    const volatile unsigned int *p = addr;
-    return 1 & (p[BITOP_WORD(nr)] >> (nr & (BITOP_BITS_PER_WORD - 1)));
-}
-
 static inline unsigned int test_and_clear_bits(
     unsigned int mask,
     volatile unsigned int *p)
@@ -133,44 +119,6 @@ static inline int test_and_set_bit(unsigned int nr, volatile void *addr)
         (volatile unsigned int *)addr + BITOP_WORD(nr)) != 0;
 }
 
-/**
- * __test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static inline int __test_and_set_bit(int nr, volatile void *addr)
-{
-    unsigned int mask = BITOP_MASK(nr);
-    volatile unsigned int *p = (volatile unsigned int *)addr + BITOP_WORD(nr);
-    unsigned int old = *p;
-
-    *p = old | mask;
-    return (old & mask) != 0;
-}
-
-/**
- * __test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail.  You must protect multiple accesses with a lock.
- */
-static inline int __test_and_clear_bit(int nr, volatile void *addr)
-{
-    unsigned int mask = BITOP_MASK(nr);
-    volatile unsigned int *p = (volatile unsigned int *)addr + BITOP_WORD(nr);
-    unsigned int old = *p;
-
-    *p = old & ~mask;
-    return (old & mask) != 0;
-}
-
 #define flsl(x) generic_flsl(x)
 #define fls(x) generic_fls(x)
 
diff --git a/xen/arch/ppc/include/asm/page.h b/xen/arch/ppc/include/asm/page.h
index 890e285051..6d4cd2611c 100644
--- a/xen/arch/ppc/include/asm/page.h
+++ b/xen/arch/ppc/include/asm/page.h
@@ -2,9 +2,9 @@
 #ifndef _ASM_PPC_PAGE_H
 #define _ASM_PPC_PAGE_H
 
+#include <xen/bitops.h>
 #include <xen/types.h>
 
-#include <asm/bitops.h>
 #include <asm/byteorder.h>
 
 #define PDE_VALID     PPC_BIT(0)
diff --git a/xen/arch/ppc/mm-radix.c b/xen/arch/ppc/mm-radix.c
index ab5a10695c..9055730997 100644
--- a/xen/arch/ppc/mm-radix.c
+++ b/xen/arch/ppc/mm-radix.c
@@ -1,11 +1,11 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
+#include <xen/bitops.h>
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/mm.h>
 #include <xen/types.h>
 #include <xen/lib.h>
 
-#include <asm/bitops.h>
 #include <asm/byteorder.h>
 #include <asm/early_printk.h>
 #include <asm/page.h>
diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index dd439b69a0..23f09fdb7a 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -19,9 +19,6 @@
 #define ADDR (*(volatile int *) addr)
 #define CONST_ADDR (*(const volatile int *) addr)
 
-extern void __bitop_bad_size(void);
-#define bitop_bad_size(addr) (sizeof(*(addr)) < 4)
-
 /**
  * set_bit - Atomically set a bit in memory
  * @nr: the bit to set
@@ -175,7 +172,7 @@ static inline int test_and_set_bit(int nr, volatile void *addr)
 })
 
 /**
- * __test_and_set_bit - Set a bit and return its old value
+ * arch__test_and_set_bit - Set a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -183,7 +180,7 @@ static inline int test_and_set_bit(int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, void *addr)
+static inline int arch__test_and_set_bit(int nr, volatile void *addr)
 {
     int oldbit;
 
@@ -194,10 +191,7 @@ static inline int __test_and_set_bit(int nr, void *addr)
 
     return oldbit;
 }
-#define __test_and_set_bit(nr, addr) ({                 \
-    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
-    __test_and_set_bit(nr, addr);                       \
-})
+#define arch__test_and_set_bit arch__test_and_set_bit
 
 /**
  * test_and_clear_bit - Clear a bit and return its old value
@@ -224,7 +218,7 @@ static inline int test_and_clear_bit(int nr, volatile void *addr)
 })
 
 /**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * arch__test_and_clear_bit - Clear a bit and return its old value
  * @nr: Bit to set
  * @addr: Address to count from
  *
@@ -232,7 +226,7 @@ static inline int test_and_clear_bit(int nr, volatile void *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, void *addr)
+static inline int arch__test_and_clear_bit(int nr, volatile void *addr)
 {
     int oldbit;
 
@@ -243,13 +237,10 @@ static inline int __test_and_clear_bit(int nr, void *addr)
 
     return oldbit;
 }
-#define __test_and_clear_bit(nr, addr) ({               \
-    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
-    __test_and_clear_bit(nr, addr);                     \
-})
+#define arch__test_and_clear_bit arch__test_and_clear_bit
 
 /* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr, void *addr)
+static inline int arch__test_and_change_bit(int nr, volatile void *addr)
 {
     int oldbit;
 
@@ -260,10 +251,7 @@ static inline int __test_and_change_bit(int nr, void *addr)
 
     return oldbit;
 }
-#define __test_and_change_bit(nr, addr) ({              \
-    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
-    __test_and_change_bit(nr, addr);                    \
-})
+#define arch__test_and_change_bit arch__test_and_change_bit
 
 /**
  * test_and_change_bit - Change a bit and return its new value
@@ -307,8 +295,7 @@ static inline int variable_test_bit(int nr, const volatile void *addr)
     return oldbit;
 }
 
-#define test_bit(nr, addr) ({                           \
-    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
+#define arch_test_bit(nr, addr) ({                      \
     __builtin_constant_p(nr) ?                          \
         constant_test_bit(nr, addr) :                   \
         variable_test_bit(nr, addr);                    \
diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index f14ad0d33a..4f3399273a 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -65,10 +65,144 @@ static inline int generic_flsl(unsigned long x)
  * scope
  */
 
+#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) % BITOP_BITS_PER_WORD))
+
+#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
+
+extern void __bitop_bad_size(void);
+
 /* --------------------- Please tidy above here --------------------- */
 
 #include <asm/bitops.h>
 
+#ifndef arch_check_bitop_size
+
+#define bitop_bad_size(addr) sizeof(*(addr)) < sizeof(bitop_uint_t)
+
+#define arch_check_bitop_size(addr) \
+    if ( bitop_bad_size(addr) ) __bitop_bad_size();
+
+#endif /* arch_check_bitop_size */
+
+/**
+ * generic__test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static always_inline bool
+generic__test_and_set_bit(unsigned long nr, volatile void *addr)
+{
+    bitop_uint_t mask = BITOP_MASK(nr);
+    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
+    bitop_uint_t old = *p;
+
+    *p = old | mask;
+    return (old & mask);
+}
+
+/**
+ * generic__test_and_clear_bit - Clear a bit and return its old value
+ * @nr: Bit to clear
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */
+static always_inline bool
+generic__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
+{
+    bitop_uint_t mask = BITOP_MASK(nr);
+    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
+    bitop_uint_t old = *p;
+
+    *p = old & ~mask;
+    return (old & mask);
+}
+
+/* WARNING: non atomic and it can be reordered! */
+static always_inline bool
+generic__test_and_change_bit(unsigned long nr, volatile void *addr)
+{
+    bitop_uint_t mask = BITOP_MASK(nr);
+    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
+    bitop_uint_t old = *p;
+
+    *p = old ^ mask;
+    return (old & mask);
+}
+/**
+ * generic_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ */
+static always_inline bool generic_test_bit(int nr, const volatile void *addr)
+{
+    bitop_uint_t mask = BITOP_MASK(nr);
+    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
+
+    return (*p & mask);
+}
+
+static always_inline bool
+__test_and_set_bit(unsigned long nr, volatile void *addr)
+{
+#ifndef arch__test_and_set_bit
+#define arch__test_and_set_bit generic__test_and_set_bit
+#endif
+
+    return arch__test_and_set_bit(nr, addr);
+}
+#define __test_and_set_bit(nr, addr) ({             \
+    arch_check_bitop_size(addr);                    \
+    __test_and_set_bit(nr, addr);                   \
+})
+
+static always_inline bool
+__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
+{
+#ifndef arch__test_and_clear_bit
+#define arch__test_and_clear_bit generic__test_and_clear_bit
+#endif
+
+    return arch__test_and_clear_bit(nr, addr);
+}
+#define __test_and_clear_bit(nr, addr) ({           \
+    arch_check_bitop_size(addr);                    \
+    __test_and_clear_bit(nr, addr);                 \
+})
+
+static always_inline bool
+__test_and_change_bit(unsigned long nr, volatile void *addr)
+{
+#ifndef arch__test_and_change_bit
+#define arch__test_and_change_bit generic__test_and_change_bit
+#endif
+
+    return arch__test_and_change_bit(nr, addr);
+}
+#define __test_and_change_bit(nr, addr) ({              \
+    arch_check_bitop_size(addr);                        \
+    __test_and_change_bit(nr, addr);                    \
+})
+
+static always_inline bool test_bit(int nr, const volatile void *addr)
+{
+#ifndef arch_test_bit
+#define arch_test_bit generic_test_bit
+#endif
+
+    return arch_test_bit(nr, addr);
+}
+#define test_bit(nr, addr) ({                           \
+    arch_check_bitop_size(addr);                        \
+    test_bit(nr, addr);                                 \
+})
+
 /*
  * Find First Set bit.  Bits are labelled from 1.
  */
diff --git a/xen/include/xen/types.h b/xen/include/xen/types.h
index 449947b353..2d63d984eb 100644
--- a/xen/include/xen/types.h
+++ b/xen/include/xen/types.h
@@ -64,6 +64,12 @@ typedef __u64 __be64;
 
 typedef unsigned int __attribute__((__mode__(__pointer__))) uintptr_t;
 
+#ifndef BITOP_TYPE
+#define BITOP_BITS_PER_WORD 32
+
+typedef uint32_t bitop_uint_t;
+#endif
+
 #define test_and_set_bool(b)   xchg(&(b), true)
 #define test_and_clear_bool(b) xchg(&(b), false)
 
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 03/15] xen/bitops: implement fls{l}() in common logic
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
  2024-05-06 10:15 ` [PATCH v9 01/15] xen/riscv: disable unnecessary configs Oleksii Kurochko
  2024-05-06 10:15 ` [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit() Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-15  9:09   ` Jan Beulich
  2024-05-06 10:15 ` [PATCH v9 04/15] xen/bitops: put __ffs() into linux compatible header Oleksii Kurochko
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Shawn Anastasio, Roger Pau Monné

To avoid the compilation error below, it is needed to update to places
in common/page_alloc.c where flsl() is used as now flsl() returns unsigned int:

./include/xen/kernel.h:18:21: error: comparison of distinct pointer types lacks a cast [-Werror]
       18 |         (void) (&_x == &_y);            \
          |                     ^~
    common/page_alloc.c:1843:34: note: in expansion of macro 'min'
     1843 |         unsigned int inc_order = min(MAX_ORDER, flsl(e - s) - 1);

generic_fls{l} was used instead of __builtin_clz{l}(x) as if x is 0,
the result in undefined.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
 The patch is almost independent from Andrew's patch series
 ( https://lore.kernel.org/xen-devel/20240313172716.2325427-1-andrew.cooper3@citrix.com/T/#t)
 except test_fls() function which IMO can be merged as a separate patch after Andrew's patch
 will be fully ready.
---
Changes in V9:
 - update return type of fls and flsl() to unsigned int to be aligned with other
   bit ops.
 - update places where return value of fls() and flsl() is compared with int.
 - update the commit message.
---
Changes in V8:
 - do proper rebase: back definition of fls{l} to the current patch.
 - drop the changes which removed ffz() in PPC. it should be done not
   in this patch.
 - add a message after Signed-off.
---
Changes in V7:
 - Code style fixes
---
Changes in V6:
 - new patch for the patch series.
---
 xen/arch/arm/include/asm/arm32/bitops.h |  2 +-
 xen/arch/arm/include/asm/arm64/bitops.h |  6 ++----
 xen/arch/arm/include/asm/bitops.h       |  7 ++-----
 xen/arch/ppc/include/asm/bitops.h       |  3 ---
 xen/arch/x86/include/asm/bitops.h       |  6 ++++--
 xen/common/bitops.c                     | 22 ++++++++++++++++++++++
 xen/common/page_alloc.c                 |  4 ++--
 xen/include/xen/bitops.h                | 24 ++++++++++++++++++++++++
 8 files changed, 57 insertions(+), 17 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm32/bitops.h b/xen/arch/arm/include/asm/arm32/bitops.h
index d0309d47c1..5552d4e945 100644
--- a/xen/arch/arm/include/asm/arm32/bitops.h
+++ b/xen/arch/arm/include/asm/arm32/bitops.h
@@ -1,7 +1,7 @@
 #ifndef _ARM_ARM32_BITOPS_H
 #define _ARM_ARM32_BITOPS_H
 
-#define flsl fls
+#define arch_flsl fls
 
 /*
  * Little endian assembly bitops.  nr = 0 -> byte 0 bit 0.
diff --git a/xen/arch/arm/include/asm/arm64/bitops.h b/xen/arch/arm/include/asm/arm64/bitops.h
index 0efde29068..5f5d97faa0 100644
--- a/xen/arch/arm/include/asm/arm64/bitops.h
+++ b/xen/arch/arm/include/asm/arm64/bitops.h
@@ -22,17 +22,15 @@ static /*__*/always_inline unsigned long __ffs(unsigned long word)
  */
 #define ffz(x)  __ffs(~(x))
 
-static inline int flsl(unsigned long x)
+static inline int arch_flsl(unsigned long x)
 {
         uint64_t ret;
 
-        if (__builtin_constant_p(x))
-               return generic_flsl(x);
-
         asm("clz\t%0, %1" : "=r" (ret) : "r" (x));
 
         return BITS_PER_LONG - ret;
 }
+#define arch_flsl arch_flsl
 
 /* Based on linux/include/asm-generic/bitops/find.h */
 
diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h
index 8e16335e76..860d6d4689 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -78,17 +78,14 @@ bool clear_mask16_timeout(uint16_t mask, volatile void *p,
  * the clz instruction for much better code efficiency.
  */
 
-static inline int fls(unsigned int x)
+static inline int arch_fls(unsigned int x)
 {
         int ret;
 
-        if (__builtin_constant_p(x))
-               return generic_fls(x);
-
         asm("clz\t%"__OP32"0, %"__OP32"1" : "=r" (ret) : "r" (x));
         return 32 - ret;
 }
-
+#define arch_fls arch_fls
 
 #define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
 #define arch_ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
diff --git a/xen/arch/ppc/include/asm/bitops.h b/xen/arch/ppc/include/asm/bitops.h
index e2b6473c8c..ca308fd62b 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -119,9 +119,6 @@ static inline int test_and_set_bit(unsigned int nr, volatile void *addr)
         (volatile unsigned int *)addr + BITOP_WORD(nr)) != 0;
 }
 
-#define flsl(x) generic_flsl(x)
-#define fls(x) generic_fls(x)
-
 /* Based on linux/include/asm-generic/bitops/ffz.h */
 /*
  * ffz - find first zero in word.
diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index 23f09fdb7a..8f3d76fe44 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -425,7 +425,7 @@ static always_inline unsigned int arch_ffsl(unsigned long x)
  *
  * This is defined the same way as ffs.
  */
-static inline int flsl(unsigned long x)
+static always_inline int arch_flsl(unsigned long x)
 {
     long r;
 
@@ -435,8 +435,9 @@ static inline int flsl(unsigned long x)
           "1:" : "=r" (r) : "rm" (x));
     return (int)r+1;
 }
+#define arch_flsl arch_flsl
 
-static inline int fls(unsigned int x)
+static always_inline int arch_fls(unsigned int x)
 {
     int r;
 
@@ -446,6 +447,7 @@ static inline int fls(unsigned int x)
           "1:" : "=r" (r) : "rm" (x));
     return r + 1;
 }
+#define arch_fls arch_fls
 
 /**
  * hweightN - returns the hamming weight of a N-bit word
diff --git a/xen/common/bitops.c b/xen/common/bitops.c
index a8c32f6767..95bc47176b 100644
--- a/xen/common/bitops.c
+++ b/xen/common/bitops.c
@@ -62,9 +62,31 @@ static void test_ffs(void)
     CHECK(ffs64, (uint64_t)0x8000000000000000, 64);
 }
 
+static void test_fls(void)
+{
+    /* unsigned int ffs(unsigned int) */
+    CHECK(fls, 1, 1);
+    CHECK(fls, 3, 2);
+    CHECK(fls, 3U << 30, 32);
+
+    /* unsigned int flsl(unsigned long) */
+    CHECK(flsl, 1, 1);
+    CHECK(flsl, 1UL << (BITS_PER_LONG - 1), BITS_PER_LONG);
+#if BITS_PER_LONG > 32
+    CHECK(flsl, 3UL << 32, 34);
+#endif
+
+    /* unsigned int fls64(uint64_t) */
+    CHECK(fls64, 1, 1);
+    CHECK(fls64, 0x00000000C0000000ULL, 32);
+    CHECK(fls64, 0x0000000180000000ULL, 33);
+    CHECK(fls64, 0xC000000000000000ULL, 64);
+}
+
 static int __init cf_check test_bitops(void)
 {
     test_ffs();
+    test_fls();
 
     return 0;
 }
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index be4ba3962a..eed6b2a901 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1842,7 +1842,7 @@ static void _init_heap_pages(const struct page_info *pg,
          * Note that the value of ffsl() and flsl() starts from 1 so we need
          * to decrement it by 1.
          */
-        unsigned int inc_order = min(MAX_ORDER, flsl(e - s) - 1);
+        unsigned int inc_order = min(MAX_ORDER + 0U, flsl(e - s) - 1);
 
         if ( s )
             inc_order = min(inc_order, ffsl(s) - 1U);
@@ -2266,7 +2266,7 @@ void __init xenheap_max_mfn(unsigned long mfn)
     ASSERT(!first_node_initialised);
     ASSERT(!xenheap_bits);
     BUILD_BUG_ON((PADDR_BITS - PAGE_SHIFT) >= BITS_PER_LONG);
-    xenheap_bits = min(flsl(mfn + 1) - 1 + PAGE_SHIFT, PADDR_BITS);
+    xenheap_bits = min(flsl(mfn + 1) - 1 + PAGE_SHIFT, PADDR_BITS + 0U);
     printk(XENLOG_INFO "Xen heap: %u bits\n", xenheap_bits);
 }
 
diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index 4f3399273a..f5be10b928 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -203,6 +203,30 @@ static always_inline bool test_bit(int nr, const volatile void *addr)
     test_bit(nr, addr);                                 \
 })
 
+static always_inline __pure unsigned int fls(unsigned int x)
+{
+    if ( __builtin_constant_p(x) )
+        return generic_fls(x);
+
+#ifndef arch_fls
+#define arch_fls generic_fls
+#endif
+
+    return arch_fls(x);
+}
+
+static always_inline __pure unsigned int flsl(unsigned long x)
+{
+    if ( __builtin_constant_p(x) )
+        return generic_flsl(x);
+
+#ifndef arch_flsl
+#define arch_flsl generic_flsl
+#endif
+
+    return arch_flsl(x);
+}
+
 /*
  * Find First Set bit.  Bits are labelled from 1.
  */
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 04/15] xen/bitops: put __ffs() into linux compatible header
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
                   ` (2 preceding siblings ...)
  2024-05-06 10:15 ` [PATCH v9 03/15] xen/bitops: implement fls{l}() in common logic Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-15 14:34   ` Michal Orzel
  2024-05-15 15:08   ` Rahul Singh
  2024-05-06 10:15 ` [PATCH v9 05/15] xen/riscv: introduce bitops.h Oleksii Kurochko
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Shawn Anastasio, Rahul Singh

The mentioned macros exist only because of Linux compatible purpose.

The patch defines __ffs() in terms of Xen bitops and it is safe
to define in this way ( as __ffs() - 1 ) as considering that __ffs()
was defined as __builtin_ctzl(x), which has undefined behavior when x=0,
so it is assumed that such cases are not encountered in the current code.

To not include <xen/linux-compat.h> to Xen library files __ffs() and __ffz()
were defined locally in find-next-bit.c.

Except __ffs() usage in find-next-bit.c only one usage of __ffs() leave
in smmu-v3.c. It seems that it __ffs can be changed to ffsl(x)-1 in
this file, but to keep smmu-v3.c looks close to linux it was deciced just
to define __ffs() in xen/linux-compat.h and include it in smmu-v3.c

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Shawn Anastasio <sanastasio@raptorengineering.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes in V9:
 - update the defintion of __ffs in xen/linux-compat.h.
 - add Reviewed-by: Jan Beulich <jbeulich@suse.com>
 - add Acked-by: Shawn Anastasio <sanastasio@raptorengineering.com> for PPC part which
   I missed to add in the previous patch version.
---
Changes in V8:
 - drop ffz() for PPC as there is no any usage of it and it seems to me that it was
   introduced only because Arm has it, and Arm uses it only in find-next-bit.c where
   ffz() was moved to.
 - add Acked-by: Shawn Anastasio <sanastasio@raptorengineering.com> for PPC part.
---
Changes in V7:
 - introduce ffz(),__ffs() locally in find-next-bit.c
 - drop inclusion of <xen/linux-compat.h> in find-next-bit.c.
 - update the commit message.
---
Changes in V6:
 - new patch for the patch series.
---
 xen/arch/arm/include/asm/arm64/bitops.h | 21 ---------------------
 xen/arch/ppc/include/asm/bitops.h       | 21 ---------------------
 xen/drivers/passthrough/arm/smmu-v3.c   |  2 ++
 xen/include/xen/linux-compat.h          |  2 ++
 xen/lib/find-next-bit.c                 |  3 +++
 5 files changed, 7 insertions(+), 42 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/bitops.h b/xen/arch/arm/include/asm/arm64/bitops.h
index 5f5d97faa0..2deb134388 100644
--- a/xen/arch/arm/include/asm/arm64/bitops.h
+++ b/xen/arch/arm/include/asm/arm64/bitops.h
@@ -1,27 +1,6 @@
 #ifndef _ARM_ARM64_BITOPS_H
 #define _ARM_ARM64_BITOPS_H
 
-/* Based on linux/include/asm-generic/bitops/builtin-__ffs.h */
-/**
- * __ffs - find first bit in word.
- * @word: The word to search
- *
- * Undefined if no bit exists, so code should check against 0 first.
- */
-static /*__*/always_inline unsigned long __ffs(unsigned long word)
-{
-        return __builtin_ctzl(word);
-}
-
-/* Based on linux/include/asm-generic/bitops/ffz.h */
-/*
- * ffz - find first zero in word.
- * @word: The word to search
- *
- * Undefined if no zero exists, so code should check against ~0UL first.
- */
-#define ffz(x)  __ffs(~(x))
-
 static inline int arch_flsl(unsigned long x)
 {
         uint64_t ret;
diff --git a/xen/arch/ppc/include/asm/bitops.h b/xen/arch/ppc/include/asm/bitops.h
index ca308fd62b..2237b9f8f4 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -119,15 +119,6 @@ static inline int test_and_set_bit(unsigned int nr, volatile void *addr)
         (volatile unsigned int *)addr + BITOP_WORD(nr)) != 0;
 }
 
-/* Based on linux/include/asm-generic/bitops/ffz.h */
-/*
- * ffz - find first zero in word.
- * @word: The word to search
- *
- * Undefined if no zero exists, so code should check against ~0UL first.
- */
-#define ffz(x) __ffs(~(x))
-
 /**
  * hweightN - returns the hamming weight of a N-bit word
  * @x: the word to weigh
@@ -139,16 +130,4 @@ static inline int test_and_set_bit(unsigned int nr, volatile void *addr)
 #define hweight16(x) __builtin_popcount((uint16_t)(x))
 #define hweight8(x)  __builtin_popcount((uint8_t)(x))
 
-/* Based on linux/include/asm-generic/bitops/builtin-__ffs.h */
-/**
- * __ffs - find first bit in word.
- * @word: The word to search
- *
- * Undefined if no bit exists, so code should check against 0 first.
- */
-static always_inline unsigned long __ffs(unsigned long word)
-{
-    return __builtin_ctzl(word);
-}
-
 #endif /* _ASM_PPC_BITOPS_H */
diff --git a/xen/drivers/passthrough/arm/smmu-v3.c b/xen/drivers/passthrough/arm/smmu-v3.c
index b1c40c2c0a..6904962467 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -72,12 +72,14 @@
  */
 
 #include <xen/acpi.h>
+#include <xen/bitops.h>
 #include <xen/config.h>
 #include <xen/delay.h>
 #include <xen/errno.h>
 #include <xen/err.h>
 #include <xen/irq.h>
 #include <xen/lib.h>
+#include <xen/linux-compat.h>
 #include <xen/list.h>
 #include <xen/mm.h>
 #include <xen/rbtree.h>
diff --git a/xen/include/xen/linux-compat.h b/xen/include/xen/linux-compat.h
index 62ba71485c..b289dfd894 100644
--- a/xen/include/xen/linux-compat.h
+++ b/xen/include/xen/linux-compat.h
@@ -19,4 +19,6 @@ typedef int64_t __s64;
 
 typedef paddr_t phys_addr_t;
 
+#define __ffs(x) (ffsl(x) - 1UL)
+
 #endif /* __XEN_LINUX_COMPAT_H__ */
diff --git a/xen/lib/find-next-bit.c b/xen/lib/find-next-bit.c
index ca6f82277e..761b027398 100644
--- a/xen/lib/find-next-bit.c
+++ b/xen/lib/find-next-bit.c
@@ -12,6 +12,9 @@
 
 #include <asm/byteorder.h>
 
+#define __ffs(x) (ffsl(x) - 1)
+#define ffz(x) __ffs(~(x))
+
 #ifndef find_next_bit
 /*
  * Find the next set bit in a memory region.
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 05/15] xen/riscv: introduce bitops.h
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
                   ` (3 preceding siblings ...)
  2024-05-06 10:15 ` [PATCH v9 04/15] xen/bitops: put __ffs() into linux compatible header Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-15  9:29   ` Jan Beulich
  2024-05-06 10:15 ` [PATCH v9 06/15] xen/riscv: introduce cmpxchg.h Oleksii Kurochko
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini

Taken from Linux-6.4.0-rc1

Xen's bitops.h consists of several Linux's headers:
* linux/arch/include/asm/bitops.h:
  * The following function were removed as they aren't used in Xen:
        * test_and_set_bit_lock
        * clear_bit_unlock
        * __clear_bit_unlock
  * The following functions were renamed in the way how they are
    used by common code:
        * __test_and_set_bit
        * __test_and_clear_bit
  * The declaration and implementation of the following functios
    were updated to make Xen build happy:
        * clear_bit
        * set_bit
        * __test_and_clear_bit
        * __test_and_set_bit

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes in V9:
 - add Acked-by: Jan Beulich <jbeulich@suse.com>
 - drop redefinition of bitop_uint_t in asm/types.h as some operation in Xen common code expects
   to work with 32-bit quantities.
 - s/BITS_PER_LONG/BITOP_BITS_PER_WORD in asm/bitops.h around __AMO() macros.
---
Changes in V8:
 - define bitop_uint_t in <asm/types.h> after the changes in patch related to introduction of
   "introduce generic non-atomic test_*bit()".
 - drop duplicated __set_bit() and __clear_bit().
 - drop duplicated comment: /* Based on linux/arch/include/asm/bitops.h */.
 - update type of res and mask in test_and_op_bit_ord(): unsigned long -> bitop_uint_t.
 - drop 1 padding blank in test_and_op_bit_ord().
 - update definition of test_and_set_bit(),test_and_clear_bit(),test_and_change_bit:
   change return type to bool.
 - change addr argument type of test_and_change_bit(): unsigned long * -> void *.
 - move test_and_change_bit() closer to other test_and-s function.
 - Code style fixes: tabs -> space.
 - s/#undef __op_bit/#undef op_bit.
 - update the commit message: delete information about generic-non-atomic.h changes as now
   it is a separate patch.
---
Changes in V7:
 - Update the commit message.
 - Drop "__" for __op_bit and __op_bit_ord as they are atomic.
 - add comment above __set_bit and __clear_bit about why they are defined as atomic.
 - align bitops_uint_t with __AMO().
 - make changes after  generic non-atomic test_*bit() were changed.
 - s/__asm__ __volatile__/asm volatile
---
Changes in V6:
 - rebase clean ups were done: drop unused asm-generic includes
---
 Changes in V5:
   - new patch
---
 xen/arch/riscv/include/asm/bitops.h | 137 ++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)
 create mode 100644 xen/arch/riscv/include/asm/bitops.h

diff --git a/xen/arch/riscv/include/asm/bitops.h b/xen/arch/riscv/include/asm/bitops.h
new file mode 100644
index 0000000000..9f796bf859
--- /dev/null
+++ b/xen/arch/riscv/include/asm/bitops.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2012 Regents of the University of California */
+
+#ifndef _ASM_RISCV_BITOPS_H
+#define _ASM_RISCV_BITOPS_H
+
+#include <asm/system.h>
+
+#if BITOP_BITS_PER_WORD == 64
+#define __AMO(op)   "amo" #op ".d"
+#elif BITOP_BITS_PER_WORD == 32
+#define __AMO(op)   "amo" #op ".w"
+#else
+#error "Unexpected BITS_PER_LONG"
+#endif
+
+/* Based on linux/arch/include/asm/bitops.h */
+
+/*
+ * Non-atomic bit manipulation.
+ *
+ * Implemented using atomics to be interrupt safe. Could alternatively
+ * implement with local interrupt masking.
+ */
+#define __set_bit(n, p)      set_bit(n, p)
+#define __clear_bit(n, p)    clear_bit(n, p)
+
+#define test_and_op_bit_ord(op, mod, nr, addr, ord)     \
+({                                                      \
+    bitop_uint_t res, mask;                             \
+    mask = BITOP_MASK(nr);                              \
+    asm volatile (                                      \
+        __AMO(op) #ord " %0, %2, %1"                    \
+        : "=r" (res), "+A" (addr[BITOP_WORD(nr)])       \
+        : "r" (mod(mask))                               \
+        : "memory");                                    \
+    ((res & mask) != 0);                                \
+})
+
+#define op_bit_ord(op, mod, nr, addr, ord)      \
+    asm volatile (                              \
+        __AMO(op) #ord " zero, %1, %0"          \
+        : "+A" (addr[BITOP_WORD(nr)])           \
+        : "r" (mod(BITOP_MASK(nr)))             \
+        : "memory");
+
+#define test_and_op_bit(op, mod, nr, addr)    \
+    test_and_op_bit_ord(op, mod, nr, addr, .aqrl)
+#define op_bit(op, mod, nr, addr) \
+    op_bit_ord(op, mod, nr, addr, )
+
+/* Bitmask modifiers */
+#define NOP(x)    (x)
+#define NOT(x)    (~(x))
+
+/**
+ * test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ */
+static inline bool test_and_set_bit(int nr, volatile void *p)
+{
+    volatile bitop_uint_t *addr = p;
+
+    return test_and_op_bit(or, NOP, nr, addr);
+}
+
+/**
+ * test_and_clear_bit - Clear a bit and return its old value
+ * @nr: Bit to clear
+ * @addr: Address to count from
+ */
+static inline bool test_and_clear_bit(int nr, volatile void *p)
+{
+    volatile bitop_uint_t *addr = p;
+
+    return test_and_op_bit(and, NOT, nr, addr);
+}
+
+/**
+ * test_and_change_bit - Toggle (change) a bit and return its old value
+ * @nr: Bit to change
+ * @addr: Address to count from
+ *
+ * This operation is atomic and cannot be reordered.
+ * It also implies a memory barrier.
+ */
+static inline bool test_and_change_bit(int nr, volatile void *p)
+{
+    volatile bitop_uint_t *addr = p;
+
+    return test_and_op_bit(xor, NOP, nr, addr);
+}
+
+/**
+ * set_bit - Atomically set a bit in memory
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ *
+ * Note that @nr may be almost arbitrarily large; this function is not
+ * restricted to acting on a single-word quantity.
+ */
+static inline void set_bit(int nr, volatile void *p)
+{
+    volatile bitop_uint_t *addr = p;
+
+    op_bit(or, NOP, nr, addr);
+}
+
+/**
+ * clear_bit - Clears a bit in memory
+ * @nr: Bit to clear
+ * @addr: Address to start counting from
+ */
+static inline void clear_bit(int nr, volatile void *p)
+{
+    volatile bitop_uint_t *addr = p;
+
+    op_bit(and, NOT, nr, addr);
+}
+
+#undef test_and_op_bit
+#undef op_bit
+#undef NOP
+#undef NOT
+#undef __AMO
+
+#endif /* _ASM_RISCV_BITOPS_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 06/15] xen/riscv: introduce cmpxchg.h
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
                   ` (4 preceding siblings ...)
  2024-05-06 10:15 ` [PATCH v9 05/15] xen/riscv: introduce bitops.h Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-15  9:38   ` Jan Beulich
  2024-05-06 10:15 ` [PATCH v9 07/15] xen/riscv: introduce atomic.h Oleksii Kurochko
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini

The header was taken from Linux kernl 6.4.0-rc1.

Addionally, were updated:
* add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic
  access.
* replace tabs with spaces
* replace __* variale with *__
* introduce generic version of xchg_* and cmpxchg_*.
* drop {cmp}xchg{release,relaxed,acquire} as Xen doesn't use them
* drop barries and use instruction suffixices instead ( .aq, .rl, .aqrl )

Implementation of 4- and 8-byte cases were updated according to the spec:
```
              ....
Linux Construct         RVWMO AMO Mapping
    ...
atomic <op>             amo<op>.{w|d}.aqrl
Linux Construct         RVWMO LR/SC Mapping
    ...
atomic <op>             loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop

Table A.5: Mappings from Linux memory primitives to RISC-V primitives

```

The current implementation is the same with 8e86f0b409a4
("arm64: atomics: fix use of acquire + release for full barrier
semantics") [1].
RISC-V could combine acquire and release into the SC
instructions and it could reduce a fence instruction to gain better
performance. Here is related description from RISC-V ISA 10.2
Load-Reserved/Store-Conditional Instructions:

 - .aq:   The LR/SC sequence can be given acquire semantics by
          setting the aq bit on the LR instruction.
 - .rl:   The LR/SC sequence can be given release semantics by
          setting the rl bit on the SC instruction.
 - .aqrl: Setting the aq bit on the LR instruction, and setting
          both the aq and the rl bit on the SC instruction makes
          the LR/SC sequence sequentially consistent, meaning that
          it cannot be reordered with earlier or later memory
          operations from the same hart.

 Software should not set the rl bit on an LR instruction unless
 the aq bit is also set, nor should software set the aq bit on an
 SC instruction unless the rl bit is also set. LR.rl and SC.aq
 instructions are not guaranteed to provide any stronger ordering
 than those with both bits clear, but may result in lower
 performance.

Also, I way of transforming ".rl + full barrier" to ".aqrl" was approved
by (the author of the RVWMO spec) [2]

[1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1391516953-14541-1-git-send-email-will.deacon@arm.com/
[2] https://lore.kernel.org/linux-riscv/41e01514-74ca-84f2-f5cc-2645c444fd8e@nvidia.com/

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V9:
 - update return type of __bad_xchg();
 - update the comment above __bad_cmpxchg();
 - update the default case inside __xchg() to be aligned with similar default
   case in __cmpxchg().
---
Changes in V8:
 - use __bad_{xchg,cmpxch}(ptr,size) insetead of STATIC_ASSERT_UNREACHABLE() to
   make this patch be independent from the macros that haven't been committed yet
   and may never be.
---
Changes in V7:
 - replace __*() -> _*() in cmpxchg.h
 - add () around ptr in _amoswap_generic(), emulate_xchg_1_2()
 - fix typos
 - code style fixes.
 - refactor emulate_xcgh_1_2():
   - add parentheses for new argument.
   - use instead of constant 0x4 -> sizeof(*aligned_ptr).
   - add alignment_mask to save  sizeof(*aligned_ptr) - sizeof(*(ptr));
 - s/CONFIG_32BIT/CONFIG_RISCV_32
 - drop unnecessary parentheses in xchg()
 - drop register in _generic_cmpxchg()
 - refactor and update prototype of _generic_cmpxchg():
   add named operands, return value instead of passing ret as an argument, drop %z and J
   constraints for mask operand as it can't be zero
 - refactor and code style fixes in emulate_cmpxchg_1_2():
   - add explanatory comment for emulate_cmpxchg_1_2().
   - add parentheses for old and new arguments.
   - use instead of constant 0x4 -> sizeof(*aligned_ptr).
   - add alignment_mask to save  sizeof(*aligned_ptr) - sizeof(*(ptr));
 - drop unnessary parenthesses in cmpxchg().
 - update the commit message.
 - s/__asm__ __volatile__/asm volatile
---
Changes in V6:
-  update the commit message? ( As before I don't understand this point. Can you give an example of what sort of opcode / instruction is missing?)
 - Code style fixes
 - change sizeof(*ptr) -> sizeof(*(ptr))
 - update operands names and some local variables for macros emulate_xchg_1_2() and emulate_cmpxchg_1_2()
 - drop {cmp}xchg_{relaxed,acquire,release) versions as they aren't needed for Xen
 - update __amoswap_generic() prototype and defintion: drop pre and post barries.
 - update emulate_xchg_1_2() prototype and definion: add lr_sfx, drop pre and post barries.
 - rename __xchg_generic to __xchg(), make __xchg as static inline function to be able to "#ifndef CONFIG_32BIT case 8:... "
---
Changes in V5:
 - update the commit message.
 - drop ALIGN_DOWN().
 - update the definition of emulate_xchg_1_2():
   - lr.d -> lr.w, sc.d -> sc.w.
   - drop ret argument.
   - code style fixes around asm volatile.
   - update prototype.
   - use asm named operands.
   - rename local variables.
   - add comment above the macros
 - update the definition of __xchg_generic:
   - rename to __xchg()
   - transform it to static inline
   - code style fixes around switch()
   - update prototype.
 - redefine cmpxchg()
 - update emulate_cmpxchg_1_2():
   - update prototype
   - update local variables names and usage of them
   - use name asm operands.
   - add comment above the macros
 - drop pre and post, and use .aq,.rl, .aqrl suffixes.
 - drop {cmp}xchg_{relaxed, aquire, release} as they are not used by Xen.
 - drop unnessary details in comment above emulate_cmpxchg_1_2()
---
Changes in V4:
 - Code style fixes.
 - enforce in __xchg_*() has the same type for new and *ptr, also "\n"
   was removed at the end of asm instruction.
 - dependency from https://lore.kernel.org/xen-devel/cover.1706259490.git.federico.serafini@bugseng.com/
 - switch from ASSERT_UNREACHABLE to STATIC_ASSERT_UNREACHABLE().
 - drop xchg32(ptr, x) and xchg64(ptr, x) as they aren't used.
 - drop cmpxcg{32,64}_{local} as they aren't used.
 - introduce generic version of xchg_* and cmpxchg_*.
 - update the commit message.
---
Changes in V3:
 - update the commit message
 - add emulation of {cmp}xchg_... for 1 and 2 bytes types
---
Changes in V2:
 - update the comment at the top of the header.
 - change xen/lib.h to xen/bug.h.
 - sort inclusion of headers properly.
---
 xen/arch/riscv/include/asm/cmpxchg.h | 239 +++++++++++++++++++++++++++
 xen/arch/riscv/include/asm/config.h  |   2 +
 2 files changed, 241 insertions(+)
 create mode 100644 xen/arch/riscv/include/asm/cmpxchg.h

diff --git a/xen/arch/riscv/include/asm/cmpxchg.h b/xen/arch/riscv/include/asm/cmpxchg.h
new file mode 100644
index 0000000000..f5a23d5cbf
--- /dev/null
+++ b/xen/arch/riscv/include/asm/cmpxchg.h
@@ -0,0 +1,239 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2014 Regents of the University of California */
+
+#ifndef _ASM_RISCV_CMPXCHG_H
+#define _ASM_RISCV_CMPXCHG_H
+
+#include <xen/compiler.h>
+#include <xen/lib.h>
+
+#include <asm/fence.h>
+#include <asm/io.h>
+#include <asm/system.h>
+
+#define _amoswap_generic(ptr, new, ret, sfx) \
+    asm volatile ( \
+        " amoswap" sfx " %0, %2, %1" \
+        : "=r" (ret), "+A" (*(ptr)) \
+        : "r" (new) \
+        : "memory" );
+
+/*
+ * For LR and SC, the A extension requires that the address held in rs1 be
+ * naturally aligned to the size of the operand (i.e., eight-byte aligned
+ * for 64-bit words and four-byte aligned for 32-bit words).
+ * If the address is not naturally aligned, an address-misaligned exception
+ * or an access-fault exception will be generated.
+ *
+ * Thereby:
+ * - for 1-byte xchg access the containing word by clearing low two bits.
+ * - for 2-byte xchg access the containing word by clearing bit 1.
+ *
+ * If resulting 4-byte access is still misalgined, it will fault just as
+ * non-emulated 4-byte access would.
+ */
+#define emulate_xchg_1_2(ptr, new, lr_sfx, sc_sfx) \
+({ \
+    uint32_t *aligned_ptr; \
+    unsigned long alignment_mask = sizeof(*aligned_ptr) - sizeof(*(ptr)); \
+    unsigned int new_val_bit = \
+        ((unsigned long)(ptr) & alignment_mask) * BITS_PER_BYTE; \
+    unsigned long mask = \
+        GENMASK(((sizeof(*(ptr))) * BITS_PER_BYTE) - 1, 0) << new_val_bit; \
+    unsigned int new_ = (new) << new_val_bit; \
+    unsigned int old; \
+    unsigned int scratch; \
+    \
+    aligned_ptr = (uint32_t *)((unsigned long)(ptr) & ~alignment_mask); \
+    \
+    asm volatile ( \
+        "0: lr.w" lr_sfx " %[old], %[ptr_]\n" \
+        "   andn  %[scratch], %[old], %[mask]\n" \
+        "   or   %[scratch], %[scratch], %z[new_]\n" \
+        "   sc.w" sc_sfx " %[scratch], %[scratch], %[ptr_]\n" \
+        "   bnez %[scratch], 0b\n" \
+        : [old] "=&r" (old), [scratch] "=&r" (scratch), \
+          [ptr_] "+A" (*aligned_ptr) \
+        : [new_] "rJ" (new_), [mask] "r" (mask) \
+        : "memory" ); \
+    \
+    (__typeof__(*(ptr)))((old & mask) >> new_val_bit); \
+})
+
+/*
+ * This function doesn't exist, so you'll get a linker error
+ * if something tries to do an invalid xchg().
+ */
+extern unsigned long __bad_xchg(volatile void *ptr, int size);
+
+static always_inline unsigned long __xchg(volatile void *ptr, unsigned long new, int size)
+{
+    unsigned long ret;
+
+    switch ( size )
+    {
+    case 1:
+        ret = emulate_xchg_1_2((volatile uint8_t *)ptr, new, ".aq", ".aqrl");
+        break;
+    case 2:
+        ret = emulate_xchg_1_2((volatile uint16_t *)ptr, new, ".aq", ".aqrl");
+        break;
+    case 4:
+        _amoswap_generic((volatile uint32_t *)ptr, new, ret, ".w.aqrl");
+        break;
+#ifndef CONFIG_RISCV_32
+    case 8:
+        _amoswap_generic((volatile uint64_t *)ptr, new, ret, ".d.aqrl");
+        break;
+#endif
+    default:
+        return __bad_xchg(ptr, size);
+    }
+
+    return ret;
+}
+
+#define xchg(ptr, x) \
+({ \
+    __typeof__(*(ptr)) n_ = (x); \
+    (__typeof__(*(ptr))) \
+        __xchg((ptr), (unsigned long)n_, sizeof(*(ptr))); \
+})
+
+#define _generic_cmpxchg(ptr, old, new, lr_sfx, sc_sfx) \
+ ({ \
+    unsigned int rc; \
+    unsigned long ret; \
+    unsigned long mask = GENMASK(((sizeof(*(ptr))) * BITS_PER_BYTE) - 1, 0); \
+    asm volatile ( \
+        "0: lr" lr_sfx " %[ret], %[ptr_]\n" \
+        "   and  %[ret], %[ret], %[mask]\n" \
+        "   bne  %[ret], %z[old_], 1f\n" \
+        "   sc" sc_sfx " %[rc], %z[new_], %[ptr_]\n" \
+        "   bnez %[rc], 0b\n" \
+        "1:\n" \
+        : [ret] "=&r" (ret), [rc] "=&r" (rc), [ptr_] "+A" (*ptr) \
+        : [old_] "rJ" (old), [new_] "rJ" (new), [mask] "r" (mask)  \
+        : "memory" ); \
+    ret; \
+ })
+
+/*
+ * For LR and SC, the A extension requires that the address held in rs1 be
+ * naturally aligned to the size of the operand (i.e., eight-byte aligned
+ * for 64-bit words and four-byte aligned for 32-bit words).
+ * If the address is not naturally aligned, an address-misaligned exception
+ * or an access-fault exception will be generated.
+ *
+ * Thereby:
+ * - for 1-byte xchg access the containing word by clearing low two bits
+ * - for 2-byte xchg ccess the containing word by clearing first bit.
+ * 
+ * If resulting 4-byte access is still misalgined, it will fault just as
+ * non-emulated 4-byte access would.
+ *
+ * old_val was casted to unsigned long for cmpxchgptr()
+ */
+#define emulate_cmpxchg_1_2(ptr, old, new, lr_sfx, sc_sfx) \
+({ \
+    uint32_t *aligned_ptr; \
+    unsigned long alignment_mask = sizeof(*aligned_ptr) - sizeof(*(ptr)); \
+    uint8_t new_val_bit = \
+        ((unsigned long)(ptr) & alignment_mask) * BITS_PER_BYTE; \
+    unsigned long mask = \
+        GENMASK(((sizeof(*(ptr))) * BITS_PER_BYTE) - 1, 0) << new_val_bit; \
+    unsigned int old_ = (old) << new_val_bit; \
+    unsigned int new_ = (new) << new_val_bit; \
+    unsigned int old_val; \
+    unsigned int scratch; \
+    \
+    aligned_ptr = (uint32_t *)((unsigned long)ptr & ~alignment_mask); \
+    \
+    asm volatile ( \
+        "0: lr.w" lr_sfx " %[scratch], %[ptr_]\n" \
+        "   and  %[old_val], %[scratch], %[mask]\n" \
+        "   bne  %[old_val], %z[old_], 1f\n" \
+        /* the following line is an equivalent to: \
+         *     scratch = old_val & ~mask; \
+         * And to elimanate one ( likely register ) input it was decided \
+         * to use: \
+         *     scratch = old_val ^ scratch \
+         */ \
+        "   xor  %[scratch], %[old_val], %[scratch]\n" \
+        "   or   %[scratch], %[scratch], %z[new_]\n" \
+        "   sc.w" sc_sfx " %[scratch], %[scratch], %[ptr_]\n" \
+        "   bnez %[scratch], 0b\n" \
+        "1:\n" \
+        : [old_val] "=&r" (old_val), [scratch] "=&r" (scratch), \
+          [ptr_] "+A" (*aligned_ptr) \
+        : [old_] "rJ" (old_), [new_] "rJ" (new_), \
+          [mask] "r" (mask) \
+        : "memory" ); \
+    \
+    (__typeof__(*(ptr)))((unsigned long)old_val >> new_val_bit); \
+})
+
+/*
+ * This function doesn't exist, so you'll get a linker error
+ * if something tries to do an invalid cmpxchg().
+ */
+extern unsigned long __bad_cmpxchg(volatile void *ptr, int size);
+
+/*
+ * Atomic compare and exchange.  Compare OLD with MEM, if identical,
+ * store NEW in MEM.  Return the initial value in MEM.  Success is
+ * indicated by comparing RETURN with OLD.
+ */
+static always_inline unsigned long __cmpxchg(volatile void *ptr,
+                                             unsigned long old,
+                                             unsigned long new,
+                                             int size)
+{
+    unsigned long ret;
+
+    switch ( size )
+    {
+    case 1:
+        ret = emulate_cmpxchg_1_2((volatile uint8_t *)ptr, old, new,
+                                  ".aq", ".aqrl");
+        break;
+    case 2:
+        ret = emulate_cmpxchg_1_2((volatile uint16_t *)ptr, old, new,
+                                   ".aq", ".aqrl");
+        break;
+    case 4:
+        ret = _generic_cmpxchg((volatile uint32_t *)ptr, old, new,
+                          ".w.aq", ".w.aqrl");
+        break;
+#ifndef CONFIG_32BIT
+    case 8:
+        ret = _generic_cmpxchg((volatile uint64_t *)ptr, old, new,
+                           ".d.aq", ".d.aqrl");
+        break;
+#endif
+    default:
+        return __bad_cmpxchg(ptr, size);
+    }
+
+    return ret;
+}
+
+#define cmpxchg(ptr, o, n) \
+({ \
+    __typeof__(*(ptr)) o_ = (o); \
+    __typeof__(*(ptr)) n_ = (n); \
+    (__typeof__(*(ptr))) \
+    __cmpxchg((ptr), (unsigned long)o_, (unsigned long)n_, \
+              sizeof(*(ptr))); \
+})
+
+#endif /* _ASM_RISCV_CMPXCHG_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/riscv/include/asm/config.h b/xen/arch/riscv/include/asm/config.h
index c5f93e6a01..50583aafdc 100644
--- a/xen/arch/riscv/include/asm/config.h
+++ b/xen/arch/riscv/include/asm/config.h
@@ -119,6 +119,8 @@
 
 #define BITS_PER_LLONG 64
 
+#define BITS_PER_BYTE 8
+
 /* xen_ulong_t is always 64 bits */
 #define BITS_PER_XEN_ULONG 64
 
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 07/15] xen/riscv: introduce atomic.h
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
                   ` (5 preceding siblings ...)
  2024-05-06 10:15 ` [PATCH v9 06/15] xen/riscv: introduce cmpxchg.h Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-15  9:49   ` Jan Beulich
  2024-05-06 10:15 ` [PATCH v9 08/15] xen/riscv: introduce monitor.h Oleksii Kurochko
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini

Initially the patch was introduced by Bobby, who takes the header from
Linux kernel.

The following changes were done on top of Bobby's changes:
 - atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated
   to use__*xchg_generic()
 - drop casts in write_atomic() as they are unnecessary
 - drop introduction of WRITE_ONCE() and READ_ONCE().
   Xen provides ACCESS_ONCE()
 - remove zero-length array access in read_atomic()
 - drop defines similar to pattern:
   #define atomic_add_return_relaxed   atomic_add_return_relaxed
 - move not RISC-V specific functions to asm-generic/atomics-ops.h
 - drop  atomic##prefix##_{cmp}xchg_{release, aquire, release}() as they
   are not used in Xen.
 - update the defintion of  atomic##prefix##_{cmp}xchg according to
   {cmp}xchg() implementation in Xen.
 - some ATOMIC_OP() macros were updated:
   - drop size argument for ATOMIC_OP which defines atomic##prefix##_xchg()
     and atomic##prefix##_cmpxchg().
   - drop c_op argument for ATOMIC_OPS which defines ATOMIC_OPS(and, and),
     ATOMIC_OPS( or,  or), ATOMIC_OPS(xor, xor), ATOMIC_OPS(add, add, +),
     ATOMIC_OPS(sub, add, -) as c_op is always "+" for them.
   - drop "" from definition of __atomic_{acquire/release"}_fence.

The current implementation is the same with 8e86f0b409a4
("arm64: atomics: fix use of acquire + release for full barrier
semantics") [1].
RISC-V could combine acquire and release into the SC
instructions and it could reduce a fence instruction to gain better
performance. Here is related description from RISC-V ISA 10.2
Load-Reserved/Store-Conditional Instructions:

 - .aq:   The LR/SC sequence can be given acquire semantics by
          setting the aq bit on the LR instruction.
 - .rl:   The LR/SC sequence can be given release semantics by
              setting the rl bit on the SC instruction.
 - .aqrl: Setting the aq bit on the LR instruction, and setting
          both the aq and the rl bit on the SC instruction makes
          the LR/SC sequence sequentially consistent, meaning that
          it cannot be reordered with earlier or later memory
          operations from the same hart.

 Software should not set the rl bit on an LR instruction unless
 the aq bit is also set, nor should software set the aq bit on an
 SC instruction unless the rl bit is also set. LR.rl and SC.aq
 instructions are not guaranteed to provide any stronger ordering
 than those with both bits clear, but may result in lower
 performance.

Also, I way of transforming ".rl + full barrier" to ".aqrl" was approved
by (the author of the RVWMO spec) [2]

[1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1391516953-14541-1-git-send-email-will.deacon@arm.com/
[2] https://lore.kernel.org/linux-riscv/41e01514-74ca-84f2-f5cc-2645c444fd8e@nvidia.com/

Signed-off-by: Bobby Eshleman <bobbyeshleman@gmail.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V9:
 - update the defintion of write_atomic macros:
   drop the return value as this macros isn't expeceted to return something
   drop unnessary parentheses around p.
 - drop casts inside _add_sized() for ptr variable as they aren't necessary.
---
Changes in V8:
 - ???? add the explanatory comment to _add_sized().
 - drop "" in __atomic_{acquire, release}_fence().
 - code style fixes in atomic##prefix##_##op##_return(): indentation.
 - drop an unary_op argument ("+") for ATOMIC_OPS(and, and), ATOMIC_OPS( or,  or), ATOMIC_OPS(xor, xor)
   and use "+" directly inside definition of ATOMIC_OPS().
 - drop c_op for ATOMIC_OPS(add, add, +) and ATOMIC_OPS(sub, add, -) as it is always "+" for now.
   Just use "+" inside definition of ATOMIC_OPS().
 - drop size argument for ATOMIC_OP() defintions of atomic##prefix##_{xchg,cmpxchg}()
 - update the commit message.
---
Changes in V7:
 - drop relaxed version of atomic ops as they are not used.
 - update the commit message
 - code style fixes
 - refactor functions write_atomic(), add_sized() to be able to use #ifdef CONFIG_RISCV_32 ... #endif
   for {write,read}q().
 - update ATOMIC_OPS to receive unary operator.
 - update the header on top of atomic-ops.h.
 - some minor movements of function inside atomic-ops.h header.
---
Changes in V6:
 - drop atomic##prefix##_{cmp}xchg_{release, aquire, relaxed} as they aren't used
   by Xen
 - code style fixes.
 - %s/__asm__ __volatile__/asm volatile
 - add explanational comments.
 - move inclusion of "#include <asm-generic/atomic-ops.h>" further down in atomic.h
   header.
---
Changes in V5:
 - fence.h changes were moved to separate patch as patches related to io.h and cmpxchg.h,
   which are dependecies for this patch, also needed changes in fence.h
 - remove accessing of zero-length array
 - drops cast in write_atomic()
 - drop introduction of WRITE_ONCE() and READ_ONCE().
 - drop defines similar to pattern #define atomic_add_return_relaxed   atomic_add_return_relaxed
 - Xen code style fixes
 - move not RISC-V specific functions to asm-generic/atomics-ops.h
---
Changes in V4:
 - do changes related to the updates of [PATCH v3 13/34] xen/riscv: introduce cmpxchg.h
 - drop casts in read_atomic_size(), write_atomic(), add_sized()
 - tabs -> spaces
 - drop #ifdef CONFIG_SMP ... #endif in fence.ha as it is simpler to handle NR_CPUS=1
   the same as NR_CPUS>1 with accepting less than ideal performance.
---
Changes in V3:
  - update the commit message
  - add SPDX for fence.h
  - code style fixes
  - Remove /* TODO: ... */ for add_sized macros. It looks correct to me.
  - re-order the patch
  - merge to this patch fence.h
---
Changes in V2:
 - Change an author of commit. I got this header from Bobby's old repo.
---

 xen/arch/riscv/include/asm/atomic.h  | 280 +++++++++++++++++++++++++++
 xen/include/asm-generic/atomic-ops.h |  97 ++++++++++
 2 files changed, 377 insertions(+)
 create mode 100644 xen/arch/riscv/include/asm/atomic.h
 create mode 100644 xen/include/asm-generic/atomic-ops.h

diff --git a/xen/arch/riscv/include/asm/atomic.h b/xen/arch/riscv/include/asm/atomic.h
new file mode 100644
index 0000000000..097e27c51b
--- /dev/null
+++ b/xen/arch/riscv/include/asm/atomic.h
@@ -0,0 +1,280 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Taken and modified from Linux.
+ *
+ * The following changes were done:
+ * - * atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated
+ *     to use__*xchg_generic()
+ * - drop casts in write_atomic() as they are unnecessary
+ * - drop introduction of WRITE_ONCE() and READ_ONCE().
+ *   Xen provides ACCESS_ONCE()
+ * - remove zero-length array access in read_atomic()
+ * - drop defines similar to pattern
+ *   #define atomic_add_return_relaxed   atomic_add_return_relaxed
+ * - move not RISC-V specific functions to asm-generic/atomics-ops.h
+ * 
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2012 Regents of the University of California
+ * Copyright (C) 2017 SiFive
+ * Copyright (C) 2024 Vates SAS
+ */
+
+#ifndef _ASM_RISCV_ATOMIC_H
+#define _ASM_RISCV_ATOMIC_H
+
+#include <xen/atomic.h>
+
+#include <asm/cmpxchg.h>
+#include <asm/fence.h>
+#include <asm/io.h>
+#include <asm/system.h>
+
+void __bad_atomic_size(void);
+
+/*
+ * Legacy from Linux kernel. For some reason they wanted to have ordered
+ * read/write access. Thereby read* is used instead of read*_cpu()
+ */
+static always_inline void read_atomic_size(const volatile void *p,
+                                           void *res,
+                                           unsigned int size)
+{
+    switch ( size )
+    {
+    case 1: *(uint8_t *)res = readb(p); break;
+    case 2: *(uint16_t *)res = readw(p); break;
+    case 4: *(uint32_t *)res = readl(p); break;
+#ifndef CONFIG_RISCV_32
+    case 8: *(uint32_t *)res = readq(p); break;
+#endif
+    default: __bad_atomic_size(); break;
+    }
+}
+
+#define read_atomic(p) ({                                   \
+    union { typeof(*(p)) val; char c[sizeof(*(p))]; } x_;   \
+    read_atomic_size(p, x_.c, sizeof(*(p)));                \
+    x_.val;                                                 \
+})
+
+static always_inline void _write_atomic(volatile void *p,
+                                       unsigned long x, unsigned int size)
+{
+    switch ( size )
+    {
+    case 1: writeb(x, p); break;
+    case 2: writew(x, p); break;
+    case 4: writel(x, p); break;
+#ifndef CONFIG_RISCV_32
+    case 8: writeq(x, p); break;
+#endif
+    default: __bad_atomic_size(); break;
+    }
+}
+
+#define write_atomic(p, x)                              \
+({                                                      \
+    typeof(*(p)) x_ = (x);                              \
+    _write_atomic(p, x_, sizeof(*(p)));                 \
+})
+
+static always_inline void _add_sized(volatile void *p,
+                                     unsigned long x, unsigned int size)
+{
+    switch ( size )
+    {
+    case 1:
+    {
+        volatile uint8_t *ptr = p;
+        write_atomic(ptr, read_atomic(ptr) + x);
+        break;
+    }
+    case 2:
+    {
+        volatile uint16_t *ptr = p;
+        write_atomic(ptr, read_atomic(ptr) + x);
+        break;
+    }
+    case 4:
+    {
+        volatile uint32_t *ptr = p;
+        write_atomic(ptr, read_atomic(ptr) + x);
+        break;
+    }
+#ifndef CONFIG_RISCV_32
+    case 8:
+    {
+        volatile uint64_t *ptr = p;
+        write_atomic(ptr, read_atomic(ptr) + x);
+        break;
+    }
+#endif
+    default: __bad_atomic_size(); break;
+    }
+}
+
+#define add_sized(p, x)                                 \
+({                                                      \
+    typeof(*(p)) x_ = (x);                              \
+    _add_sized((p), x_, sizeof(*(p)));                  \
+})
+
+#define __atomic_acquire_fence() \
+    asm volatile ( RISCV_ACQUIRE_BARRIER ::: "memory" )
+
+#define __atomic_release_fence() \
+    asm volatile ( RISCV_RELEASE_BARRIER ::: "memory" )
+
+/*
+ * First, the atomic ops that have no ordering constraints and therefor don't
+ * have the AQ or RL bits set.  These don't return anything, so there's only
+ * one version to worry about.
+ */
+#define ATOMIC_OP(op, asm_op, unary_op, asm_type, c_type, prefix)  \
+static inline                                               \
+void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
+{                                                           \
+    asm volatile (                                          \
+        "   amo" #asm_op "." #asm_type " zero, %1, %0"      \
+        : "+A" (v->counter)                                 \
+        : "r" (unary_op i)                                  \
+        : "memory" );                                       \
+}                                                           \
+
+/*
+ * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is the reason why
+ * last argument for ATOMIC_OP isn't used.
+ */
+#define ATOMIC_OPS(op, asm_op, unary_op)                    \
+        ATOMIC_OP (op, asm_op, unary_op, w, int,   )
+
+ATOMIC_OPS(add, add, +)
+ATOMIC_OPS(sub, add, -)
+ATOMIC_OPS(and, and, +)
+ATOMIC_OPS( or,  or, +)
+ATOMIC_OPS(xor, xor, +)
+
+#undef ATOMIC_OP
+#undef ATOMIC_OPS
+
+#include <asm-generic/atomic-ops.h>
+
+/*
+ * Atomic ops that have ordered variant.
+ * There's two flavors of these: the arithmatic ops have both fetch and return
+ * versions, while the logical ops only have fetch versions.
+ */
+#define ATOMIC_FETCH_OP(op, asm_op, unary_op, asm_type, c_type, prefix) \
+static inline                                                       \
+c_type atomic##prefix##_fetch_##op(c_type i, atomic##prefix##_t *v) \
+{                                                                   \
+    register c_type ret;                                            \
+    asm volatile (                                                  \
+        "   amo" #asm_op "." #asm_type ".aqrl  %1, %2, %0"          \
+        : "+A" (v->counter), "=r" (ret)                             \
+        : "r" (unary_op i)                                          \
+        : "memory" );                                               \
+    return ret;                                                     \
+}
+
+#define ATOMIC_OP_RETURN(op, asm_op, c_op, unary_op, asm_type, c_type, prefix) \
+static inline                                                           \
+c_type atomic##prefix##_##op##_return(c_type i, atomic##prefix##_t *v)  \
+{                                                                       \
+    return atomic##prefix##_fetch_##op(i, v) c_op (unary_op i);         \
+}
+
+/*
+ * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is the reason why
+ * last argument of ATOMIC_FETCH_OP, ATOMIC_OP_RETURN isn't used.
+ */
+#define ATOMIC_OPS(op, asm_op, unary_op)                        \
+        ATOMIC_FETCH_OP( op, asm_op,    unary_op, w, int,   )   \
+        ATOMIC_OP_RETURN(op, asm_op, +, unary_op, w, int,   )
+
+ATOMIC_OPS(add, add, +)
+ATOMIC_OPS(sub, add, -)
+
+#undef ATOMIC_OPS
+
+#define ATOMIC_OPS(op, asm_op) \
+        ATOMIC_FETCH_OP(op, asm_op, +, w, int,   )
+
+ATOMIC_OPS(and, and)
+ATOMIC_OPS( or,  or)
+ATOMIC_OPS(xor, xor)
+
+#undef ATOMIC_OPS
+
+#undef ATOMIC_FETCH_OP
+#undef ATOMIC_OP_RETURN
+
+/* This is required to provide a full barrier on success. */
+static inline int atomic_add_unless(atomic_t *v, int a, int u)
+{
+    int prev, rc;
+
+    asm volatile (
+        "0: lr.w     %[p],  %[c]\n"
+        "   beq      %[p],  %[u], 1f\n"
+        "   add      %[rc], %[p], %[a]\n"
+        "   sc.w.aqrl  %[rc], %[rc], %[c]\n"
+        "   bnez     %[rc], 0b\n"
+        "1:\n"
+        : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (v->counter)
+        : [a] "r" (a), [u] "r" (u)
+        : "memory");
+    return prev;
+}
+
+static inline int atomic_sub_if_positive(atomic_t *v, int offset)
+{
+    int prev, rc;
+
+    asm volatile (
+        "0: lr.w     %[p],  %[c]\n"
+        "   sub      %[rc], %[p], %[o]\n"
+        "   bltz     %[rc], 1f\n"
+        "   sc.w.aqrl  %[rc], %[rc], %[c]\n"
+        "   bnez     %[rc], 0b\n"
+        "1:\n"
+        : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (v->counter)
+        : [o] "r" (offset)
+        : "memory" );
+    return prev - offset;
+}
+
+/*
+ * atomic_{cmp,}xchg is required to have exactly the same ordering semantics as
+ * {cmp,}xchg and the operations that return.
+ */
+#define ATOMIC_OP(c_t, prefix)                                  \
+static inline                                                   \
+c_t atomic##prefix##_xchg(atomic##prefix##_t *v, c_t n)         \
+{                                                               \
+    return __xchg(&v->counter, n, sizeof(c_t));                 \
+}                                                               \
+static inline                                                   \
+c_t atomic##prefix##_cmpxchg(atomic##prefix##_t *v, c_t o, c_t n) \
+{                                                               \
+    return __cmpxchg(&v->counter, o, n, sizeof(c_t));           \
+}
+
+#define ATOMIC_OPS() \
+    ATOMIC_OP(int,   )
+
+ATOMIC_OPS()
+
+#undef ATOMIC_OPS
+#undef ATOMIC_OP
+
+#endif /* _ASM_RISCV_ATOMIC_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-generic/atomic-ops.h b/xen/include/asm-generic/atomic-ops.h
new file mode 100644
index 0000000000..98dd907942
--- /dev/null
+++ b/xen/include/asm-generic/atomic-ops.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * The header provides default implementations for every xen/atomic.h-provided
+ * forward inline declaration that can be synthesized from other atomic
+ * functions or being created from scratch.
+ */
+#ifndef _ASM_GENERIC_ATOMIC_OPS_H_
+#define _ASM_GENERIC_ATOMIC_OPS_H_
+
+#include <xen/atomic.h>
+#include <xen/lib.h>
+
+#ifndef ATOMIC_READ
+static inline int atomic_read(const atomic_t *v)
+{
+    return ACCESS_ONCE(v->counter);
+}
+#endif
+
+#ifndef _ATOMIC_READ
+static inline int _atomic_read(atomic_t v)
+{
+    return v.counter;
+}
+#endif
+
+#ifndef ATOMIC_SET
+static inline void atomic_set(atomic_t *v, int i)
+{
+    ACCESS_ONCE(v->counter) = i;
+}
+#endif
+
+#ifndef _ATOMIC_SET
+static inline void _atomic_set(atomic_t *v, int i)
+{
+    v->counter = i;
+}
+#endif
+
+#ifndef ATOMIC_SUB_AND_TEST
+static inline int atomic_sub_and_test(int i, atomic_t *v)
+{
+    return atomic_sub_return(i, v) == 0;
+}
+#endif
+
+#ifndef ATOMIC_INC_AND_TEST
+static inline int atomic_inc_and_test(atomic_t *v)
+{
+    return atomic_add_return(1, v) == 0;
+}
+#endif
+
+#ifndef ATOMIC_INC
+static inline void atomic_inc(atomic_t *v)
+{
+    atomic_add(1, v);
+}
+#endif
+
+#ifndef ATOMIC_INC_RETURN
+static inline int atomic_inc_return(atomic_t *v)
+{
+    return atomic_add_return(1, v);
+}
+#endif
+
+#ifndef ATOMIC_DEC
+static inline void atomic_dec(atomic_t *v)
+{
+    atomic_sub(1, v);
+}
+#endif
+
+#ifndef ATOMIC_DEC_RETURN
+static inline int atomic_dec_return(atomic_t *v)
+{
+    return atomic_sub_return(1, v);
+}
+#endif
+
+#ifndef ATOMIC_DEC_AND_TEST
+static inline int atomic_dec_and_test(atomic_t *v)
+{
+    return atomic_sub_return(1, v) == 0;
+}
+#endif
+
+#ifndef ATOMIC_ADD_NEGATIVE
+static inline int atomic_add_negative(int i, atomic_t *v)
+{
+    return atomic_add_return(i, v) < 0;
+}
+#endif
+
+#endif /* _ASM_GENERIC_ATOMIC_OPS_H_ */
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 08/15] xen/riscv: introduce monitor.h
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
                   ` (6 preceding siblings ...)
  2024-05-06 10:15 ` [PATCH v9 07/15] xen/riscv: introduce atomic.h Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-06 10:15 ` [PATCH v9 09/15] xen/riscv: add definition of __read_mostly Oleksii Kurochko
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Tamas K Lengyel, Alexandru Isaila,
	Petre Pircalabu, Alistair Francis, Bob Eshleman, Connor Davis

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V4-V9:
 - Nothing changed. Only rebase.
---
Changes in V3:
 - new patch.
---
 xen/arch/riscv/include/asm/monitor.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 xen/arch/riscv/include/asm/monitor.h

diff --git a/xen/arch/riscv/include/asm/monitor.h b/xen/arch/riscv/include/asm/monitor.h
new file mode 100644
index 0000000000..f4fe2c0690
--- /dev/null
+++ b/xen/arch/riscv/include/asm/monitor.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_RISCV_MONITOR_H__
+#define __ASM_RISCV_MONITOR_H__
+
+#include <xen/bug.h>
+
+#include <asm-generic/monitor.h>
+
+struct domain;
+
+static inline uint32_t arch_monitor_get_capabilities(struct domain *d)
+{
+    BUG_ON("unimplemented");
+    return 0;
+}
+
+#endif /* __ASM_RISCV_MONITOR_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 09/15] xen/riscv: add definition of __read_mostly
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
                   ` (7 preceding siblings ...)
  2024-05-06 10:15 ` [PATCH v9 08/15] xen/riscv: introduce monitor.h Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-06 10:15 ` [PATCH v9 10/15] xen/riscv: add required things to current.h Oleksii Kurochko
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini

The definition of __read_mostly should be removed in:
https://lore.kernel.org/xen-devel/f25eb5c9-7c14-6e23-8535-2c66772b333e@suse.com/

The patch introduces it in arch-specific header to not
block enabling of full Xen build for RISC-V.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
- [PATCH] move __read_mostly to xen/cache.h  [2]

Right now, the patch series doesn't have a direct dependency on [2] and it
provides __read_mostly in the patch:
    [PATCH v3 26/34] xen/riscv: add definition of __read_mostly
However, it will be dropped as soon as [2] is merged or at least when the
final version of the patch [2] is provided.

Considering that there is still no still final decision regarding patch [2] my suggestion
is to merge RISC-V specific patch and just drop the changes in patch [2].

[2] https://lore.kernel.org/xen-devel/f25eb5c9-7c14-6e23-8535-2c66772b333e@suse.com/
---
Changes in V9:
 - Only rebase was done.
---
Change in V8:
 - update the footer after Signed-off.
---
Changes in V4-V7:
 - Nothing changed. Only rebase.
---
 xen/arch/riscv/include/asm/cache.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/riscv/include/asm/cache.h b/xen/arch/riscv/include/asm/cache.h
index 69573eb051..94bd94db53 100644
--- a/xen/arch/riscv/include/asm/cache.h
+++ b/xen/arch/riscv/include/asm/cache.h
@@ -3,4 +3,6 @@
 #ifndef _ASM_RISCV_CACHE_H
 #define _ASM_RISCV_CACHE_H
 
+#define __read_mostly __section(".data.read_mostly")
+
 #endif /* _ASM_RISCV_CACHE_H */
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 10/15] xen/riscv: add required things to current.h
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
                   ` (8 preceding siblings ...)
  2024-05-06 10:15 ` [PATCH v9 09/15] xen/riscv: add definition of __read_mostly Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-06 10:15 ` [PATCH v9 11/15] xen/riscv: add minimal stuff to mm.h to build full Xen Oleksii Kurochko
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini

Add minimal requied things to be able to build full Xen.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes in V5-V9:
 - Nothing changed. Only rebase.
---
Changes in V4:
 - BUG() was changed to BUG_ON("unimplemented");
 - Change "xen/bug.h" to "xen/lib.h" as BUG_ON is defined in xen/lib.h.
 - Add Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes in V3:
 - add SPDX
 - drop a forward declaration of struct vcpu;
 - update guest_cpu_user_regs() macros
 - replace get_processor_id with smp_processor_id
 - update the commit message
 - code style fixes
---
Changes in V2:
 - Nothing changed. Only rebase.
---
 xen/arch/riscv/include/asm/current.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/xen/arch/riscv/include/asm/current.h b/xen/arch/riscv/include/asm/current.h
index d84f15dc50..aedb6dc732 100644
--- a/xen/arch/riscv/include/asm/current.h
+++ b/xen/arch/riscv/include/asm/current.h
@@ -3,6 +3,21 @@
 #ifndef __ASM_CURRENT_H
 #define __ASM_CURRENT_H
 
+#include <xen/lib.h>
+#include <xen/percpu.h>
+#include <asm/processor.h>
+
+#ifndef __ASSEMBLY__
+
+/* Which VCPU is "current" on this PCPU. */
+DECLARE_PER_CPU(struct vcpu *, curr_vcpu);
+
+#define current            this_cpu(curr_vcpu)
+#define set_current(vcpu)  do { current = (vcpu); } while (0)
+#define get_cpu_current(cpu)  per_cpu(curr_vcpu, cpu)
+
+#define guest_cpu_user_regs() ({ BUG_ON("unimplemented"); NULL; })
+
 #define switch_stack_and_jump(stack, fn) do {               \
     asm volatile (                                          \
             "mv sp, %0\n"                                   \
@@ -10,4 +25,8 @@
     unreachable();                                          \
 } while ( false )
 
+#define get_per_cpu_offset() __per_cpu_offset[smp_processor_id()]
+
+#endif /* __ASSEMBLY__ */
+
 #endif /* __ASM_CURRENT_H */
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 11/15] xen/riscv: add minimal stuff to mm.h to build full Xen
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
                   ` (9 preceding siblings ...)
  2024-05-06 10:15 ` [PATCH v9 10/15] xen/riscv: add required things to current.h Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-06 10:15 ` [PATCH v9 12/15] xen/riscv: introduce vm_event_*() functions Oleksii Kurochko
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes in V8-V9:
 - Nothing changed only rebase.
---
Changes in V7:
 - update argument type of maddr_to_virt() function: unsigned long -> paddr_t
 - rename argument of PFN_ORDER(): pfn -> pg.
 - add Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes in V6:
 - drop __virt_to_maddr() ( transform to macro ) and __maddr_to_virt ( rename to maddr_to_virt ).
 - parenthesize va in definition of vmap_to_mfn().
 - Code style fixes.
---
Changes in V5:
 - update the comment around "struct domain *domain;" : zero -> NULL
 - fix ident. for unsigned long val;
 - put page_to_virt() and virt_to_page() close to each other.
 - drop unnessary leading underscore
 - drop a space before the comment: /* Count of uses of this frame as its current type. */
 - drop comment about a page 'not as a shadow'. it is not necessary for RISC-V
---
Changes in V4:
 - update an argument name of PFN_ORDERN macros.
 - drop pad at the end of 'struct page_info'.
 - Change message -> subject in "Changes in V3"
 - delete duplicated macros from riscv/mm.h
 - fix identation in struct page_info
 - align comment for PGC_ macros
 - update definitions of domain_set_alloc_bitsize() and domain_clamp_alloc_bitsize()
 - drop unnessary comments.
 - s/BUG/BUG_ON("...")
 - define __virt_to_maddr, __maddr_to_virt as stubs
 - add inclusion of xen/mm-frame.h for mfn_x and others
 - include "xen/mm.h" instead of "asm/mm.h" to fix compilation issues:
	 In file included from arch/riscv/setup.c:7:
	./arch/riscv/include/asm/mm.h:60:28: error: field 'list' has incomplete type
	   60 |     struct page_list_entry list;
	      |                            ^~~~
	./arch/riscv/include/asm/mm.h:81:43: error: 'MAX_ORDER' undeclared here (not in a function)
	   81 |                 unsigned long first_dirty:MAX_ORDER + 1;
	      |                                           ^~~~~~~~~
	./arch/riscv/include/asm/mm.h:81:31: error: bit-field 'first_dirty' width not an integer constant
	   81 |                 unsigned long first_dirty:MAX_ORDER + 1;
 - Define __virt_to_mfn() and __mfn_to_virt() using maddr_to_mfn() and mfn_to_maddr().
---
Changes in V3:
 - update the commit title
 - introduce DIRECTMAP_VIRT_START.
 - drop changes related pfn_to_paddr() and paddr_to_pfn as they were remvoe in
   [PATCH v2 32/39] xen/riscv: add minimal stuff to asm/page.h to build full Xen
 - code style fixes.
 - drop get_page_nr  and put_page_nr as they don't need for time being
 - drop CONFIG_STATIC_MEMORY related things
 - code style fixes
---
Changes in V2:
 - define stub for arch_get_dma_bitsize(void)
---
 xen/arch/riscv/include/asm/mm.h | 240 ++++++++++++++++++++++++++++++++
 xen/arch/riscv/mm.c             |   2 +-
 xen/arch/riscv/setup.c          |   2 +-
 3 files changed, 242 insertions(+), 2 deletions(-)

diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 07c7a0abba..cc4a07a71c 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -3,11 +3,246 @@
 #ifndef _ASM_RISCV_MM_H
 #define _ASM_RISCV_MM_H
 
+#include <public/xen.h>
+#include <xen/bug.h>
+#include <xen/mm-frame.h>
+#include <xen/pdx.h>
+#include <xen/types.h>
+
 #include <asm/page-bits.h>
 
 #define pfn_to_paddr(pfn) ((paddr_t)(pfn) << PAGE_SHIFT)
 #define paddr_to_pfn(pa)  ((unsigned long)((pa) >> PAGE_SHIFT))
 
+#define paddr_to_pdx(pa)    mfn_to_pdx(maddr_to_mfn(pa))
+#define gfn_to_gaddr(gfn)   pfn_to_paddr(gfn_x(gfn))
+#define gaddr_to_gfn(ga)    _gfn(paddr_to_pfn(ga))
+#define mfn_to_maddr(mfn)   pfn_to_paddr(mfn_x(mfn))
+#define maddr_to_mfn(ma)    _mfn(paddr_to_pfn(ma))
+#define vmap_to_mfn(va)     maddr_to_mfn(virt_to_maddr((vaddr_t)(va)))
+#define vmap_to_page(va)    mfn_to_page(vmap_to_mfn(va))
+
+static inline void *maddr_to_virt(paddr_t ma)
+{
+    BUG_ON("unimplemented");
+    return NULL;
+}
+
+#define virt_to_maddr(va) ({ BUG_ON("unimplemented"); 0; })
+
+/* Convert between Xen-heap virtual addresses and machine frame numbers. */
+#define __virt_to_mfn(va)  mfn_x(maddr_to_mfn(virt_to_maddr(va)))
+#define __mfn_to_virt(mfn) maddr_to_virt(mfn_to_maddr(_mfn(mfn)))
+
+/*
+ * We define non-underscored wrappers for above conversion functions.
+ * These are overriden in various source files while underscored version
+ * remain intact.
+ */
+#define virt_to_mfn(va)     __virt_to_mfn(va)
+#define mfn_to_virt(mfn)    __mfn_to_virt(mfn)
+
+struct page_info
+{
+    /* Each frame can be threaded onto a doubly-linked list. */
+    struct page_list_entry list;
+
+    /* Reference count and various PGC_xxx flags and fields. */
+    unsigned long count_info;
+
+    /* Context-dependent fields follow... */
+    union {
+        /* Page is in use: ((count_info & PGC_count_mask) != 0). */
+        struct {
+            /* Type reference count and various PGT_xxx flags and fields. */
+            unsigned long type_info;
+        } inuse;
+
+        /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
+        union {
+            struct {
+                /*
+                 * Index of the first *possibly* unscrubbed page in the buddy.
+                 * One more bit than maximum possible order to accommodate
+                 * INVALID_DIRTY_IDX.
+                 */
+#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
+                unsigned long first_dirty:MAX_ORDER + 1;
+
+                /* Do TLBs need flushing for safety before next page use? */
+                bool need_tlbflush:1;
+
+#define BUDDY_NOT_SCRUBBING    0
+#define BUDDY_SCRUBBING        1
+#define BUDDY_SCRUB_ABORT      2
+                unsigned long scrub_state:2;
+            };
+
+            unsigned long val;
+        } free;
+    } u;
+
+    union {
+        /* Page is in use */
+        struct {
+            /* Owner of this page (NULL if page is anonymous). */
+            struct domain *domain;
+        } inuse;
+
+        /* Page is on a free list. */
+        struct {
+            /* Order-size of the free chunk this page is the head of. */
+            unsigned int order;
+        } free;
+    } v;
+
+    union {
+        /*
+         * Timestamp from 'TLB clock', used to avoid extra safety flushes.
+         * Only valid for: a) free pages, and b) pages with zero type count
+         */
+        uint32_t tlbflush_timestamp;
+    };
+};
+
+#define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
+
+/* PDX of the first page in the frame table. */
+extern unsigned long frametable_base_pdx;
+
+/* Convert between machine frame numbers and page-info structures. */
+#define mfn_to_page(mfn)                                            \
+    (frame_table + (mfn_to_pdx(mfn) - frametable_base_pdx))
+#define page_to_mfn(pg)                                             \
+    pdx_to_mfn((unsigned long)((pg) - frame_table) + frametable_base_pdx)
+
+static inline void *page_to_virt(const struct page_info *pg)
+{
+    return mfn_to_virt(mfn_x(page_to_mfn(pg)));
+}
+
+/* Convert between Xen-heap virtual addresses and page-info structures. */
+static inline struct page_info *virt_to_page(const void *v)
+{
+    BUG_ON("unimplemented");
+    return NULL;
+}
+
+/*
+ * Common code requires get_page_type and put_page_type.
+ * We don't care about typecounts so we just do the minimum to make it
+ * happy.
+ */
+static inline int get_page_type(struct page_info *page, unsigned long type)
+{
+    return 1;
+}
+
+static inline void put_page_type(struct page_info *page)
+{
+}
+
+static inline void put_page_and_type(struct page_info *page)
+{
+    put_page_type(page);
+    put_page(page);
+}
+
+/*
+ * RISC-V does not have an M2P, but common code expects a handful of
+ * M2P-related defines and functions. Provide dummy versions of these.
+ */
+#define INVALID_M2P_ENTRY        (~0UL)
+#define SHARED_M2P_ENTRY         (~0UL - 1UL)
+#define SHARED_M2P(_e)           ((_e) == SHARED_M2P_ENTRY)
+
+#define set_gpfn_from_mfn(mfn, pfn) do { (void)(mfn), (void)(pfn); } while (0)
+#define mfn_to_gfn(d, mfn) ((void)(d), _gfn(mfn_x(mfn)))
+
+#define PDX_GROUP_SHIFT (PAGE_SHIFT + VPN_BITS)
+
+static inline unsigned long domain_get_maximum_gpfn(struct domain *d)
+{
+    BUG_ON("unimplemented");
+    return 0;
+}
+
+static inline long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+    BUG_ON("unimplemented");
+    return 0;
+}
+
+/*
+ * On RISCV, all the RAM is currently direct mapped in Xen.
+ * Hence return always true.
+ */
+static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
+{
+    return true;
+}
+
+#define PG_shift(idx)   (BITS_PER_LONG - (idx))
+#define PG_mask(x, idx) (x ## UL << PG_shift(idx))
+
+#define PGT_none          PG_mask(0, 1)  /* no special uses of this page   */
+#define PGT_writable_page PG_mask(1, 1)  /* has writable mappings?         */
+#define PGT_type_mask     PG_mask(1, 1)  /* Bits 31 or 63.                 */
+
+/* Count of uses of this frame as its current type. */
+#define PGT_count_width   PG_shift(2)
+#define PGT_count_mask    ((1UL << PGT_count_width) - 1)
+
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub   _PGC_allocated
+#define PGC_need_scrub    PGC_allocated
+
+/* Cleared when the owning guest 'frees' this page. */
+#define _PGC_allocated    PG_shift(1)
+#define PGC_allocated     PG_mask(1, 1)
+/* Page is Xen heap? */
+#define _PGC_xen_heap     PG_shift(2)
+#define PGC_xen_heap      PG_mask(1, 2)
+/* Page is broken? */
+#define _PGC_broken       PG_shift(7)
+#define PGC_broken        PG_mask(1, 7)
+/* Mutually-exclusive page states: { inuse, offlining, offlined, free }. */
+#define PGC_state         PG_mask(3, 9)
+#define PGC_state_inuse   PG_mask(0, 9)
+#define PGC_state_offlining PG_mask(1, 9)
+#define PGC_state_offlined PG_mask(2, 9)
+#define PGC_state_free    PG_mask(3, 9)
+#define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
+
+/* Count of references to this frame. */
+#define PGC_count_width   PG_shift(9)
+#define PGC_count_mask    ((1UL << PGC_count_width) - 1)
+
+#define _PGC_extra        PG_shift(10)
+#define PGC_extra         PG_mask(1, 10)
+
+#define is_xen_heap_page(page) ((page)->count_info & PGC_xen_heap)
+#define is_xen_heap_mfn(mfn) \
+    (mfn_valid(mfn) && is_xen_heap_page(mfn_to_page(mfn)))
+
+#define is_xen_fixed_mfn(mfn)                                   \
+    ((mfn_to_maddr(mfn) >= virt_to_maddr((vaddr_t)_start)) &&   \
+     (mfn_to_maddr(mfn) <= virt_to_maddr((vaddr_t)_end - 1)))
+
+#define page_get_owner(p)    (p)->v.inuse.domain
+#define page_set_owner(p, d) ((p)->v.inuse.domain = (d))
+
+/* TODO: implement */
+#define mfn_valid(mfn) ({ (void)(mfn); 0; })
+
+#define domain_set_alloc_bitsize(d) ((void)(d))
+#define domain_clamp_alloc_bitsize(d, b) ((void)(d), (b))
+
+#define PFN_ORDER(pg) ((pg)->v.free.order)
+
 extern unsigned char cpu0_boot_stack[];
 
 void setup_initial_pagetables(void);
@@ -20,4 +255,9 @@ unsigned long calc_phys_offset(void);
 
 void turn_on_mmu(unsigned long ra);
 
+static inline unsigned int arch_get_dma_bitsize(void)
+{
+    return 32; /* TODO */
+}
+
 #endif /* _ASM_RISCV_MM_H */
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index 053f043a3d..fe3a43be20 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -5,12 +5,12 @@
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/macros.h>
+#include <xen/mm.h>
 #include <xen/pfn.h>
 
 #include <asm/early_printk.h>
 #include <asm/csr.h>
 #include <asm/current.h>
-#include <asm/mm.h>
 #include <asm/page.h>
 #include <asm/processor.h>
 
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index 6593f601c1..98a94c4c48 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -2,9 +2,9 @@
 
 #include <xen/compile.h>
 #include <xen/init.h>
+#include <xen/mm.h>
 
 #include <asm/early_printk.h>
-#include <asm/mm.h>
 
 /* Xen stack for bringing up the first CPU. */
 unsigned char __initdata cpu0_boot_stack[STACK_SIZE]
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 12/15] xen/riscv: introduce vm_event_*() functions
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
                   ` (10 preceding siblings ...)
  2024-05-06 10:15 ` [PATCH v9 11/15] xen/riscv: add minimal stuff to mm.h to build full Xen Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-06 10:15 ` [PATCH v9 13/15] xen/riscv: add minimal amount of stubs to build full Xen Oleksii Kurochko
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Tamas K Lengyel, Alexandru Isaila, Petre Pircalabu

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V5-V9:
 - Only rebase was done.
---
Changes in V4:
  - New patch.
---
 xen/arch/riscv/Makefile   |  1 +
 xen/arch/riscv/vm_event.c | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)
 create mode 100644 xen/arch/riscv/vm_event.c

diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 2fefe14e7c..1ed1a8369b 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_RISCV_64) += riscv64/
 obj-y += sbi.o
 obj-y += setup.o
 obj-y += traps.o
+obj-y += vm_event.o
 
 $(TARGET): $(TARGET)-syms
 	$(OBJCOPY) -O binary -S $< $@
diff --git a/xen/arch/riscv/vm_event.c b/xen/arch/riscv/vm_event.c
new file mode 100644
index 0000000000..bb1fc73bc1
--- /dev/null
+++ b/xen/arch/riscv/vm_event.c
@@ -0,0 +1,19 @@
+#include <xen/bug.h>
+
+struct vm_event_st;
+struct vcpu;
+
+void vm_event_fill_regs(struct vm_event_st *req)
+{
+    BUG_ON("unimplemented");
+}
+
+void vm_event_set_registers(struct vcpu *v, struct vm_event_st *rsp)
+{
+    BUG_ON("unimplemented");
+}
+
+void vm_event_monitor_next_interrupt(struct vcpu *v)
+{
+    /* Not supported on RISCV. */
+}
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 13/15] xen/riscv: add minimal amount of stubs to build full Xen
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
                   ` (11 preceding siblings ...)
  2024-05-06 10:15 ` [PATCH v9 12/15] xen/riscv: introduce vm_event_*() functions Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-06 10:15 ` [PATCH v9 14/15] xen/riscv: enable full Xen build Oleksii Kurochko
  2024-05-06 10:15 ` [PATCH v9 15/15] xen/README: add compiler and binutils versions for RISC-V64 Oleksii Kurochko
  14 siblings, 0 replies; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes in V7-V9:
 - Only rebase was done.
---
Changes in V6:
 - update the commit in stubs.c around /* ... common/irq.c ... */
 - add Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes in V5:
 - drop unrelated changes
 - assert_failed("unimplmented...") change to BUG_ON()
---
Changes in V4:
  - added new stubs which are necessary for compilation after rebase: __cpu_up(), __cpu_disable(), __cpu_die()
    from smpboot.c
  - back changes related to printk() in early_printk() as they should be removed in the next patch to avoid
    compilation error.
  - update definition of cpu_khz: __read_mostly -> __ro_after_init.
  - drop vm_event_reset_vmtrace(). It is defibed in asm-generic/vm_event.h.
  - move vm_event_*() functions from stubs.c to riscv/vm_event.c.
  - s/BUG/BUG_ON("unimplemented") in stubs.c
  - back irq_actor_none() and irq_actor_none() as common/irq.c isn't compiled at this moment,
    so this function are needed to avoid compilation error.
  - defined max_page to avoid compilation error, it will be removed as soon as common/page_alloc.c will
    be compiled.
---
Changes in V3:
 - code style fixes.
 - update attribute for frametable_base_pdx  and frametable_virt_end to __ro_after_init.
   insteaf of read_mostly.
 - use BUG() instead of assert_failed/WARN for newly introduced stubs.
 - drop "#include <public/vm_event.h>" in stubs.c and use forward declaration instead.
 - drop ack_node() and end_node() as they aren't used now.
---
Changes in V2:
 - define udelay stub
 - remove 'select HAS_PDX' from RISC-V Kconfig because of
   https://lore.kernel.org/xen-devel/20231006144405.1078260-1-andrew.cooper3@citrix.com/
---
 xen/arch/riscv/Makefile |   1 +
 xen/arch/riscv/mm.c     |  50 +++++
 xen/arch/riscv/setup.c  |   8 +
 xen/arch/riscv/stubs.c  | 439 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/riscv/traps.c  |  25 +++
 5 files changed, 523 insertions(+)
 create mode 100644 xen/arch/riscv/stubs.c

diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 1ed1a8369b..60afbc0ad9 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -4,6 +4,7 @@ obj-y += mm.o
 obj-$(CONFIG_RISCV_64) += riscv64/
 obj-y += sbi.o
 obj-y += setup.o
+obj-y += stubs.o
 obj-y += traps.o
 obj-y += vm_event.o
 
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index fe3a43be20..2c3fb7d72e 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 
+#include <xen/bug.h>
 #include <xen/cache.h>
 #include <xen/compiler.h>
 #include <xen/init.h>
@@ -14,6 +15,9 @@
 #include <asm/page.h>
 #include <asm/processor.h>
 
+unsigned long __ro_after_init frametable_base_pdx;
+unsigned long __ro_after_init frametable_virt_end;
+
 struct mmu_desc {
     unsigned int num_levels;
     unsigned int pgtbl_count;
@@ -294,3 +298,49 @@ unsigned long __init calc_phys_offset(void)
     phys_offset = load_start - XEN_VIRT_START;
     return phys_offset;
 }
+
+void put_page(struct page_info *page)
+{
+    BUG_ON("unimplemented");
+}
+
+unsigned long get_upper_mfn_bound(void)
+{
+    /* No memory hotplug yet, so current memory limit is the final one. */
+    return max_page - 1;
+}
+
+void arch_dump_shared_mem_info(void)
+{
+    BUG_ON("unimplemented");
+}
+
+int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
+{
+    BUG_ON("unimplemented");
+    return -1;
+}
+
+int xenmem_add_to_physmap_one(struct domain *d, unsigned int space,
+                              union add_to_physmap_extra extra,
+                              unsigned long idx, gfn_t gfn)
+{
+    BUG_ON("unimplemented");
+
+    return 0;
+}
+
+int destroy_xen_mappings(unsigned long s, unsigned long e)
+{
+    BUG_ON("unimplemented");
+    return -1;
+}
+
+int map_pages_to_xen(unsigned long virt,
+                     mfn_t mfn,
+                     unsigned long nr_mfns,
+                     unsigned int flags)
+{
+    BUG_ON("unimplemented");
+    return -1;
+}
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index 98a94c4c48..8bb5bdb2ae 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -1,11 +1,19 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 
+#include <xen/bug.h>
 #include <xen/compile.h>
 #include <xen/init.h>
 #include <xen/mm.h>
 
+#include <public/version.h>
+
 #include <asm/early_printk.h>
 
+void arch_get_xen_caps(xen_capabilities_info_t *info)
+{
+    BUG_ON("unimplemented");
+}
+
 /* Xen stack for bringing up the first CPU. */
 unsigned char __initdata cpu0_boot_stack[STACK_SIZE]
     __aligned(STACK_SIZE);
diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c
new file mode 100644
index 0000000000..8285bcffef
--- /dev/null
+++ b/xen/arch/riscv/stubs.c
@@ -0,0 +1,439 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#include <xen/cpumask.h>
+#include <xen/domain.h>
+#include <xen/irq.h>
+#include <xen/nodemask.h>
+#include <xen/time.h>
+#include <public/domctl.h>
+
+#include <asm/current.h>
+
+/* smpboot.c */
+
+cpumask_t cpu_online_map;
+cpumask_t cpu_present_map;
+cpumask_t cpu_possible_map;
+
+/* ID of the PCPU we're running on */
+DEFINE_PER_CPU(unsigned int, cpu_id);
+/* XXX these seem awfully x86ish... */
+/* representing HT siblings of each logical CPU */
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_mask);
+/* representing HT and core siblings of each logical CPU */
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
+
+nodemask_t __read_mostly node_online_map = { { [0] = 1UL } };
+
+/*
+ * max_page is defined in page_alloc.c which isn't complied for now.
+ * definition of max_page will be remove as soon as page_alloc is built.
+ */
+unsigned long __read_mostly max_page;
+
+/* time.c */
+
+unsigned long __ro_after_init cpu_khz;  /* CPU clock frequency in kHz. */
+
+s_time_t get_s_time(void)
+{
+    BUG_ON("unimplemented");
+}
+
+int reprogram_timer(s_time_t timeout)
+{
+    BUG_ON("unimplemented");
+}
+
+void send_timer_event(struct vcpu *v)
+{
+    BUG_ON("unimplemented");
+}
+
+void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
+{
+    BUG_ON("unimplemented");
+}
+
+/* shutdown.c */
+
+void machine_restart(unsigned int delay_millisecs)
+{
+    BUG_ON("unimplemented");
+}
+
+void machine_halt(void)
+{
+    BUG_ON("unimplemented");
+}
+
+/* domctl.c */
+
+long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
+                    XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
+{
+    BUG_ON("unimplemented");
+}
+
+void arch_get_domain_info(const struct domain *d,
+                          struct xen_domctl_getdomaininfo *info)
+{
+    BUG_ON("unimplemented");
+}
+
+void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
+{
+    BUG_ON("unimplemented");
+}
+
+/* monitor.c */
+
+int arch_monitor_domctl_event(struct domain *d,
+                              struct xen_domctl_monitor_op *mop)
+{
+    BUG_ON("unimplemented");
+}
+
+/* smp.c */
+
+void arch_flush_tlb_mask(const cpumask_t *mask)
+{
+    BUG_ON("unimplemented");
+}
+
+void smp_send_event_check_mask(const cpumask_t *mask)
+{
+    BUG_ON("unimplemented");
+}
+
+void smp_send_call_function_mask(const cpumask_t *mask)
+{
+    BUG_ON("unimplemented");
+}
+
+/* irq.c */
+
+struct pirq *alloc_pirq_struct(struct domain *d)
+{
+    BUG_ON("unimplemented");
+}
+
+int pirq_guest_bind(struct vcpu *v, struct pirq *pirq, int will_share)
+{
+    BUG_ON("unimplemented");
+}
+
+void pirq_guest_unbind(struct domain *d, struct pirq *pirq)
+{
+    BUG_ON("unimplemented");
+}
+
+void pirq_set_affinity(struct domain *d, int pirq, const cpumask_t *mask)
+{
+    BUG_ON("unimplemented");
+}
+
+hw_irq_controller no_irq_type = {
+    .typename = "none",
+    .startup = irq_startup_none,
+    .shutdown = irq_shutdown_none,
+    .enable = irq_enable_none,
+    .disable = irq_disable_none,
+};
+
+int arch_init_one_irq_desc(struct irq_desc *desc)
+{
+    BUG_ON("unimplemented");
+}
+
+void smp_send_state_dump(unsigned int cpu)
+{
+    BUG_ON("unimplemented");
+}
+
+/* domain.c */
+
+DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
+unsigned long __per_cpu_offset[NR_CPUS];
+
+void context_switch(struct vcpu *prev, struct vcpu *next)
+{
+    BUG_ON("unimplemented");
+}
+
+void continue_running(struct vcpu *same)
+{
+    BUG_ON("unimplemented");
+}
+
+void sync_local_execstate(void)
+{
+    BUG_ON("unimplemented");
+}
+
+void sync_vcpu_execstate(struct vcpu *v)
+{
+    BUG_ON("unimplemented");
+}
+
+void startup_cpu_idle_loop(void)
+{
+    BUG_ON("unimplemented");
+}
+
+void free_domain_struct(struct domain *d)
+{
+    BUG_ON("unimplemented");
+}
+
+void dump_pageframe_info(struct domain *d)
+{
+    BUG_ON("unimplemented");
+}
+
+void free_vcpu_struct(struct vcpu *v)
+{
+    BUG_ON("unimplemented");
+}
+
+int arch_vcpu_create(struct vcpu *v)
+{
+    BUG_ON("unimplemented");
+}
+
+void arch_vcpu_destroy(struct vcpu *v)
+{
+    BUG_ON("unimplemented");
+}
+
+void vcpu_switch_to_aarch64_mode(struct vcpu *v)
+{
+    BUG_ON("unimplemented");
+}
+
+int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
+{
+    BUG_ON("unimplemented");
+}
+
+int arch_domain_create(struct domain *d,
+                       struct xen_domctl_createdomain *config,
+                       unsigned int flags)
+{
+    BUG_ON("unimplemented");
+}
+
+int arch_domain_teardown(struct domain *d)
+{
+    BUG_ON("unimplemented");
+}
+
+void arch_domain_destroy(struct domain *d)
+{
+    BUG_ON("unimplemented");
+}
+
+void arch_domain_shutdown(struct domain *d)
+{
+    BUG_ON("unimplemented");
+}
+
+void arch_domain_pause(struct domain *d)
+{
+    BUG_ON("unimplemented");
+}
+
+void arch_domain_unpause(struct domain *d)
+{
+    BUG_ON("unimplemented");
+}
+
+int arch_domain_soft_reset(struct domain *d)
+{
+    BUG_ON("unimplemented");
+}
+
+void arch_domain_creation_finished(struct domain *d)
+{
+    BUG_ON("unimplemented");
+}
+
+int arch_set_info_guest(struct vcpu *v, vcpu_guest_context_u c)
+{
+    BUG_ON("unimplemented");
+}
+
+int arch_initialise_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+    BUG_ON("unimplemented");
+}
+
+int arch_vcpu_reset(struct vcpu *v)
+{
+    BUG_ON("unimplemented");
+}
+
+int domain_relinquish_resources(struct domain *d)
+{
+    BUG_ON("unimplemented");
+}
+
+void arch_dump_domain_info(struct domain *d)
+{
+    BUG_ON("unimplemented");
+}
+
+void arch_dump_vcpu_info(struct vcpu *v)
+{
+    BUG_ON("unimplemented");
+}
+
+void vcpu_mark_events_pending(struct vcpu *v)
+{
+    BUG_ON("unimplemented");
+}
+
+void vcpu_update_evtchn_irq(struct vcpu *v)
+{
+    BUG_ON("unimplemented");
+}
+
+void vcpu_block_unless_event_pending(struct vcpu *v)
+{
+    BUG_ON("unimplemented");
+}
+
+void vcpu_kick(struct vcpu *v)
+{
+    BUG_ON("unimplemented");
+}
+
+struct domain *alloc_domain_struct(void)
+{
+    BUG_ON("unimplemented");
+}
+
+struct vcpu *alloc_vcpu_struct(const struct domain *d)
+{
+    BUG_ON("unimplemented");
+}
+
+unsigned long
+hypercall_create_continuation(unsigned int op, const char *format, ...)
+{
+    BUG_ON("unimplemented");
+}
+
+int __init parse_arch_dom0_param(const char *s, const char *e)
+{
+    BUG_ON("unimplemented");
+}
+
+/* guestcopy.c */
+
+unsigned long raw_copy_to_guest(void *to, const void *from, unsigned int len)
+{
+    BUG_ON("unimplemented");
+}
+
+unsigned long raw_copy_from_guest(void *to, const void __user *from,
+                                  unsigned int len)
+{
+    BUG_ON("unimplemented");
+}
+
+/* sysctl.c */
+
+long arch_do_sysctl(struct xen_sysctl *sysctl,
+                    XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
+{
+    BUG_ON("unimplemented");
+}
+
+void arch_do_physinfo(struct xen_sysctl_physinfo *pi)
+{
+    BUG_ON("unimplemented");
+}
+
+/* p2m.c */
+
+int arch_set_paging_mempool_size(struct domain *d, uint64_t size)
+{
+    BUG_ON("unimplemented");
+}
+
+int unmap_mmio_regions(struct domain *d,
+                       gfn_t start_gfn,
+                       unsigned long nr,
+                       mfn_t mfn)
+{
+    BUG_ON("unimplemented");
+}
+
+int map_mmio_regions(struct domain *d,
+                     gfn_t start_gfn,
+                     unsigned long nr,
+                     mfn_t mfn)
+{
+    BUG_ON("unimplemented");
+}
+
+int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
+{
+    BUG_ON("unimplemented");
+}
+
+/* Return the size of the pool, in bytes. */
+int arch_get_paging_mempool_size(struct domain *d, uint64_t *size)
+{
+    BUG_ON("unimplemented");
+}
+
+/* delay.c */
+
+void udelay(unsigned long usecs)
+{
+    BUG_ON("unimplemented");
+}
+
+/* guest_access.h */ 
+
+static inline unsigned long raw_clear_guest(void *to, unsigned int len)
+{
+    BUG_ON("unimplemented");
+}
+
+/* smpboot.c */
+
+int __cpu_up(unsigned int cpu)
+{
+    BUG_ON("unimplemented");
+}
+
+void __cpu_disable(void)
+{
+    BUG_ON("unimplemented");
+}
+
+void __cpu_die(unsigned int cpu)
+{
+    BUG_ON("unimplemented");
+}
+
+/*
+ * The following functions are defined in common/irq.c, but common/irq.c isn't
+ * built for now. These changes will be removed there when common/irq.c is
+ * ready.
+ */
+
+void cf_check irq_actor_none(struct irq_desc *desc)
+{
+    BUG_ON("unimplemented");
+}
+
+unsigned int cf_check irq_startup_none(struct irq_desc *desc)
+{
+    BUG_ON("unimplemented");
+
+    return 0;
+}
diff --git a/xen/arch/riscv/traps.c b/xen/arch/riscv/traps.c
index ccd3593f5a..5415cf8d90 100644
--- a/xen/arch/riscv/traps.c
+++ b/xen/arch/riscv/traps.c
@@ -4,6 +4,10 @@
  *
  * RISC-V Trap handlers
  */
+
+#include <xen/lib.h>
+#include <xen/sched.h>
+
 #include <asm/processor.h>
 #include <asm/traps.h>
 
@@ -11,3 +15,24 @@ void do_trap(struct cpu_user_regs *cpu_regs)
 {
     die();
 }
+
+void vcpu_show_execution_state(struct vcpu *v)
+{
+    BUG_ON("unimplemented");
+}
+
+void show_execution_state(const struct cpu_user_regs *regs)
+{
+    printk("implement show_execution_state(regs)\n");
+}
+
+void arch_hypercall_tasklet_result(struct vcpu *v, long res)
+{
+    BUG_ON("unimplemented");
+}
+
+enum mc_disposition arch_do_multicall_call(struct mc_state *state)
+{
+    BUG_ON("unimplemented");
+    return mc_continue;
+}
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 14/15] xen/riscv: enable full Xen build
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
                   ` (12 preceding siblings ...)
  2024-05-06 10:15 ` [PATCH v9 13/15] xen/riscv: add minimal amount of stubs to build full Xen Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  2024-05-06 10:15 ` [PATCH v9 15/15] xen/README: add compiler and binutils versions for RISC-V64 Oleksii Kurochko
  14 siblings, 0 replies; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes in V5-V9:
 - Nothing changed. Only rebase.
---
Changes in V4:
 - drop stubs for irq_actor_none() and irq_actor_none() as common/irq.c is compiled now.
 - drop defintion of max_page in stubs.c as common/page_alloc.c is compiled now.
 - drop printk() related changes in riscv/early_printk.c as common version will be used.
---
Changes in V3:
 - Reviewed-by: Jan Beulich <jbeulich@suse.com>
 - unrealted change dropped in tiny64_defconfig
---
Changes in V2:
 - Nothing changed. Only rebase.
---

 xen/arch/riscv/Makefile       |  16 +++-
 xen/arch/riscv/arch.mk        |   4 -
 xen/arch/riscv/early_printk.c | 168 ----------------------------------
 xen/arch/riscv/stubs.c        |  24 -----
 4 files changed, 15 insertions(+), 197 deletions(-)

diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 60afbc0ad9..81b77b13d6 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -12,10 +12,24 @@ $(TARGET): $(TARGET)-syms
 	$(OBJCOPY) -O binary -S $< $@
 
 $(TARGET)-syms: $(objtree)/prelink.o $(obj)/xen.lds
-	$(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< $(build_id_linker) -o $@
+	$(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< \
+	    $(objtree)/common/symbols-dummy.o -o $(dot-target).0
+	$(NM) -pa --format=sysv $(dot-target).0 \
+		| $(objtree)/tools/symbols $(all_symbols) --sysv --sort \
+		> $(dot-target).0.S
+	$(MAKE) $(build)=$(@D) $(dot-target).0.o
+	$(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< \
+	    $(dot-target).0.o -o $(dot-target).1
+	$(NM) -pa --format=sysv $(dot-target).1 \
+		| $(objtree)/tools/symbols $(all_symbols) --sysv --sort \
+		> $(dot-target).1.S
+	$(MAKE) $(build)=$(@D) $(dot-target).1.o
+	$(LD) $(XEN_LDFLAGS) -T $(obj)/xen.lds -N $< $(build_id_linker) \
+	    $(dot-target).1.o -o $@
 	$(NM) -pa --format=sysv $@ \
 		| $(objtree)/tools/symbols --all-symbols --xensyms --sysv --sort \
 		> $@.map
+	rm -f $(@D)/.$(@F).[0-9]*
 
 $(obj)/xen.lds: $(src)/xen.lds.S FORCE
 	$(call if_changed_dep,cpp_lds_S)
diff --git a/xen/arch/riscv/arch.mk b/xen/arch/riscv/arch.mk
index 8c071aff65..17827c302c 100644
--- a/xen/arch/riscv/arch.mk
+++ b/xen/arch/riscv/arch.mk
@@ -38,7 +38,3 @@ extensions := $(subst $(space),,$(extensions))
 # -mcmodel=medlow would force Xen into the lower half.
 
 CFLAGS += $(riscv-generic-flags)$(extensions) -mstrict-align -mcmodel=medany
-
-# TODO: Drop override when more of the build is working
-override ALL_OBJS-y = arch/$(SRCARCH)/built_in.o
-override ALL_LIBS-y =
diff --git a/xen/arch/riscv/early_printk.c b/xen/arch/riscv/early_printk.c
index 60742a042d..610c814f54 100644
--- a/xen/arch/riscv/early_printk.c
+++ b/xen/arch/riscv/early_printk.c
@@ -40,171 +40,3 @@ void early_printk(const char *str)
         str++;
     }
 }
-
-/*
- * The following #if 1 ... #endif should be removed after printk
- * and related stuff are ready.
- */
-#if 1
-
-#include <xen/stdarg.h>
-#include <xen/string.h>
-
-/**
- * strlen - Find the length of a string
- * @s: The string to be sized
- */
-size_t (strlen)(const char * s)
-{
-    const char *sc;
-
-    for (sc = s; *sc != '\0'; ++sc)
-        /* nothing */;
-    return sc - s;
-}
-
-/**
- * memcpy - Copy one area of memory to another
- * @dest: Where to copy to
- * @src: Where to copy from
- * @count: The size of the area.
- *
- * You should not use this function to access IO space, use memcpy_toio()
- * or memcpy_fromio() instead.
- */
-void *(memcpy)(void *dest, const void *src, size_t count)
-{
-    char *tmp = (char *) dest, *s = (char *) src;
-
-    while (count--)
-        *tmp++ = *s++;
-
-    return dest;
-}
-
-int vsnprintf(char* str, size_t size, const char* format, va_list args)
-{
-    size_t i = 0; /* Current position in the output string */
-    size_t written = 0; /* Total number of characters written */
-    char* dest = str;
-
-    while ( format[i] != '\0' && written < size - 1 )
-    {
-        if ( format[i] == '%' )
-        {
-            i++;
-
-            if ( format[i] == '\0' )
-                break;
-
-            if ( format[i] == '%' )
-            {
-                if ( written < size - 1 )
-                {
-                    dest[written] = '%';
-                    written++;
-                }
-                i++;
-                continue;
-            }
-
-            /*
-             * Handle format specifiers.
-             * For simplicity, only %s and %d are implemented here.
-             */
-
-            if ( format[i] == 's' )
-            {
-                char* arg = va_arg(args, char*);
-                size_t arglen = strlen(arg);
-
-                size_t remaining = size - written - 1;
-
-                if ( arglen > remaining )
-                    arglen = remaining;
-
-                memcpy(dest + written, arg, arglen);
-
-                written += arglen;
-                i++;
-            }
-            else if ( format[i] == 'd' )
-            {
-                int arg = va_arg(args, int);
-
-                /* Convert the integer to string representation */
-                char numstr[32]; /* Assumes a maximum of 32 digits */
-                int numlen = 0;
-                int num = arg;
-                size_t remaining;
-
-                if ( arg < 0 )
-                {
-                    if ( written < size - 1 )
-                    {
-                        dest[written] = '-';
-                        written++;
-                    }
-
-                    num = -arg;
-                }
-
-                do
-                {
-                    numstr[numlen] = '0' + num % 10;
-                    num = num / 10;
-                    numlen++;
-                } while ( num > 0 );
-
-                /* Reverse the string */
-                for (int j = 0; j < numlen / 2; j++)
-                {
-                    char tmp = numstr[j];
-                    numstr[j] = numstr[numlen - 1 - j];
-                    numstr[numlen - 1 - j] = tmp;
-                }
-
-                remaining = size - written - 1;
-
-                if ( numlen > remaining )
-                    numlen = remaining;
-
-                memcpy(dest + written, numstr, numlen);
-
-                written += numlen;
-                i++;
-            }
-        }
-        else
-        {
-            if ( written < size - 1 )
-            {
-                dest[written] = format[i];
-                written++;
-            }
-            i++;
-        }
-    }
-
-    if ( size > 0 )
-        dest[written] = '\0';
-
-    return written;
-}
-
-void printk(const char *format, ...)
-{
-    static char buf[1024];
-
-    va_list args;
-    va_start(args, format);
-
-    (void)vsnprintf(buf, sizeof(buf), format, args);
-
-    early_printk(buf);
-
-    va_end(args);
-}
-
-#endif
-
diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c
index 8285bcffef..bda35fc347 100644
--- a/xen/arch/riscv/stubs.c
+++ b/xen/arch/riscv/stubs.c
@@ -24,12 +24,6 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
 
 nodemask_t __read_mostly node_online_map = { { [0] = 1UL } };
 
-/*
- * max_page is defined in page_alloc.c which isn't complied for now.
- * definition of max_page will be remove as soon as page_alloc is built.
- */
-unsigned long __read_mostly max_page;
-
 /* time.c */
 
 unsigned long __ro_after_init cpu_khz;  /* CPU clock frequency in kHz. */
@@ -419,21 +413,3 @@ void __cpu_die(unsigned int cpu)
 {
     BUG_ON("unimplemented");
 }
-
-/*
- * The following functions are defined in common/irq.c, but common/irq.c isn't
- * built for now. These changes will be removed there when common/irq.c is
- * ready.
- */
-
-void cf_check irq_actor_none(struct irq_desc *desc)
-{
-    BUG_ON("unimplemented");
-}
-
-unsigned int cf_check irq_startup_none(struct irq_desc *desc)
-{
-    BUG_ON("unimplemented");
-
-    return 0;
-}
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v9 15/15] xen/README: add compiler and binutils versions for RISC-V64
  2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
                   ` (13 preceding siblings ...)
  2024-05-06 10:15 ` [PATCH v9 14/15] xen/riscv: enable full Xen build Oleksii Kurochko
@ 2024-05-06 10:15 ` Oleksii Kurochko
  14 siblings, 0 replies; 36+ messages in thread
From: Oleksii Kurochko @ 2024-05-06 10:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini

This patch doesn't represent a strict lower bound for GCC and
GNU Binutils; rather, these versions are specifically employed by
the Xen RISC-V container and are anticipated to undergo continuous
testing. Older GCC and GNU Binutils would work,
but this is not a guarantee.

While it is feasible to utilize Clang, it's important to note that,
currently, there is no Xen RISC-V CI job in place to verify the
seamless functioning of the build with Clang.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V5-V9:
 - Nothing changed. Only rebase.
---
 Changes in V6:
  - update the message in README.
---
 Changes in V5:
  - update the commit message and README file with additional explanation about GCC and
    GNU Binutils version. Additionally, it was added information about Clang.
---
 Changes in V4:
  - Update version of GCC (12.2) and GNU Binutils (2.39) to the version
    which are in Xen's contrainter for RISC-V
---
 Changes in V3:
  - new patch
---
 README | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README b/README
index c8a108449e..30da5ff9c0 100644
--- a/README
+++ b/README
@@ -48,6 +48,10 @@ provided by your OS distributor:
       - For ARM 64-bit:
         - GCC 5.1 or later
         - GNU Binutils 2.24 or later
+      - For RISC-V 64-bit:
+        - GCC 12.2 or later
+        - GNU Binutils 2.39 or later
+          Older GCC and GNU Binutils would work, but this is not a guarantee.
     * POSIX compatible awk
     * Development install of zlib (e.g., zlib-dev)
     * Development install of Python 2.7 or later (e.g., python-dev)
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit()
  2024-05-06 10:15 ` [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit() Oleksii Kurochko
@ 2024-05-15  8:52   ` Jan Beulich
  2024-05-15 15:29     ` Oleksii K.
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2024-05-15  8:52 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Roger Pau Monné, Ross Lagerwall, Stefano Stabellini,
	Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Shawn Anastasio, xen-devel

On 06.05.2024 12:15, Oleksii Kurochko wrote:
> The following generic functions were introduced:
> * test_bit
> * generic__test_and_set_bit
> * generic__test_and_clear_bit
> * generic__test_and_change_bit
> 
> Also, the patch introduces the following generics which are
> used by the functions mentioned above:
> * BITOP_BITS_PER_WORD
> * BITOP_MASK
> * BITOP_WORD
> * BITOP_TYPE
> 
> These functions and macros can be useful for architectures
> that don't have corresponding arch-specific instructions.

Logically this paragraph may better move ahead of the BITOP_* one.

> Because of that x86 has the following check in the macros test_bit(),
> __test_and_set_bit(), __test_and_clear_bit(), __test_and_change_bit():
>     if ( bitop_bad_size(addr) ) __bitop_bad_size();
> It was necessary to make bitop bad size check generic too, so
> arch_check_bitop_size() was introduced.

Not anymore, with the most recent adjustments? There's nothing arch-
specific anymore in the checking.

> @@ -183,7 +180,7 @@ static inline int test_and_set_bit(int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_set_bit(int nr, void *addr)
> +static inline int arch__test_and_set_bit(int nr, volatile void *addr)

I think I raised this point before: Why arch__ here, ...

> @@ -232,7 +226,7 @@ static inline int test_and_clear_bit(int nr, volatile void *addr)
>   * If two examples of this operation race, one can appear to succeed
>   * but actually fail.  You must protect multiple accesses with a lock.
>   */
> -static inline int __test_and_clear_bit(int nr, void *addr)
> +static inline int arch__test_and_clear_bit(int nr, volatile void *addr)

... here, ...

> @@ -243,13 +237,10 @@ static inline int __test_and_clear_bit(int nr, void *addr)
>  
>      return oldbit;
>  }
> -#define __test_and_clear_bit(nr, addr) ({               \
> -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> -    __test_and_clear_bit(nr, addr);                     \
> -})
> +#define arch__test_and_clear_bit arch__test_and_clear_bit
>  
>  /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr, void *addr)
> +static inline int arch__test_and_change_bit(int nr, volatile void *addr)

... and here, while ...

> @@ -307,8 +295,7 @@ static inline int variable_test_bit(int nr, const volatile void *addr)
>      return oldbit;
>  }
>  
> -#define test_bit(nr, addr) ({                           \
> -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> +#define arch_test_bit(nr, addr) ({                      \

... just arch_ here? I don't like the double underscore infixes very
much, but I'm okay with them as long as they're applied consistently.

> --- a/xen/include/xen/bitops.h
> +++ b/xen/include/xen/bitops.h
> @@ -65,10 +65,144 @@ static inline int generic_flsl(unsigned long x)
>   * scope
>   */
>  
> +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) % BITOP_BITS_PER_WORD))
> +
> +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> +
> +extern void __bitop_bad_size(void);
> +
>  /* --------------------- Please tidy above here --------------------- */
>  
>  #include <asm/bitops.h>
>  
> +#ifndef arch_check_bitop_size
> +
> +#define bitop_bad_size(addr) sizeof(*(addr)) < sizeof(bitop_uint_t)

Nit: Missing parentheses around the whole expression.

> +#define arch_check_bitop_size(addr) \
> +    if ( bitop_bad_size(addr) ) __bitop_bad_size();

Apart from the arch_ prefix that now wants dropping, this macro (if we
want one in the first place) also wants writing in a way such that it
can be used as a normal expression, without double semicolons or other
anomalies (potentially) resulting at use sites. E.g.

#define check_bitop_size(addr) do { \
    if ( bitop_bad_size(addr) )     \
        __bitop_bad_size();         \
} while ( 0 )

> +#endif /* arch_check_bitop_size */
> +
> +/**
> + * generic__test_and_set_bit - Set a bit and return its old value
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static always_inline bool
> +generic__test_and_set_bit(unsigned long nr, volatile void *addr)

The original per-arch functions all use "int" for their first parameter.
Here you use unsigned long, without any mention in the description of the
potential behavioral change. Which is even worse given that then x86 ends
up inconsistent with Arm and PPC in this regard, by ...

> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +    bitop_uint_t old = *p;
> +
> +    *p = old | mask;
> +    return (old & mask);
> +}
> +
> +/**
> + * generic__test_and_clear_bit - Clear a bit and return its old value
> + * @nr: Bit to clear
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail.  You must protect multiple accesses with a lock.
> + */
> +static always_inline bool
> +generic__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +    bitop_uint_t old = *p;
> +
> +    *p = old & ~mask;
> +    return (old & mask);
> +}
> +
> +/* WARNING: non atomic and it can be reordered! */
> +static always_inline bool
> +generic__test_and_change_bit(unsigned long nr, volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +    bitop_uint_t old = *p;
> +
> +    *p = old ^ mask;
> +    return (old & mask);
> +}
> +/**
> + * generic_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + */
> +static always_inline bool generic_test_bit(int nr, const volatile void *addr)
> +{
> +    bitop_uint_t mask = BITOP_MASK(nr);
> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> +
> +    return (*p & mask);
> +}
> +
> +static always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile void *addr)
> +{
> +#ifndef arch__test_and_set_bit
> +#define arch__test_and_set_bit generic__test_and_set_bit
> +#endif
> +
> +    return arch__test_and_set_bit(nr, addr);

... silently truncating and sign-converting nr here.

As to generic_test_bit() - please don't cast away const-ness there.

> +}
> +#define __test_and_set_bit(nr, addr) ({             \
> +    arch_check_bitop_size(addr);                    \
> +    __test_and_set_bit(nr, addr);                   \
> +})
> +
> +static always_inline bool
> +__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)

Oddly enough here at least you use bitop_uint_t, but that's still ...

> +{
> +#ifndef arch__test_and_clear_bit
> +#define arch__test_and_clear_bit generic__test_and_clear_bit
> +#endif
> +
> +    return arch__test_and_clear_bit(nr, addr);

... meaning a signedness conversion on x86 then. And beware: You can't
simply change x86'es code to use bitop_uint_t. The underlying insns used
interpret the bit position as a signed number, i.e. permitting accesses
below the incoming pointer (whether it really makes sense to be that way
is a separate question). I'm afraid I have no good suggestion how to deal
with that: Any approach I can think of is either detrimental to the
generic implementation or would have unwanted effects on the x86 one.
Others, anyone?

> --- a/xen/include/xen/types.h
> +++ b/xen/include/xen/types.h
> @@ -64,6 +64,12 @@ typedef __u64 __be64;
>  
>  typedef unsigned int __attribute__((__mode__(__pointer__))) uintptr_t;
>  
> +#ifndef BITOP_TYPE
> +#define BITOP_BITS_PER_WORD 32
> +
> +typedef uint32_t bitop_uint_t;
> +#endif

I think you mentioned to me before why this needs to live here, not in
xen/bitops.h. Yet I don't recall the reason, and the description (hint,
hint) doesn't say anything either.

Jan


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 03/15] xen/bitops: implement fls{l}() in common logic
  2024-05-06 10:15 ` [PATCH v9 03/15] xen/bitops: implement fls{l}() in common logic Oleksii Kurochko
@ 2024-05-15  9:09   ` Jan Beulich
  2024-05-15 13:55     ` Oleksii K.
  2024-05-17  9:06     ` Oleksii K.
  0 siblings, 2 replies; 36+ messages in thread
From: Jan Beulich @ 2024-05-15  9:09 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Shawn Anastasio,
	Roger Pau Monné, xen-devel

On 06.05.2024 12:15, Oleksii Kurochko wrote:
> Changes in V9:
>  - update return type of fls and flsl() to unsigned int to be aligned with other
>    bit ops.

But this then needs carrying through to ...

> --- a/xen/arch/arm/include/asm/arm64/bitops.h
> +++ b/xen/arch/arm/include/asm/arm64/bitops.h
> @@ -22,17 +22,15 @@ static /*__*/always_inline unsigned long __ffs(unsigned long word)
>   */
>  #define ffz(x)  __ffs(~(x))
>  
> -static inline int flsl(unsigned long x)
> +static inline int arch_flsl(unsigned long x)

... e.g. here. You don't want to introduce signed/unsigned mismatches.

Also why do you keep "inline" here, while ...

> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -425,7 +425,7 @@ static always_inline unsigned int arch_ffsl(unsigned long x)
>   *
>   * This is defined the same way as ffs.
>   */
> -static inline int flsl(unsigned long x)
> +static always_inline int arch_flsl(unsigned long x)

... you switch to always_inline here?

(replying out of order)

> --- a/xen/arch/arm/include/asm/arm32/bitops.h
> +++ b/xen/arch/arm/include/asm/arm32/bitops.h
> @@ -1,7 +1,7 @@
>  #ifndef _ARM_ARM32_BITOPS_H
>  #define _ARM_ARM32_BITOPS_H
>  
> -#define flsl fls
> +#define arch_flsl fls

It's the Arm maintainers to ultimately judge, but I'd be inclined to suggest

#define arch_flsl arch_fls

instead. That's not only behaviorally closer to what was there before, but
also reduces (a tiny bit) the amount of work the compiler needs to carry out.

Jan


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 05/15] xen/riscv: introduce bitops.h
  2024-05-06 10:15 ` [PATCH v9 05/15] xen/riscv: introduce bitops.h Oleksii Kurochko
@ 2024-05-15  9:29   ` Jan Beulich
  0 siblings, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2024-05-15  9:29 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	George Dunlap, Julien Grall, Stefano Stabellini, xen-devel

On 06.05.2024 12:15, Oleksii Kurochko wrote:
> Changes in V9:
>  - add Acked-by: Jan Beulich <jbeulich@suse.com>
>  - drop redefinition of bitop_uint_t in asm/types.h as some operation in Xen common code expects
>    to work with 32-bit quantities.
>  - s/BITS_PER_LONG/BITOP_BITS_PER_WORD in asm/bitops.h around __AMO() macros.

Yet then ...

> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/bitops.h
> @@ -0,0 +1,137 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2012 Regents of the University of California */
> +
> +#ifndef _ASM_RISCV_BITOPS_H
> +#define _ASM_RISCV_BITOPS_H
> +
> +#include <asm/system.h>
> +
> +#if BITOP_BITS_PER_WORD == 64
> +#define __AMO(op)   "amo" #op ".d"
> +#elif BITOP_BITS_PER_WORD == 32
> +#define __AMO(op)   "amo" #op ".w"
> +#else
> +#error "Unexpected BITS_PER_LONG"

... there's a leftover here.

Jan


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 06/15] xen/riscv: introduce cmpxchg.h
  2024-05-06 10:15 ` [PATCH v9 06/15] xen/riscv: introduce cmpxchg.h Oleksii Kurochko
@ 2024-05-15  9:38   ` Jan Beulich
  0 siblings, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2024-05-15  9:38 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	George Dunlap, Julien Grall, Stefano Stabellini, xen-devel

On 06.05.2024 12:15, Oleksii Kurochko wrote:
> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/cmpxchg.h
> @@ -0,0 +1,239 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (C) 2014 Regents of the University of California */
> +
> +#ifndef _ASM_RISCV_CMPXCHG_H
> +#define _ASM_RISCV_CMPXCHG_H
> +
> +#include <xen/compiler.h>
> +#include <xen/lib.h>
> +
> +#include <asm/fence.h>
> +#include <asm/io.h>
> +#include <asm/system.h>
> +
> +#define _amoswap_generic(ptr, new, ret, sfx) \
> +    asm volatile ( \
> +        " amoswap" sfx " %0, %2, %1" \
> +        : "=r" (ret), "+A" (*(ptr)) \
> +        : "r" (new) \
> +        : "memory" );
> +
> +/*
> + * For LR and SC, the A extension requires that the address held in rs1 be
> + * naturally aligned to the size of the operand (i.e., eight-byte aligned
> + * for 64-bit words and four-byte aligned for 32-bit words).
> + * If the address is not naturally aligned, an address-misaligned exception
> + * or an access-fault exception will be generated.
> + *
> + * Thereby:
> + * - for 1-byte xchg access the containing word by clearing low two bits.
> + * - for 2-byte xchg access the containing word by clearing bit 1.
> + *
> + * If resulting 4-byte access is still misalgined, it will fault just as
> + * non-emulated 4-byte access would.
> + */
> +#define emulate_xchg_1_2(ptr, new, lr_sfx, sc_sfx) \
> +({ \
> +    uint32_t *aligned_ptr; \
> +    unsigned long alignment_mask = sizeof(*aligned_ptr) - sizeof(*(ptr)); \
> +    unsigned int new_val_bit = \
> +        ((unsigned long)(ptr) & alignment_mask) * BITS_PER_BYTE; \
> +    unsigned long mask = \
> +        GENMASK(((sizeof(*(ptr))) * BITS_PER_BYTE) - 1, 0) << new_val_bit; \
> +    unsigned int new_ = (new) << new_val_bit; \
> +    unsigned int old; \
> +    unsigned int scratch; \
> +    \
> +    aligned_ptr = (uint32_t *)((unsigned long)(ptr) & ~alignment_mask); \
> +    \
> +    asm volatile ( \
> +        "0: lr.w" lr_sfx " %[old], %[ptr_]\n" \
> +        "   andn  %[scratch], %[old], %[mask]\n" \
> +        "   or   %[scratch], %[scratch], %z[new_]\n" \
> +        "   sc.w" sc_sfx " %[scratch], %[scratch], %[ptr_]\n" \
> +        "   bnez %[scratch], 0b\n" \
> +        : [old] "=&r" (old), [scratch] "=&r" (scratch), \
> +          [ptr_] "+A" (*aligned_ptr) \
> +        : [new_] "rJ" (new_), [mask] "r" (mask) \
> +        : "memory" ); \
> +    \
> +    (__typeof__(*(ptr)))((old & mask) >> new_val_bit); \
> +})
> +
> +/*
> + * This function doesn't exist, so you'll get a linker error
> + * if something tries to do an invalid xchg().
> + */
> +extern unsigned long __bad_xchg(volatile void *ptr, int size);
> +
> +static always_inline unsigned long __xchg(volatile void *ptr, unsigned long new, int size)

Nit: Line too long.

I also find "int size" odd: This can't be negative, can it? Ought to
be "unsigned int" then. Same for __cmpxchg() then.

> +#define xchg(ptr, x) \
> +({ \
> +    __typeof__(*(ptr)) n_ = (x); \
> +    (__typeof__(*(ptr))) \
> +        __xchg((ptr), (unsigned long)n_, sizeof(*(ptr))); \

Nit: I'm pretty sure I mentioned before that the parentheses around
the first argument are unnecessary. Same for cmpxchg() then.

With these taken care of
Acked-by: Jan Beulich <jbeulich@suse.com>

> --- a/xen/arch/riscv/include/asm/config.h
> +++ b/xen/arch/riscv/include/asm/config.h
> @@ -119,6 +119,8 @@
>  
>  #define BITS_PER_LLONG 64
>  
> +#define BITS_PER_BYTE 8
> +
>  /* xen_ulong_t is always 64 bits */
>  #define BITS_PER_XEN_ULONG 64

I'm in no way going to insist, but imo this would better be inserted
ahead of BYTES_PER_LONG.

Jan


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 07/15] xen/riscv: introduce atomic.h
  2024-05-06 10:15 ` [PATCH v9 07/15] xen/riscv: introduce atomic.h Oleksii Kurochko
@ 2024-05-15  9:49   ` Jan Beulich
  2024-05-15 13:59     ` Oleksii K.
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2024-05-15  9:49 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	George Dunlap, Julien Grall, Stefano Stabellini, xen-devel

On 06.05.2024 12:15, Oleksii Kurochko wrote:
> Changes in V9:
>  - update the defintion of write_atomic macros:
>    drop the return value as this macros isn't expeceted to return something
>    drop unnessary parentheses around p.

Yet then what about add_sized()? With that also tidied
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 03/15] xen/bitops: implement fls{l}() in common logic
  2024-05-15  9:09   ` Jan Beulich
@ 2024-05-15 13:55     ` Oleksii K.
  2024-05-15 14:07       ` Jan Beulich
  2024-05-17  9:06     ` Oleksii K.
  1 sibling, 1 reply; 36+ messages in thread
From: Oleksii K. @ 2024-05-15 13:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Shawn Anastasio,
	Roger Pau Monné, xen-devel

On Wed, 2024-05-15 at 11:09 +0200, Jan Beulich wrote:
> On 06.05.2024 12:15, Oleksii Kurochko wrote:
> > Changes in V9:
> >  - update return type of fls and flsl() to unsigned int to be
> > aligned with other
> >    bit ops.
> 
> But this then needs carrying through to ...
> 
> > --- a/xen/arch/arm/include/asm/arm64/bitops.h
> > +++ b/xen/arch/arm/include/asm/arm64/bitops.h
> > @@ -22,17 +22,15 @@ static /*__*/always_inline unsigned long
> > __ffs(unsigned long word)
> >   */
> >  #define ffz(x)  __ffs(~(x))
> >  
> > -static inline int flsl(unsigned long x)
> > +static inline int arch_flsl(unsigned long x)
> 
> ... e.g. here. You don't want to introduce signed/unsigned
> mismatches.
Do you mean that generic flsl() uses 'unsigned int' as a return type,
but arch_flsl continue to use 'int'?

> 
> Also why do you keep "inline" here, while ...
> 
> > --- a/xen/arch/x86/include/asm/bitops.h
> > +++ b/xen/arch/x86/include/asm/bitops.h
> > @@ -425,7 +425,7 @@ static always_inline unsigned int
> > arch_ffsl(unsigned long x)
> >   *
> >   * This is defined the same way as ffs.
> >   */
> > -static inline int flsl(unsigned long x)
> > +static always_inline int arch_flsl(unsigned long x)
> 
> ... you switch to always_inline here?
Because Adnrew's patch with bitops.h for x86 changes to always_inline,
so to be consistent, at least, for architecture.

~ Oleksii

> 
> (replying out of order)
> 
> > --- a/xen/arch/arm/include/asm/arm32/bitops.h
> > +++ b/xen/arch/arm/include/asm/arm32/bitops.h
> > @@ -1,7 +1,7 @@
> >  #ifndef _ARM_ARM32_BITOPS_H
> >  #define _ARM_ARM32_BITOPS_H
> >  
> > -#define flsl fls
> > +#define arch_flsl fls
> 
> It's the Arm maintainers to ultimately judge, but I'd be inclined to
> suggest
> 
> #define arch_flsl arch_fls
> 
> instead. That's not only behaviorally closer to what was there
> before, but
> also reduces (a tiny bit) the amount of work the compiler needs to
> carry out.
> 
> Jan



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 07/15] xen/riscv: introduce atomic.h
  2024-05-15  9:49   ` Jan Beulich
@ 2024-05-15 13:59     ` Oleksii K.
  0 siblings, 0 replies; 36+ messages in thread
From: Oleksii K. @ 2024-05-15 13:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	George Dunlap, Julien Grall, Stefano Stabellini, xen-devel

On Wed, 2024-05-15 at 11:49 +0200, Jan Beulich wrote:
> On 06.05.2024 12:15, Oleksii Kurochko wrote:
> > Changes in V9:
> >  - update the defintion of write_atomic macros:
> >    drop the return value as this macros isn't expeceted to return
> > something
> >    drop unnessary parentheses around p.
> 
> Yet then what about add_sized()? With that also tidied
> Acked-by: Jan Beulich <jbeulich@suse.com>
Sure, parentheses around p in add_sized() should be dropped too.
I will update the patch in the next version of the patch series.

~ Oleksii


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 03/15] xen/bitops: implement fls{l}() in common logic
  2024-05-15 13:55     ` Oleksii K.
@ 2024-05-15 14:07       ` Jan Beulich
  2024-05-15 15:31         ` Oleksii K.
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2024-05-15 14:07 UTC (permalink / raw)
  To: Oleksii K.
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Shawn Anastasio,
	Roger Pau Monné, xen-devel

On 15.05.2024 15:55, Oleksii K. wrote:
> On Wed, 2024-05-15 at 11:09 +0200, Jan Beulich wrote:
>> On 06.05.2024 12:15, Oleksii Kurochko wrote:
>>> Changes in V9:
>>>  - update return type of fls and flsl() to unsigned int to be
>>> aligned with other
>>>    bit ops.
>>
>> But this then needs carrying through to ...
>>
>>> --- a/xen/arch/arm/include/asm/arm64/bitops.h
>>> +++ b/xen/arch/arm/include/asm/arm64/bitops.h
>>> @@ -22,17 +22,15 @@ static /*__*/always_inline unsigned long
>>> __ffs(unsigned long word)
>>>   */
>>>  #define ffz(x)  __ffs(~(x))
>>>  
>>> -static inline int flsl(unsigned long x)
>>> +static inline int arch_flsl(unsigned long x)
>>
>> ... e.g. here. You don't want to introduce signed/unsigned
>> mismatches.
> Do you mean that generic flsl() uses 'unsigned int' as a return type,
> but arch_flsl continue to use 'int'?

Yes.

>> Also why do you keep "inline" here, while ...
>>
>>> --- a/xen/arch/x86/include/asm/bitops.h
>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>> @@ -425,7 +425,7 @@ static always_inline unsigned int
>>> arch_ffsl(unsigned long x)
>>>   *
>>>   * This is defined the same way as ffs.
>>>   */
>>> -static inline int flsl(unsigned long x)
>>> +static always_inline int arch_flsl(unsigned long x)
>>
>> ... you switch to always_inline here?
> Because Adnrew's patch with bitops.h for x86 changes to always_inline,
> so to be consistent, at least, for architecture.

And why not extend this to Arm?

Jan


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 04/15] xen/bitops: put __ffs() into linux compatible header
  2024-05-06 10:15 ` [PATCH v9 04/15] xen/bitops: put __ffs() into linux compatible header Oleksii Kurochko
@ 2024-05-15 14:34   ` Michal Orzel
  2024-05-15 15:08   ` Rahul Singh
  1 sibling, 0 replies; 36+ messages in thread
From: Michal Orzel @ 2024-05-15 14:34 UTC (permalink / raw)
  To: Oleksii Kurochko, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Shawn Anastasio, Rahul Singh



On 06/05/2024 12:15, Oleksii Kurochko wrote:
> 
> 
> The mentioned macros exist only because of Linux compatible purpose.
> 
> The patch defines __ffs() in terms of Xen bitops and it is safe
> to define in this way ( as __ffs() - 1 ) as considering that __ffs()
> was defined as __builtin_ctzl(x), which has undefined behavior when x=0,
> so it is assumed that such cases are not encountered in the current code.
> 
> To not include <xen/linux-compat.h> to Xen library files __ffs() and __ffz()
> were defined locally in find-next-bit.c.
> 
> Except __ffs() usage in find-next-bit.c only one usage of __ffs() leave
> in smmu-v3.c. It seems that it __ffs can be changed to ffsl(x)-1 in
> this file, but to keep smmu-v3.c looks close to linux it was deciced just
> to define __ffs() in xen/linux-compat.h and include it in smmu-v3.c
> 
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> Acked-by: Shawn Anastasio <sanastasio@raptorengineering.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>

~Michal



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 04/15] xen/bitops: put __ffs() into linux compatible header
  2024-05-06 10:15 ` [PATCH v9 04/15] xen/bitops: put __ffs() into linux compatible header Oleksii Kurochko
  2024-05-15 14:34   ` Michal Orzel
@ 2024-05-15 15:08   ` Rahul Singh
  1 sibling, 0 replies; 36+ messages in thread
From: Rahul Singh @ 2024-05-15 15:08 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Xen-devel, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Jan Beulich, Shawn Anastasio

Hi Oleksii,

> On 6 May 2024, at 11:15 AM, Oleksii Kurochko <oleksii.kurochko@gmail.com> wrote:
> 
> The mentioned macros exist only because of Linux compatible purpose.
> 
> The patch defines __ffs() in terms of Xen bitops and it is safe
> to define in this way ( as __ffs() - 1 ) as considering that __ffs()
> was defined as __builtin_ctzl(x), which has undefined behavior when x=0,
> so it is assumed that such cases are not encountered in the current code.
> 
> To not include <xen/linux-compat.h> to Xen library files __ffs() and __ffz()
> were defined locally in find-next-bit.c.
> 
> Except __ffs() usage in find-next-bit.c only one usage of __ffs() leave
> in smmu-v3.c. It seems that it __ffs can be changed to ffsl(x)-1 in
> this file, but to keep smmu-v3.c looks close to linux it was deciced just
> to define __ffs() in xen/linux-compat.h and include it in smmu-v3.c
> 
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> Acked-by: Shawn Anastasio <sanastasio@raptorengineering.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

For SMMUv3 changes:
Acked-by: Rahul Singh <rahul.singh@arm.com>

Regards,
Rahul
 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit()
  2024-05-15  8:52   ` Jan Beulich
@ 2024-05-15 15:29     ` Oleksii K.
  2024-05-15 15:41       ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Oleksii K. @ 2024-05-15 15:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, Ross Lagerwall, Stefano Stabellini,
	Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Shawn Anastasio, xen-devel

On Wed, 2024-05-15 at 10:52 +0200, Jan Beulich wrote:
> On 06.05.2024 12:15, Oleksii Kurochko wrote:
> > The following generic functions were introduced:
> > * test_bit
> > * generic__test_and_set_bit
> > * generic__test_and_clear_bit
> > * generic__test_and_change_bit
> > 
> > Also, the patch introduces the following generics which are
> > used by the functions mentioned above:
> > * BITOP_BITS_PER_WORD
> > * BITOP_MASK
> > * BITOP_WORD
> > * BITOP_TYPE
> > 
> > These functions and macros can be useful for architectures
> > that don't have corresponding arch-specific instructions.
> 
> Logically this paragraph may better move ahead of the BITOP_* one.
> 
> > Because of that x86 has the following check in the macros
> > test_bit(),
> > __test_and_set_bit(), __test_and_clear_bit(),
> > __test_and_change_bit():
> >     if ( bitop_bad_size(addr) ) __bitop_bad_size();
> > It was necessary to make bitop bad size check generic too, so
> > arch_check_bitop_size() was introduced.
> 
> Not anymore, with the most recent adjustments? There's nothing arch-
> specific anymore in the checking.
> 
> > @@ -183,7 +180,7 @@ static inline int test_and_set_bit(int nr,
> > volatile void *addr)
> >   * If two examples of this operation race, one can appear to
> > succeed
> >   * but actually fail.  You must protect multiple accesses with a
> > lock.
> >   */
> > -static inline int __test_and_set_bit(int nr, void *addr)
> > +static inline int arch__test_and_set_bit(int nr, volatile void
> > *addr)
> 
> I think I raised this point before: Why arch__ here, ...
> 
> > @@ -232,7 +226,7 @@ static inline int test_and_clear_bit(int nr,
> > volatile void *addr)
> >   * If two examples of this operation race, one can appear to
> > succeed
> >   * but actually fail.  You must protect multiple accesses with a
> > lock.
> >   */
> > -static inline int __test_and_clear_bit(int nr, void *addr)
> > +static inline int arch__test_and_clear_bit(int nr, volatile void
> > *addr)
> 
> ... here, ...
> 
> > @@ -243,13 +237,10 @@ static inline int __test_and_clear_bit(int
> > nr, void *addr)
> >  
> >      return oldbit;
> >  }
> > -#define __test_and_clear_bit(nr, addr) ({               \
> > -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> > -    __test_and_clear_bit(nr, addr);                     \
> > -})
> > +#define arch__test_and_clear_bit arch__test_and_clear_bit
> >  
> >  /* WARNING: non atomic and it can be reordered! */
> > -static inline int __test_and_change_bit(int nr, void *addr)
> > +static inline int arch__test_and_change_bit(int nr, volatile void
> > *addr)
> 
> ... and here, while ...
> 
> > @@ -307,8 +295,7 @@ static inline int variable_test_bit(int nr,
> > const volatile void *addr)
> >      return oldbit;
> >  }
> >  
> > -#define test_bit(nr, addr) ({                           \
> > -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> > +#define arch_test_bit(nr, addr) ({                      \
> 
> ... just arch_ here? I don't like the double underscore infixes very
> much, but I'm okay with them as long as they're applied consistently.

Common code and x86 use __test_and_clear_bit(), and this patch provides
a generic version of __test_and_clear_bit(). To emphasize that
generic__test_and_clear_bit() is a common implementation of
__test_and_clear_bit(), the double underscore was retained. Also,
test_and_clear_bit() exists and if one day it will be needed to provide
a generic version of it, then it will be needed to have
generic__test_and_clear_bit() and generic_test_and_clear_bit()

A similar logic was chosen for test_bit.


> 
> > --- a/xen/include/xen/bitops.h
> > +++ b/xen/include/xen/bitops.h
> > @@ -65,10 +65,144 @@ static inline int generic_flsl(unsigned long
> > x)
> >   * scope
> >   */
> >  
> > +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
> > BITOP_BITS_PER_WORD))
> > +
> > +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> > +
> > +extern void __bitop_bad_size(void);
> > +
> >  /* --------------------- Please tidy above here ------------------
> > --- */
> >  
> >  #include <asm/bitops.h>
> >  
> > +#ifndef arch_check_bitop_size
> > +
> > +#define bitop_bad_size(addr) sizeof(*(addr)) <
> > sizeof(bitop_uint_t)
> 
> Nit: Missing parentheses around the whole expression.
> 
> > +#define arch_check_bitop_size(addr) \
> > +    if ( bitop_bad_size(addr) ) __bitop_bad_size();
> 
> Apart from the arch_ prefix that now wants dropping, this macro (if
> we
> want one in the first place) 
What do you mean by 'want' here? I thought it is pretty necessary from
safety point of view to have this check.
Except arch_ prefix does it make sense to drop "#ifndef
arch_check_bitop_size" around this macros?

> also wants writing in a way such that it
> can be used as a normal expression, without double semicolons or
> other
> anomalies (potentially) resulting at use sites. E.g.
> 
> #define check_bitop_size(addr) do { \
>     if ( bitop_bad_size(addr) )     \
>         __bitop_bad_size();         \
> } while ( 0 )
> 
> > +#endif /* arch_check_bitop_size */
> > +
> > +/**
> > + * generic__test_and_set_bit - Set a bit and return its old value
> > + * @nr: Bit to set
> > + * @addr: Address to count from
> > + *
> > + * This operation is non-atomic and can be reordered.
> > + * If two examples of this operation race, one can appear to
> > succeed
> > + * but actually fail.  You must protect multiple accesses with a
> > lock.
> > + */
> > +static always_inline bool
> > +generic__test_and_set_bit(unsigned long nr, volatile void *addr)
> 
> The original per-arch functions all use "int" for their first
> parameter.
> Here you use unsigned long, without any mention in the description of
> the
> potential behavioral change. Which is even worse given that then x86
> ends
> up inconsistent with Arm and PPC in this regard, by ...
> 
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +    bitop_uint_t old = *p;
> > +
> > +    *p = old | mask;
> > +    return (old & mask);
> > +}
> > +
> > +/**
> > + * generic__test_and_clear_bit - Clear a bit and return its old
> > value
> > + * @nr: Bit to clear
> > + * @addr: Address to count from
> > + *
> > + * This operation is non-atomic and can be reordered.
> > + * If two examples of this operation race, one can appear to
> > succeed
> > + * but actually fail.  You must protect multiple accesses with a
> > lock.
> > + */
> > +static always_inline bool
> > +generic__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +    bitop_uint_t old = *p;
> > +
> > +    *p = old & ~mask;
> > +    return (old & mask);
> > +}
> > +
> > +/* WARNING: non atomic and it can be reordered! */
> > +static always_inline bool
> > +generic__test_and_change_bit(unsigned long nr, volatile void
> > *addr)
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +    bitop_uint_t old = *p;
> > +
> > +    *p = old ^ mask;
> > +    return (old & mask);
> > +}
> > +/**
> > + * generic_test_bit - Determine whether a bit is set
> > + * @nr: bit number to test
> > + * @addr: Address to start counting from
> > + */
> > +static always_inline bool generic_test_bit(int nr, const volatile
> > void *addr)
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +
> > +    return (*p & mask);
> > +}
> > +
> > +static always_inline bool
> > +__test_and_set_bit(unsigned long nr, volatile void *addr)
> > +{
> > +#ifndef arch__test_and_set_bit
> > +#define arch__test_and_set_bit generic__test_and_set_bit
> > +#endif
> > +
> > +    return arch__test_and_set_bit(nr, addr);
> 
> ... silently truncating and sign-converting nr here.
Missed that fact. AFAIU there is no specific reason for bit number to
be 'int' for this function, so it makes sense to update x86's prototype
of arch__test_and_set_bit() to:
   static inline int arch__test_and_set_bit(unsigned long nr, volatile
   void *addr).
   
But probably I can't use 'unsigned long' here too, as it should a
compilation error around 'btsl' instruction.

So it can be or 'unsinged int' or 'int'.
> 
> As to generic_test_bit() - please don't cast away const-ness there.
> 
> > +}
> > +#define __test_and_set_bit(nr, addr) ({             \
> > +    arch_check_bitop_size(addr);                    \
> > +    __test_and_set_bit(nr, addr);                   \
> > +})
> > +
> > +static always_inline bool
> > +__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
> 
> Oddly enough here at least you use bitop_uint_t, but that's still ...
> 
> > +{
> > +#ifndef arch__test_and_clear_bit
> > +#define arch__test_and_clear_bit generic__test_and_clear_bit
> > +#endif
> > +
> > +    return arch__test_and_clear_bit(nr, addr);
> 
> ... meaning a signedness conversion on x86 then. And beware: You
> can't
> simply change x86'es code to use bitop_uint_t. The underlying insns
> used
> interpret the bit position as a signed number, i.e. permitting
> accesses
> below the incoming pointer (whether it really makes sense to be that
> way
> is a separate question). I'm afraid I have no good suggestion how to
> deal
> with that: Any approach I can think of is either detrimental to the
> generic implementation or would have unwanted effects on the x86 one.
> Others, anyone?
Is the signedness conversion here a big problem? I suppose no one will
pass a negative number to nr.

It seems to me that it would be better to use 'int' everywhere and not
use bitop_uint_t for 'nr' since it is just a bit position. 'Int'
provides enough range for possible bit number.

> 
> > --- a/xen/include/xen/types.h
> > +++ b/xen/include/xen/types.h
> > @@ -64,6 +64,12 @@ typedef __u64 __be64;
> >  
> >  typedef unsigned int __attribute__((__mode__(__pointer__)))
> > uintptr_t;
> >  
> > +#ifndef BITOP_TYPE
> > +#define BITOP_BITS_PER_WORD 32
> > +
> > +typedef uint32_t bitop_uint_t;
> > +#endif
> 
> I think you mentioned to me before why this needs to live here, not
> in
> xen/bitops.h. Yet I don't recall the reason, and the description
> (hint,
> hint) doesn't say anything either.
If I remember correctly ( after this phrase I think I have to update
the description ) the reason was that if I put that to xen/bitops.h:

    ...
    #include <asm/bitops.h>
    
    #ifndef BITOP_TYPE
    #define BITOP_BITS_PER_WORD 32
    /* typedef uint32_t bitop_uint_t; */
    #endif
    ...

Then we will have an issue that we can't use the generic definition of 
BITOP_BITS_PER_WORD and bitop_uint_t in asm/bitops.h as it is defined
after inclusion of <asm/bitops.h>.

But if to put it to <xen/types.h> then such problem won't occur.

If it still makes sense, then I'll update the description in the next
patch version.

~ Oleksii

> 
> Jan



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 03/15] xen/bitops: implement fls{l}() in common logic
  2024-05-15 14:07       ` Jan Beulich
@ 2024-05-15 15:31         ` Oleksii K.
  0 siblings, 0 replies; 36+ messages in thread
From: Oleksii K. @ 2024-05-15 15:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Shawn Anastasio,
	Roger Pau Monné, xen-devel

On Wed, 2024-05-15 at 16:07 +0200, Jan Beulich wrote:
> On 15.05.2024 15:55, Oleksii K. wrote:
> > On Wed, 2024-05-15 at 11:09 +0200, Jan Beulich wrote:
> > > On 06.05.2024 12:15, Oleksii Kurochko wrote:
> > > > Changes in V9:
> > > >  - update return type of fls and flsl() to unsigned int to be
> > > > aligned with other
> > > >    bit ops.
> > > 
> > > But this then needs carrying through to ...
> > > 
> > > > --- a/xen/arch/arm/include/asm/arm64/bitops.h
> > > > +++ b/xen/arch/arm/include/asm/arm64/bitops.h
> > > > @@ -22,17 +22,15 @@ static /*__*/always_inline unsigned long
> > > > __ffs(unsigned long word)
> > > >   */
> > > >  #define ffz(x)  __ffs(~(x))
> > > >  
> > > > -static inline int flsl(unsigned long x)
> > > > +static inline int arch_flsl(unsigned long x)
> > > 
> > > ... e.g. here. You don't want to introduce signed/unsigned
> > > mismatches.
> > Do you mean that generic flsl() uses 'unsigned int' as a return
> > type,
> > but arch_flsl continue to use 'int'?
> 
> Yes.
> 
> > > Also why do you keep "inline" here, while ...
> > > 
> > > > --- a/xen/arch/x86/include/asm/bitops.h
> > > > +++ b/xen/arch/x86/include/asm/bitops.h
> > > > @@ -425,7 +425,7 @@ static always_inline unsigned int
> > > > arch_ffsl(unsigned long x)
> > > >   *
> > > >   * This is defined the same way as ffs.
> > > >   */
> > > > -static inline int flsl(unsigned long x)
> > > > +static always_inline int arch_flsl(unsigned long x)
> > > 
> > > ... you switch to always_inline here?
> > Because Adnrew's patch with bitops.h for x86 changes to
> > always_inline,
> > so to be consistent, at least, for architecture.
> 
> And why not extend this to Arm?
No specific reason, just wanted to do minimal amount of necessary
changes. I'll do that in the the next patch version.

~ Oleksii


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit()
  2024-05-15 15:29     ` Oleksii K.
@ 2024-05-15 15:41       ` Jan Beulich
  2024-05-15 17:03         ` Oleksii K.
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2024-05-15 15:41 UTC (permalink / raw)
  To: Oleksii K.
  Cc: Roger Pau Monné, Ross Lagerwall, Stefano Stabellini,
	Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Shawn Anastasio, xen-devel

On 15.05.2024 17:29, Oleksii K. wrote:
> On Wed, 2024-05-15 at 10:52 +0200, Jan Beulich wrote:
>> On 06.05.2024 12:15, Oleksii Kurochko wrote:
>>> The following generic functions were introduced:
>>> * test_bit
>>> * generic__test_and_set_bit
>>> * generic__test_and_clear_bit
>>> * generic__test_and_change_bit
>>>
>>> Also, the patch introduces the following generics which are
>>> used by the functions mentioned above:
>>> * BITOP_BITS_PER_WORD
>>> * BITOP_MASK
>>> * BITOP_WORD
>>> * BITOP_TYPE
>>>
>>> These functions and macros can be useful for architectures
>>> that don't have corresponding arch-specific instructions.
>>
>> Logically this paragraph may better move ahead of the BITOP_* one.
>>
>>> Because of that x86 has the following check in the macros
>>> test_bit(),
>>> __test_and_set_bit(), __test_and_clear_bit(),
>>> __test_and_change_bit():
>>>     if ( bitop_bad_size(addr) ) __bitop_bad_size();
>>> It was necessary to make bitop bad size check generic too, so
>>> arch_check_bitop_size() was introduced.
>>
>> Not anymore, with the most recent adjustments? There's nothing arch-
>> specific anymore in the checking.
>>
>>> @@ -183,7 +180,7 @@ static inline int test_and_set_bit(int nr,
>>> volatile void *addr)
>>>   * If two examples of this operation race, one can appear to
>>> succeed
>>>   * but actually fail.  You must protect multiple accesses with a
>>> lock.
>>>   */
>>> -static inline int __test_and_set_bit(int nr, void *addr)
>>> +static inline int arch__test_and_set_bit(int nr, volatile void
>>> *addr)
>>
>> I think I raised this point before: Why arch__ here, ...
>>
>>> @@ -232,7 +226,7 @@ static inline int test_and_clear_bit(int nr,
>>> volatile void *addr)
>>>   * If two examples of this operation race, one can appear to
>>> succeed
>>>   * but actually fail.  You must protect multiple accesses with a
>>> lock.
>>>   */
>>> -static inline int __test_and_clear_bit(int nr, void *addr)
>>> +static inline int arch__test_and_clear_bit(int nr, volatile void
>>> *addr)
>>
>> ... here, ...
>>
>>> @@ -243,13 +237,10 @@ static inline int __test_and_clear_bit(int
>>> nr, void *addr)
>>>  
>>>      return oldbit;
>>>  }
>>> -#define __test_and_clear_bit(nr, addr) ({               \
>>> -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
>>> -    __test_and_clear_bit(nr, addr);                     \
>>> -})
>>> +#define arch__test_and_clear_bit arch__test_and_clear_bit
>>>  
>>>  /* WARNING: non atomic and it can be reordered! */
>>> -static inline int __test_and_change_bit(int nr, void *addr)
>>> +static inline int arch__test_and_change_bit(int nr, volatile void
>>> *addr)
>>
>> ... and here, while ...
>>
>>> @@ -307,8 +295,7 @@ static inline int variable_test_bit(int nr,
>>> const volatile void *addr)
>>>      return oldbit;
>>>  }
>>>  
>>> -#define test_bit(nr, addr) ({                           \
>>> -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
>>> +#define arch_test_bit(nr, addr) ({                      \
>>
>> ... just arch_ here? I don't like the double underscore infixes very
>> much, but I'm okay with them as long as they're applied consistently.
> 
> Common code and x86 use __test_and_clear_bit(), and this patch provides
> a generic version of __test_and_clear_bit(). To emphasize that
> generic__test_and_clear_bit() is a common implementation of
> __test_and_clear_bit(), the double underscore was retained. Also,
> test_and_clear_bit() exists and if one day it will be needed to provide
> a generic version of it, then it will be needed to have
> generic__test_and_clear_bit() and generic_test_and_clear_bit()
> 
> A similar logic was chosen for test_bit.

Right, but in all of your reply arch_ doesn't appear at all. Yet the
question was: Why then not arch__test_bit(), to match the other arch
helpers?

>>> --- a/xen/include/xen/bitops.h
>>> +++ b/xen/include/xen/bitops.h
>>> @@ -65,10 +65,144 @@ static inline int generic_flsl(unsigned long
>>> x)
>>>   * scope
>>>   */
>>>  
>>> +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
>>> BITOP_BITS_PER_WORD))
>>> +
>>> +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
>>> +
>>> +extern void __bitop_bad_size(void);
>>> +
>>>  /* --------------------- Please tidy above here ------------------
>>> --- */
>>>  
>>>  #include <asm/bitops.h>
>>>  
>>> +#ifndef arch_check_bitop_size
>>> +
>>> +#define bitop_bad_size(addr) sizeof(*(addr)) <
>>> sizeof(bitop_uint_t)
>>
>> Nit: Missing parentheses around the whole expression.
>>
>>> +#define arch_check_bitop_size(addr) \
>>> +    if ( bitop_bad_size(addr) ) __bitop_bad_size();
>>
>> Apart from the arch_ prefix that now wants dropping, this macro (if
>> we
>> want one in the first place) 
> What do you mean by 'want' here? I thought it is pretty necessary from
> safety point of view to have this check.

I don't question the check. What I was questioning is the need for a
macro to wrap that, seeing how x86 did without. I'm not outright
objecting to such a macro, though.

> Except arch_ prefix does it make sense to drop "#ifndef
> arch_check_bitop_size" around this macros?

Of course, as with arch_ dropped from the name there's no intention to
allow arch overrides.

>>> +/**
>>> + * generic__test_and_set_bit - Set a bit and return its old value
>>> + * @nr: Bit to set
>>> + * @addr: Address to count from
>>> + *
>>> + * This operation is non-atomic and can be reordered.
>>> + * If two examples of this operation race, one can appear to
>>> succeed
>>> + * but actually fail.  You must protect multiple accesses with a
>>> lock.
>>> + */
>>> +static always_inline bool
>>> +generic__test_and_set_bit(unsigned long nr, volatile void *addr)
>>
>> The original per-arch functions all use "int" for their first
>> parameter.
>> Here you use unsigned long, without any mention in the description of
>> the
>> potential behavioral change. Which is even worse given that then x86
>> ends
>> up inconsistent with Arm and PPC in this regard, by ...
>>
>>> +{
>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>> BITOP_WORD(nr);
>>> +    bitop_uint_t old = *p;
>>> +
>>> +    *p = old | mask;
>>> +    return (old & mask);
>>> +}
>>> +
>>> +/**
>>> + * generic__test_and_clear_bit - Clear a bit and return its old
>>> value
>>> + * @nr: Bit to clear
>>> + * @addr: Address to count from
>>> + *
>>> + * This operation is non-atomic and can be reordered.
>>> + * If two examples of this operation race, one can appear to
>>> succeed
>>> + * but actually fail.  You must protect multiple accesses with a
>>> lock.
>>> + */
>>> +static always_inline bool
>>> +generic__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
>>> +{
>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>> BITOP_WORD(nr);
>>> +    bitop_uint_t old = *p;
>>> +
>>> +    *p = old & ~mask;
>>> +    return (old & mask);
>>> +}
>>> +
>>> +/* WARNING: non atomic and it can be reordered! */
>>> +static always_inline bool
>>> +generic__test_and_change_bit(unsigned long nr, volatile void
>>> *addr)
>>> +{
>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>> BITOP_WORD(nr);
>>> +    bitop_uint_t old = *p;
>>> +
>>> +    *p = old ^ mask;
>>> +    return (old & mask);
>>> +}
>>> +/**
>>> + * generic_test_bit - Determine whether a bit is set
>>> + * @nr: bit number to test
>>> + * @addr: Address to start counting from
>>> + */
>>> +static always_inline bool generic_test_bit(int nr, const volatile
>>> void *addr)
>>> +{
>>> +    bitop_uint_t mask = BITOP_MASK(nr);
>>> +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
>>> BITOP_WORD(nr);
>>> +
>>> +    return (*p & mask);
>>> +}
>>> +
>>> +static always_inline bool
>>> +__test_and_set_bit(unsigned long nr, volatile void *addr)
>>> +{
>>> +#ifndef arch__test_and_set_bit
>>> +#define arch__test_and_set_bit generic__test_and_set_bit
>>> +#endif
>>> +
>>> +    return arch__test_and_set_bit(nr, addr);
>>
>> ... silently truncating and sign-converting nr here.
> Missed that fact. AFAIU there is no specific reason for bit number to
> be 'int' for this function, so it makes sense to update x86's prototype
> of arch__test_and_set_bit() to:
>    static inline int arch__test_and_set_bit(unsigned long nr, volatile
>    void *addr).
>    
> But probably I can't use 'unsigned long' here too, as it should a
> compilation error around 'btsl' instruction.
> 
> So it can be or 'unsinged int' or 'int'.
>>
>> As to generic_test_bit() - please don't cast away const-ness there.
>>
>>> +}
>>> +#define __test_and_set_bit(nr, addr) ({             \
>>> +    arch_check_bitop_size(addr);                    \
>>> +    __test_and_set_bit(nr, addr);                   \
>>> +})
>>> +
>>> +static always_inline bool
>>> +__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
>>
>> Oddly enough here at least you use bitop_uint_t, but that's still ...
>>
>>> +{
>>> +#ifndef arch__test_and_clear_bit
>>> +#define arch__test_and_clear_bit generic__test_and_clear_bit
>>> +#endif
>>> +
>>> +    return arch__test_and_clear_bit(nr, addr);
>>
>> ... meaning a signedness conversion on x86 then. And beware: You
>> can't
>> simply change x86'es code to use bitop_uint_t. The underlying insns
>> used
>> interpret the bit position as a signed number, i.e. permitting
>> accesses
>> below the incoming pointer (whether it really makes sense to be that
>> way
>> is a separate question). I'm afraid I have no good suggestion how to
>> deal
>> with that: Any approach I can think of is either detrimental to the
>> generic implementation or would have unwanted effects on the x86 one.
>> Others, anyone?
> Is the signedness conversion here a big problem? I suppose no one will
> pass a negative number to nr.
> 
> It seems to me that it would be better to use 'int' everywhere and not
> use bitop_uint_t for 'nr' since it is just a bit position. 'Int'
> provides enough range for possible bit number.

Indeed, and that's then hopefully less of a risk as to introducing hard
to spot issues. Provided Arm and PPC are okay with that type then as well.

>>> --- a/xen/include/xen/types.h
>>> +++ b/xen/include/xen/types.h
>>> @@ -64,6 +64,12 @@ typedef __u64 __be64;
>>>  
>>>  typedef unsigned int __attribute__((__mode__(__pointer__)))
>>> uintptr_t;
>>>  
>>> +#ifndef BITOP_TYPE
>>> +#define BITOP_BITS_PER_WORD 32
>>> +
>>> +typedef uint32_t bitop_uint_t;
>>> +#endif
>>
>> I think you mentioned to me before why this needs to live here, not
>> in
>> xen/bitops.h. Yet I don't recall the reason, and the description
>> (hint,
>> hint) doesn't say anything either.
> If I remember correctly ( after this phrase I think I have to update
> the description ) the reason was that if I put that to xen/bitops.h:
> 
>     ...
>     #include <asm/bitops.h>
>     
>     #ifndef BITOP_TYPE
>     #define BITOP_BITS_PER_WORD 32
>     /* typedef uint32_t bitop_uint_t; */
>     #endif
>     ...
> 
> Then we will have an issue that we can't use the generic definition of 
> BITOP_BITS_PER_WORD and bitop_uint_t in asm/bitops.h as it is defined
> after inclusion of <asm/bitops.h>.
> 
> But if to put it to <xen/types.h> then such problem won't occur.
> 
> If it still makes sense, then I'll update the description in the next
> patch version.

I see. But we don't need to allow for arch overrides here anymore, so in
xen/bitops.h couldn't we as well have

#define BITOP_BITS_PER_WORD 32
typedef uint32_t bitop_uint_t;

... (if necessary)

#include <asm/bitops.h>

?

Jan


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit()
  2024-05-15 15:41       ` Jan Beulich
@ 2024-05-15 17:03         ` Oleksii K.
  2024-05-16  7:04           ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Oleksii K. @ 2024-05-15 17:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, Ross Lagerwall, Stefano Stabellini,
	Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Shawn Anastasio, xen-devel

On Wed, 2024-05-15 at 17:41 +0200, Jan Beulich wrote:
> On 15.05.2024 17:29, Oleksii K. wrote:
> > On Wed, 2024-05-15 at 10:52 +0200, Jan Beulich wrote:
> > > On 06.05.2024 12:15, Oleksii Kurochko wrote:
> > > > The following generic functions were introduced:
> > > > * test_bit
> > > > * generic__test_and_set_bit
> > > > * generic__test_and_clear_bit
> > > > * generic__test_and_change_bit
> > > > 
> > > > Also, the patch introduces the following generics which are
> > > > used by the functions mentioned above:
> > > > * BITOP_BITS_PER_WORD
> > > > * BITOP_MASK
> > > > * BITOP_WORD
> > > > * BITOP_TYPE
> > > > 
> > > > These functions and macros can be useful for architectures
> > > > that don't have corresponding arch-specific instructions.
> > > 
> > > Logically this paragraph may better move ahead of the BITOP_*
> > > one.
> > > 
> > > > Because of that x86 has the following check in the macros
> > > > test_bit(),
> > > > __test_and_set_bit(), __test_and_clear_bit(),
> > > > __test_and_change_bit():
> > > >     if ( bitop_bad_size(addr) ) __bitop_bad_size();
> > > > It was necessary to make bitop bad size check generic too, so
> > > > arch_check_bitop_size() was introduced.
> > > 
> > > Not anymore, with the most recent adjustments? There's nothing
> > > arch-
> > > specific anymore in the checking.
> > > 
> > > > @@ -183,7 +180,7 @@ static inline int test_and_set_bit(int nr,
> > > > volatile void *addr)
> > > >   * If two examples of this operation race, one can appear to
> > > > succeed
> > > >   * but actually fail.  You must protect multiple accesses with
> > > > a
> > > > lock.
> > > >   */
> > > > -static inline int __test_and_set_bit(int nr, void *addr)
> > > > +static inline int arch__test_and_set_bit(int nr, volatile void
> > > > *addr)
> > > 
> > > I think I raised this point before: Why arch__ here, ...
> > > 
> > > > @@ -232,7 +226,7 @@ static inline int test_and_clear_bit(int
> > > > nr,
> > > > volatile void *addr)
> > > >   * If two examples of this operation race, one can appear to
> > > > succeed
> > > >   * but actually fail.  You must protect multiple accesses with
> > > > a
> > > > lock.
> > > >   */
> > > > -static inline int __test_and_clear_bit(int nr, void *addr)
> > > > +static inline int arch__test_and_clear_bit(int nr, volatile
> > > > void
> > > > *addr)
> > > 
> > > ... here, ...
> > > 
> > > > @@ -243,13 +237,10 @@ static inline int
> > > > __test_and_clear_bit(int
> > > > nr, void *addr)
> > > >  
> > > >      return oldbit;
> > > >  }
> > > > -#define __test_and_clear_bit(nr, addr) ({               \
> > > > -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> > > > -    __test_and_clear_bit(nr, addr);                     \
> > > > -})
> > > > +#define arch__test_and_clear_bit arch__test_and_clear_bit
> > > >  
> > > >  /* WARNING: non atomic and it can be reordered! */
> > > > -static inline int __test_and_change_bit(int nr, void *addr)
> > > > +static inline int arch__test_and_change_bit(int nr, volatile
> > > > void
> > > > *addr)
> > > 
> > > ... and here, while ...
> > > 
> > > > @@ -307,8 +295,7 @@ static inline int variable_test_bit(int nr,
> > > > const volatile void *addr)
> > > >      return oldbit;
> > > >  }
> > > >  
> > > > -#define test_bit(nr, addr) ({                           \
> > > > -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> > > > +#define arch_test_bit(nr, addr) ({                      \
> > > 
> > > ... just arch_ here? I don't like the double underscore infixes
> > > very
> > > much, but I'm okay with them as long as they're applied
> > > consistently.
> > 
> > Common code and x86 use __test_and_clear_bit(), and this patch
> > provides
> > a generic version of __test_and_clear_bit(). To emphasize that
> > generic__test_and_clear_bit() is a common implementation of
> > __test_and_clear_bit(), the double underscore was retained. Also,
> > test_and_clear_bit() exists and if one day it will be needed to
> > provide
> > a generic version of it, then it will be needed to have
> > generic__test_and_clear_bit() and generic_test_and_clear_bit()
> > 
> > A similar logic was chosen for test_bit.
> 
> Right, but in all of your reply arch_ doesn't appear at all.
I am a little confused here. According to my logic, should it be
arch___test_and_set_bit() and generic___test_and_set_bit()?

If you are asking why there is no generic implementation for
test_and_clear_bit() without the double underscores, the reason is that
Arm, PPC, and x86 don't use generic code and rely on specific
instructions for this operation. Therefore, I didn't see much sense in
providing a generic version of test_and_clear_bit(), at least, for now.


>  Yet the
> question was: Why then not arch__test_bit(), to match the other arch
> helpers?
Because no one uses __test_bit(). Everywhere is used test_bit().

> 
> > > > --- a/xen/include/xen/bitops.h
> > > > +++ b/xen/include/xen/bitops.h
> > > > @@ -65,10 +65,144 @@ static inline int generic_flsl(unsigned
> > > > long
> > > > x)
> > > >   * scope
> > > >   */
> > > >  
> > > > +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
> > > > BITOP_BITS_PER_WORD))
> > > > +
> > > > +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> > > > +
> > > > +extern void __bitop_bad_size(void);
> > > > +
> > > >  /* --------------------- Please tidy above here --------------
> > > > ----
> > > > --- */
> > > >  
> > > >  #include <asm/bitops.h>
> > > >  
> > > > +#ifndef arch_check_bitop_size
> > > > +
> > > > +#define bitop_bad_size(addr) sizeof(*(addr)) <
> > > > sizeof(bitop_uint_t)
> > > 
> > > Nit: Missing parentheses around the whole expression.
> > > 
> > > > +#define arch_check_bitop_size(addr) \
> > > > +    if ( bitop_bad_size(addr) ) __bitop_bad_size();
> > > 
> > > Apart from the arch_ prefix that now wants dropping, this macro
> > > (if
> > > we
> > > want one in the first place) 
> > What do you mean by 'want' here? I thought it is pretty necessary
> > from
> > safety point of view to have this check.
> 
> I don't question the check. What I was questioning is the need for a
> macro to wrap that, seeing how x86 did without. I'm not outright
> objecting to such a macro, though.
We can follow x86 approach as we there's no any more intention to allow
arch overrides.

> 
> > Except arch_ prefix does it make sense to drop "#ifndef
> > arch_check_bitop_size" around this macros?
> 
> Of course, as with arch_ dropped from the name there's no intention
> to
> allow arch overrides.
> 
> > > > +/**
> > > > + * generic__test_and_set_bit - Set a bit and return its old
> > > > value
> > > > + * @nr: Bit to set
> > > > + * @addr: Address to count from
> > > > + *
> > > > + * This operation is non-atomic and can be reordered.
> > > > + * If two examples of this operation race, one can appear to
> > > > succeed
> > > > + * but actually fail.  You must protect multiple accesses with
> > > > a
> > > > lock.
> > > > + */
> > > > +static always_inline bool
> > > > +generic__test_and_set_bit(unsigned long nr, volatile void
> > > > *addr)
> > > 
> > > The original per-arch functions all use "int" for their first
> > > parameter.
> > > Here you use unsigned long, without any mention in the
> > > description of
> > > the
> > > potential behavioral change. Which is even worse given that then
> > > x86
> > > ends
> > > up inconsistent with Arm and PPC in this regard, by ...
> > > 
> > > > +{
> > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > > > BITOP_WORD(nr);
> > > > +    bitop_uint_t old = *p;
> > > > +
> > > > +    *p = old | mask;
> > > > +    return (old & mask);
> > > > +}
> > > > +
> > > > +/**
> > > > + * generic__test_and_clear_bit - Clear a bit and return its
> > > > old
> > > > value
> > > > + * @nr: Bit to clear
> > > > + * @addr: Address to count from
> > > > + *
> > > > + * This operation is non-atomic and can be reordered.
> > > > + * If two examples of this operation race, one can appear to
> > > > succeed
> > > > + * but actually fail.  You must protect multiple accesses with
> > > > a
> > > > lock.
> > > > + */
> > > > +static always_inline bool
> > > > +generic__test_and_clear_bit(bitop_uint_t nr, volatile void
> > > > *addr)
> > > > +{
> > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > > > BITOP_WORD(nr);
> > > > +    bitop_uint_t old = *p;
> > > > +
> > > > +    *p = old & ~mask;
> > > > +    return (old & mask);
> > > > +}
> > > > +
> > > > +/* WARNING: non atomic and it can be reordered! */
> > > > +static always_inline bool
> > > > +generic__test_and_change_bit(unsigned long nr, volatile void
> > > > *addr)
> > > > +{
> > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > > > BITOP_WORD(nr);
> > > > +    bitop_uint_t old = *p;
> > > > +
> > > > +    *p = old ^ mask;
> > > > +    return (old & mask);
> > > > +}
> > > > +/**
> > > > + * generic_test_bit - Determine whether a bit is set
> > > > + * @nr: bit number to test
> > > > + * @addr: Address to start counting from
> > > > + */
> > > > +static always_inline bool generic_test_bit(int nr, const
> > > > volatile
> > > > void *addr)
> > > > +{
> > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > > > BITOP_WORD(nr);
> > > > +
> > > > +    return (*p & mask);
> > > > +}
> > > > +
> > > > +static always_inline bool
> > > > +__test_and_set_bit(unsigned long nr, volatile void *addr)
> > > > +{
> > > > +#ifndef arch__test_and_set_bit
> > > > +#define arch__test_and_set_bit generic__test_and_set_bit
> > > > +#endif
> > > > +
> > > > +    return arch__test_and_set_bit(nr, addr);
> > > 
> > > ... silently truncating and sign-converting nr here.
> > Missed that fact. AFAIU there is no specific reason for bit number
> > to
> > be 'int' for this function, so it makes sense to update x86's
> > prototype
> > of arch__test_and_set_bit() to:
> >    static inline int arch__test_and_set_bit(unsigned long nr,
> > volatile
> >    void *addr).
> >    
> > But probably I can't use 'unsigned long' here too, as it should a
> > compilation error around 'btsl' instruction.
> > 
> > So it can be or 'unsinged int' or 'int'.
> > > 
> > > As to generic_test_bit() - please don't cast away const-ness
> > > there.
> > > 
> > > > +}
> > > > +#define __test_and_set_bit(nr, addr) ({             \
> > > > +    arch_check_bitop_size(addr);                    \
> > > > +    __test_and_set_bit(nr, addr);                   \
> > > > +})
> > > > +
> > > > +static always_inline bool
> > > > +__test_and_clear_bit(bitop_uint_t nr, volatile void *addr)
> > > 
> > > Oddly enough here at least you use bitop_uint_t, but that's still
> > > ...
> > > 
> > > > +{
> > > > +#ifndef arch__test_and_clear_bit
> > > > +#define arch__test_and_clear_bit generic__test_and_clear_bit
> > > > +#endif
> > > > +
> > > > +    return arch__test_and_clear_bit(nr, addr);
> > > 
> > > ... meaning a signedness conversion on x86 then. And beware: You
> > > can't
> > > simply change x86'es code to use bitop_uint_t. The underlying
> > > insns
> > > used
> > > interpret the bit position as a signed number, i.e. permitting
> > > accesses
> > > below the incoming pointer (whether it really makes sense to be
> > > that
> > > way
> > > is a separate question). I'm afraid I have no good suggestion how
> > > to
> > > deal
> > > with that: Any approach I can think of is either detrimental to
> > > the
> > > generic implementation or would have unwanted effects on the x86
> > > one.
> > > Others, anyone?
> > Is the signedness conversion here a big problem? I suppose no one
> > will
> > pass a negative number to nr.
> > 
> > It seems to me that it would be better to use 'int' everywhere and
> > not
> > use bitop_uint_t for 'nr' since it is just a bit position. 'Int'
> > provides enough range for possible bit number.
> 
> Indeed, and that's then hopefully less of a risk as to introducing
> hard
> to spot issues. Provided Arm and PPC are okay with that type then as
> well.
Then I will update prototypes of generic functions correspondingly.

> 
> > > > --- a/xen/include/xen/types.h
> > > > +++ b/xen/include/xen/types.h
> > > > @@ -64,6 +64,12 @@ typedef __u64 __be64;
> > > >  
> > > >  typedef unsigned int __attribute__((__mode__(__pointer__)))
> > > > uintptr_t;
> > > >  
> > > > +#ifndef BITOP_TYPE
> > > > +#define BITOP_BITS_PER_WORD 32
> > > > +
> > > > +typedef uint32_t bitop_uint_t;
> > > > +#endif
> > > 
> > > I think you mentioned to me before why this needs to live here,
> > > not
> > > in
> > > xen/bitops.h. Yet I don't recall the reason, and the description
> > > (hint,
> > > hint) doesn't say anything either.
> > If I remember correctly ( after this phrase I think I have to
> > update
> > the description ) the reason was that if I put that to
> > xen/bitops.h:
> > 
> >     ...
> >     #include <asm/bitops.h>
> >     
> >     #ifndef BITOP_TYPE
> >     #define BITOP_BITS_PER_WORD 32
> >     /* typedef uint32_t bitop_uint_t; */
> >     #endif
> >     ...
> > 
> > Then we will have an issue that we can't use the generic definition
> > of 
> > BITOP_BITS_PER_WORD and bitop_uint_t in asm/bitops.h as it is
> > defined
> > after inclusion of <asm/bitops.h>.
> > 
> > But if to put it to <xen/types.h> then such problem won't occur.
> > 
> > If it still makes sense, then I'll update the description in the
> > next
> > patch version.
> 
> I see. But we don't need to allow for arch overrides here anymore, so
> in
> xen/bitops.h couldn't we as well have
> 
> #define BITOP_BITS_PER_WORD 32
> typedef uint32_t bitop_uint_t;
> 
> ... (if necessary)
> 
> #include <asm/bitops.h>
> 
> ?
Good point. If there is not need to allow for arch overrides then we
can use suggested above approach. Thanks.

~ Oleksii


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit()
  2024-05-15 17:03         ` Oleksii K.
@ 2024-05-16  7:04           ` Jan Beulich
  2024-05-16 10:34             ` Oleksii K.
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2024-05-16  7:04 UTC (permalink / raw)
  To: Oleksii K.
  Cc: Roger Pau Monné, Ross Lagerwall, Stefano Stabellini,
	Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Shawn Anastasio, xen-devel

On 15.05.2024 19:03, Oleksii K. wrote:
> On Wed, 2024-05-15 at 17:41 +0200, Jan Beulich wrote:
>> On 15.05.2024 17:29, Oleksii K. wrote:
>>> On Wed, 2024-05-15 at 10:52 +0200, Jan Beulich wrote:
>>>> On 06.05.2024 12:15, Oleksii Kurochko wrote:
>>>>> The following generic functions were introduced:
>>>>> * test_bit
>>>>> * generic__test_and_set_bit
>>>>> * generic__test_and_clear_bit
>>>>> * generic__test_and_change_bit
>>>>>
>>>>> Also, the patch introduces the following generics which are
>>>>> used by the functions mentioned above:
>>>>> * BITOP_BITS_PER_WORD
>>>>> * BITOP_MASK
>>>>> * BITOP_WORD
>>>>> * BITOP_TYPE
>>>>>
>>>>> These functions and macros can be useful for architectures
>>>>> that don't have corresponding arch-specific instructions.
>>>>
>>>> Logically this paragraph may better move ahead of the BITOP_*
>>>> one.
>>>>
>>>>> Because of that x86 has the following check in the macros
>>>>> test_bit(),
>>>>> __test_and_set_bit(), __test_and_clear_bit(),
>>>>> __test_and_change_bit():
>>>>>     if ( bitop_bad_size(addr) ) __bitop_bad_size();
>>>>> It was necessary to make bitop bad size check generic too, so
>>>>> arch_check_bitop_size() was introduced.
>>>>
>>>> Not anymore, with the most recent adjustments? There's nothing
>>>> arch-
>>>> specific anymore in the checking.
>>>>
>>>>> @@ -183,7 +180,7 @@ static inline int test_and_set_bit(int nr,
>>>>> volatile void *addr)
>>>>>   * If two examples of this operation race, one can appear to
>>>>> succeed
>>>>>   * but actually fail.  You must protect multiple accesses with
>>>>> a
>>>>> lock.
>>>>>   */
>>>>> -static inline int __test_and_set_bit(int nr, void *addr)
>>>>> +static inline int arch__test_and_set_bit(int nr, volatile void
>>>>> *addr)
>>>>
>>>> I think I raised this point before: Why arch__ here, ...
>>>>
>>>>> @@ -232,7 +226,7 @@ static inline int test_and_clear_bit(int
>>>>> nr,
>>>>> volatile void *addr)
>>>>>   * If two examples of this operation race, one can appear to
>>>>> succeed
>>>>>   * but actually fail.  You must protect multiple accesses with
>>>>> a
>>>>> lock.
>>>>>   */
>>>>> -static inline int __test_and_clear_bit(int nr, void *addr)
>>>>> +static inline int arch__test_and_clear_bit(int nr, volatile
>>>>> void
>>>>> *addr)
>>>>
>>>> ... here, ...
>>>>
>>>>> @@ -243,13 +237,10 @@ static inline int
>>>>> __test_and_clear_bit(int
>>>>> nr, void *addr)
>>>>>  
>>>>>      return oldbit;
>>>>>  }
>>>>> -#define __test_and_clear_bit(nr, addr) ({               \
>>>>> -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
>>>>> -    __test_and_clear_bit(nr, addr);                     \
>>>>> -})
>>>>> +#define arch__test_and_clear_bit arch__test_and_clear_bit
>>>>>  
>>>>>  /* WARNING: non atomic and it can be reordered! */
>>>>> -static inline int __test_and_change_bit(int nr, void *addr)
>>>>> +static inline int arch__test_and_change_bit(int nr, volatile
>>>>> void
>>>>> *addr)
>>>>
>>>> ... and here, while ...
>>>>
>>>>> @@ -307,8 +295,7 @@ static inline int variable_test_bit(int nr,
>>>>> const volatile void *addr)
>>>>>      return oldbit;
>>>>>  }
>>>>>  
>>>>> -#define test_bit(nr, addr) ({                           \
>>>>> -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
>>>>> +#define arch_test_bit(nr, addr) ({                      \
>>>>
>>>> ... just arch_ here? I don't like the double underscore infixes
>>>> very
>>>> much, but I'm okay with them as long as they're applied
>>>> consistently.
>>>
>>> Common code and x86 use __test_and_clear_bit(), and this patch
>>> provides
>>> a generic version of __test_and_clear_bit(). To emphasize that
>>> generic__test_and_clear_bit() is a common implementation of
>>> __test_and_clear_bit(), the double underscore was retained. Also,
>>> test_and_clear_bit() exists and if one day it will be needed to
>>> provide
>>> a generic version of it, then it will be needed to have
>>> generic__test_and_clear_bit() and generic_test_and_clear_bit()
>>>
>>> A similar logic was chosen for test_bit.
>>
>> Right, but in all of your reply arch_ doesn't appear at all.
> I am a little confused here. According to my logic, should it be
> arch___test_and_set_bit() and generic___test_and_set_bit()?

Why 3 underscores in a row? I'm clearly not following.

> If you are asking why there is no generic implementation for
> test_and_clear_bit() without the double underscores, the reason is that
> Arm, PPC, and x86 don't use generic code and rely on specific
> instructions for this operation. Therefore, I didn't see much sense in
> providing a generic version of test_and_clear_bit(), at least, for now.

No, there was no question in that direction. And hence ...

>>  Yet the
>> question was: Why then not arch__test_bit(), to match the other arch
>> helpers?
> Because no one uses __test_bit(). Everywhere is used test_bit().

... this seems unrelated (constrained by my earlier lack of following you).

(Later) Wait, maybe I've finally figured it: You use arch__test_and_*()
because those underlie __test_and_*(), but arch_test_bit() because there's
solely test_bit() (same for the generic_* naming). I guess I can accept
that then, despite the slight anomaly you point out, resulting in the
question towards 3 underscores in a row. To clarify, my thinking was more
towards there not possibly being generic forms of test_and_*() (i.e. no
possible set of arch_test_and_*() or generic_test_and_*()), thus using
double inner underscores in arch__test_*() and generic__test_*() to
signify that those are purely internal functions, which aren't supposed to
be called directly.

Jan


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit()
  2024-05-16  7:04           ` Jan Beulich
@ 2024-05-16 10:34             ` Oleksii K.
  2024-05-16 10:49               ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Oleksii K. @ 2024-05-16 10:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, Ross Lagerwall, Stefano Stabellini,
	Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Shawn Anastasio, xen-devel

On Thu, 2024-05-16 at 09:04 +0200, Jan Beulich wrote:
> On 15.05.2024 19:03, Oleksii K. wrote:
> > On Wed, 2024-05-15 at 17:41 +0200, Jan Beulich wrote:
> > > On 15.05.2024 17:29, Oleksii K. wrote:
> > > > On Wed, 2024-05-15 at 10:52 +0200, Jan Beulich wrote:
> > > > > On 06.05.2024 12:15, Oleksii Kurochko wrote:
> > > > > > The following generic functions were introduced:
> > > > > > * test_bit
> > > > > > * generic__test_and_set_bit
> > > > > > * generic__test_and_clear_bit
> > > > > > * generic__test_and_change_bit
> > > > > > 
> > > > > > Also, the patch introduces the following generics which are
> > > > > > used by the functions mentioned above:
> > > > > > * BITOP_BITS_PER_WORD
> > > > > > * BITOP_MASK
> > > > > > * BITOP_WORD
> > > > > > * BITOP_TYPE
> > > > > > 
> > > > > > These functions and macros can be useful for architectures
> > > > > > that don't have corresponding arch-specific instructions.
> > > > > 
> > > > > Logically this paragraph may better move ahead of the BITOP_*
> > > > > one.
> > > > > 
> > > > > > Because of that x86 has the following check in the macros
> > > > > > test_bit(),
> > > > > > __test_and_set_bit(), __test_and_clear_bit(),
> > > > > > __test_and_change_bit():
> > > > > >     if ( bitop_bad_size(addr) ) __bitop_bad_size();
> > > > > > It was necessary to make bitop bad size check generic too,
> > > > > > so
> > > > > > arch_check_bitop_size() was introduced.
> > > > > 
> > > > > Not anymore, with the most recent adjustments? There's
> > > > > nothing
> > > > > arch-
> > > > > specific anymore in the checking.
> > > > > 
> > > > > > @@ -183,7 +180,7 @@ static inline int test_and_set_bit(int
> > > > > > nr,
> > > > > > volatile void *addr)
> > > > > >   * If two examples of this operation race, one can appear
> > > > > > to
> > > > > > succeed
> > > > > >   * but actually fail.  You must protect multiple accesses
> > > > > > with
> > > > > > a
> > > > > > lock.
> > > > > >   */
> > > > > > -static inline int __test_and_set_bit(int nr, void *addr)
> > > > > > +static inline int arch__test_and_set_bit(int nr, volatile
> > > > > > void
> > > > > > *addr)
> > > > > 
> > > > > I think I raised this point before: Why arch__ here, ...
> > > > > 
> > > > > > @@ -232,7 +226,7 @@ static inline int
> > > > > > test_and_clear_bit(int
> > > > > > nr,
> > > > > > volatile void *addr)
> > > > > >   * If two examples of this operation race, one can appear
> > > > > > to
> > > > > > succeed
> > > > > >   * but actually fail.  You must protect multiple accesses
> > > > > > with
> > > > > > a
> > > > > > lock.
> > > > > >   */
> > > > > > -static inline int __test_and_clear_bit(int nr, void *addr)
> > > > > > +static inline int arch__test_and_clear_bit(int nr,
> > > > > > volatile
> > > > > > void
> > > > > > *addr)
> > > > > 
> > > > > ... here, ...
> > > > > 
> > > > > > @@ -243,13 +237,10 @@ static inline int
> > > > > > __test_and_clear_bit(int
> > > > > > nr, void *addr)
> > > > > >  
> > > > > >      return oldbit;
> > > > > >  }
> > > > > > -#define __test_and_clear_bit(nr, addr) ({               \
> > > > > > -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> > > > > > -    __test_and_clear_bit(nr, addr);                     \
> > > > > > -})
> > > > > > +#define arch__test_and_clear_bit arch__test_and_clear_bit
> > > > > >  
> > > > > >  /* WARNING: non atomic and it can be reordered! */
> > > > > > -static inline int __test_and_change_bit(int nr, void
> > > > > > *addr)
> > > > > > +static inline int arch__test_and_change_bit(int nr,
> > > > > > volatile
> > > > > > void
> > > > > > *addr)
> > > > > 
> > > > > ... and here, while ...
> > > > > 
> > > > > > @@ -307,8 +295,7 @@ static inline int variable_test_bit(int
> > > > > > nr,
> > > > > > const volatile void *addr)
> > > > > >      return oldbit;
> > > > > >  }
> > > > > >  
> > > > > > -#define test_bit(nr, addr) ({                           \
> > > > > > -    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> > > > > > +#define arch_test_bit(nr, addr) ({                      \
> > > > > 
> > > > > ... just arch_ here? I don't like the double underscore
> > > > > infixes
> > > > > very
> > > > > much, but I'm okay with them as long as they're applied
> > > > > consistently.
> > > > 
> > > > Common code and x86 use __test_and_clear_bit(), and this patch
> > > > provides
> > > > a generic version of __test_and_clear_bit(). To emphasize that
> > > > generic__test_and_clear_bit() is a common implementation of
> > > > __test_and_clear_bit(), the double underscore was retained.
> > > > Also,
> > > > test_and_clear_bit() exists and if one day it will be needed to
> > > > provide
> > > > a generic version of it, then it will be needed to have
> > > > generic__test_and_clear_bit() and generic_test_and_clear_bit()
> > > > 
> > > > A similar logic was chosen for test_bit.
> > > 
> > > Right, but in all of your reply arch_ doesn't appear at all.
> > I am a little confused here. According to my logic, should it be
> > arch___test_and_set_bit() and generic___test_and_set_bit()?
> 
> Why 3 underscores in a row? I'm clearly not following.
> 
> > If you are asking why there is no generic implementation for
> > test_and_clear_bit() without the double underscores, the reason is
> > that
> > Arm, PPC, and x86 don't use generic code and rely on specific
> > instructions for this operation. Therefore, I didn't see much sense
> > in
> > providing a generic version of test_and_clear_bit(), at least, for
> > now.
> 
> No, there was no question in that direction. And hence ...
> 
> > >  Yet the
> > > question was: Why then not arch__test_bit(), to match the other
> > > arch
> > > helpers?
> > Because no one uses __test_bit(). Everywhere is used test_bit().
> 
> ... this seems unrelated (constrained by my earlier lack of following
> you).
> 
> (Later) Wait, maybe I've finally figured it: You use
> arch__test_and_*()
> because those underlie __test_and_*(), but arch_test_bit() because
> there's
> solely test_bit() (same for the generic_* naming).
Yes, that what I meant.

>  I guess I can accept
> that then, despite the slight anomaly you point out, resulting in the
> question towards 3 underscores in a row. To clarify, my thinking was
> more
> towards there not possibly being generic forms of test_and_*() (i.e.
> no
> possible set of arch_test_and_*() or generic_test_and_*()), thus
> using
> double inner underscores in arch__test_*() and generic__test_*() to
> signify that those are purely internal functions, which aren't
> supposed to
> be called directly.
I understand your point regarding functions that start with "__".
For example, __test_and_clear_bit() is used not only internally (in
terms of architecture code) but also in common code, so it is not
strictly internal. I may have misunderstood what "internal function"
means in this context.

I thought that, at least for bit operations, "__bit_operation" means
that the bit operation is non-atomic and can be reordered, which
implies that it's not a good idea to use it in common code without
additional steps.

Anyway, I am not sure I understand which approach I should use in this
patch. You mentioned that possibly test_and_() can't have a generic
form, meaning it won't be a set of arch_test_and_() functions.

So, can I rename arch__test_() and generic__test_() to arch_test_() and
generic_test_(), respectively, and use the renamed functions in
_test_and*() in xen/bitops.h? Is my understanding correct?
If my understanding is correct, I am happy to apply mentioned changes
in the next patch version.

~ Oleksii


> 
> Jan



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit()
  2024-05-16 10:34             ` Oleksii K.
@ 2024-05-16 10:49               ` Jan Beulich
  2024-05-17  8:42                 ` Oleksii K.
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2024-05-16 10:49 UTC (permalink / raw)
  To: Oleksii K.
  Cc: Roger Pau Monné, Ross Lagerwall, Stefano Stabellini,
	Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Shawn Anastasio, xen-devel

On 16.05.2024 12:34, Oleksii K. wrote:
> On Thu, 2024-05-16 at 09:04 +0200, Jan Beulich wrote:
>> (Later) Wait, maybe I've finally figured it: You use
>> arch__test_and_*()
>> because those underlie __test_and_*(), but arch_test_bit() because
>> there's
>> solely test_bit() (same for the generic_* naming).
> Yes, that what I meant.
> 
>>  I guess I can accept
>> that then, despite the slight anomaly you point out, resulting in the
>> question towards 3 underscores in a row. To clarify, my thinking was
>> more
>> towards there not possibly being generic forms of test_and_*() (i.e.
>> no
>> possible set of arch_test_and_*() or generic_test_and_*()), thus
>> using
>> double inner underscores in arch__test_*() and generic__test_*() to
>> signify that those are purely internal functions, which aren't
>> supposed to
>> be called directly.
> I understand your point regarding functions that start with "__".
> For example, __test_and_clear_bit() is used not only internally (in
> terms of architecture code) but also in common code, so it is not
> strictly internal. I may have misunderstood what "internal function"
> means in this context.
> 
> I thought that, at least for bit operations, "__bit_operation" means
> that the bit operation is non-atomic and can be reordered, which
> implies that it's not a good idea to use it in common code without
> additional steps.

Correct, up to the comma; those may very well be used in common code,
provided non-atomic forms indeed suffice. But in my reply I didn't talk
about double-underscore-prefixes in names of involved functions. I
talked about inner double underscores.

> Anyway, I am not sure I understand which approach I should use in this
> patch. You mentioned that possibly test_and_() can't have a generic
> form, meaning it won't be a set of arch_test_and_() functions.
> 
> So, can I rename arch__test_() and generic__test_() to arch_test_() and
> generic_test_(), respectively, and use the renamed functions in
> _test_and*() in xen/bitops.h? Is my understanding correct?

You could. You could also stick to what you have now - as said, I can
accept that with the worked out explanation. Or you could switch to
using arch__test_bit() and generic__test_bit(), thus having the double
inner underscores identify "internal to the implementation" functions.
My preference would be in backwards order of what I have just enumerated
as possible options. I wonder whether really no-one else has any opinion
here ...

Jan


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit()
  2024-05-16 10:49               ` Jan Beulich
@ 2024-05-17  8:42                 ` Oleksii K.
  0 siblings, 0 replies; 36+ messages in thread
From: Oleksii K. @ 2024-05-17  8:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné, Ross Lagerwall, Stefano Stabellini,
	Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Shawn Anastasio, xen-devel

On Thu, 2024-05-16 at 12:49 +0200, Jan Beulich wrote:
> > Anyway, I am not sure I understand which approach I should use in
> > this
> > patch. You mentioned that possibly test_and_() can't have a generic
> > form, meaning it won't be a set of arch_test_and_() functions.
> > 
> > So, can I rename arch__test_() and generic__test_() to arch_test_()
> > and
> > generic_test_(), respectively, and use the renamed functions in
> > _test_and*() in xen/bitops.h? Is my understanding correct?
> 
> You could. You could also stick to what you have now - as said, I can
> accept that with the worked out explanation. Or you could switch to
> using arch__test_bit() and generic__test_bit(), thus having the
> double
> inner underscores identify "internal to the implementation"
> functions.
> My preference would be in backwards order of what I have just
> enumerated
> as possible options. I wonder whether really no-one else has any
> opinion
> here ...

I see that __test_bit() doesn't exist now and perhaps won't exist at
all, but in this patch we are providing the generic for test_bit(), not
__test_bit(). Thereby according to provided by me naming for test_bit()
should be defined using {generic, arch}_test_bit().

~ Oleksii




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 03/15] xen/bitops: implement fls{l}() in common logic
  2024-05-15  9:09   ` Jan Beulich
  2024-05-15 13:55     ` Oleksii K.
@ 2024-05-17  9:06     ` Oleksii K.
  2024-05-17  9:52       ` Jan Beulich
  1 sibling, 1 reply; 36+ messages in thread
From: Oleksii K. @ 2024-05-17  9:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Shawn Anastasio,
	Roger Pau Monné, xen-devel

On Wed, 2024-05-15 at 11:09 +0200, Jan Beulich wrote:
> But this then needs carrying through to ...
> 
> > --- a/xen/arch/arm/include/asm/arm64/bitops.h
> > +++ b/xen/arch/arm/include/asm/arm64/bitops.h
> > @@ -22,17 +22,15 @@ static /*__*/always_inline unsigned long
> > __ffs(unsigned long word)
> >    */
> >   #define ffz(x)  __ffs(~(x))
> >   
> > -static inline int flsl(unsigned long x)
> > +static inline int arch_flsl(unsigned long x)
> 
> ... e.g. here. You don't want to introduce signed/unsigned
> mismatches.
Is it critical for x86 to return int for flsl() and fls() or I can
update the return type for x86 too?

   static always_inline int arch_flsl(unsigned long x)
   {
       long r;
   
       asm ( "bsr %1,%0\n\t"
             "jnz 1f\n\t"
             "mov $-1,%0\n"
             "1:" : "=r" (r) : "rm" (x));
       return (int)r+1;
   }
   #define arch_flsl arch_flsl
   
   static always_inline int arch_fls(unsigned int x)
   {
       int r;
   
       asm ( "bsr %1,%0\n\t"
             "jnz 1f\n\t"
             "mov $-1,%0\n"
             "1:" : "=r" (r) : "rm" (x));
       return r + 1;
   }
   #define arch_fls arch_fls

Any specific reason why 'long' and 'int' types for r are used?

~ Oleksii


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v9 03/15] xen/bitops: implement fls{l}() in common logic
  2024-05-17  9:06     ` Oleksii K.
@ 2024-05-17  9:52       ` Jan Beulich
  0 siblings, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2024-05-17  9:52 UTC (permalink / raw)
  To: Oleksii K.
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Shawn Anastasio,
	Roger Pau Monné, xen-devel

On 17.05.2024 11:06, Oleksii K. wrote:
> On Wed, 2024-05-15 at 11:09 +0200, Jan Beulich wrote:
>> But this then needs carrying through to ...
>>
>>> --- a/xen/arch/arm/include/asm/arm64/bitops.h
>>> +++ b/xen/arch/arm/include/asm/arm64/bitops.h
>>> @@ -22,17 +22,15 @@ static /*__*/always_inline unsigned long
>>> __ffs(unsigned long word)
>>>    */
>>>   #define ffz(x)  __ffs(~(x))
>>>   
>>> -static inline int flsl(unsigned long x)
>>> +static inline int arch_flsl(unsigned long x)
>>
>> ... e.g. here. You don't want to introduce signed/unsigned
>> mismatches.
> Is it critical for x86 to return int for flsl() and fls() or I can
> update the return type for x86 too?

The return types ought to be changed imo, for everything to end up
consistent.

>    static always_inline int arch_flsl(unsigned long x)
>    {
>        long r;
>    
>        asm ( "bsr %1,%0\n\t"
>              "jnz 1f\n\t"
>              "mov $-1,%0\n"
>              "1:" : "=r" (r) : "rm" (x));
>        return (int)r+1;
>    }
>    #define arch_flsl arch_flsl
>    
>    static always_inline int arch_fls(unsigned int x)
>    {
>        int r;
>    
>        asm ( "bsr %1,%0\n\t"
>              "jnz 1f\n\t"
>              "mov $-1,%0\n"
>              "1:" : "=r" (r) : "rm" (x));
>        return r + 1;
>    }
>    #define arch_fls arch_fls
> 
> Any specific reason why 'long' and 'int' types for r are used?

I don't think so. I expect it merely fits with no-one having cared about
signedness back at the time.

Jan


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2024-05-17  9:52 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-06 10:15 [PATCH v9 00/15] Enable build of full Xen for RISC-V Oleksii Kurochko
2024-05-06 10:15 ` [PATCH v9 01/15] xen/riscv: disable unnecessary configs Oleksii Kurochko
2024-05-06 10:15 ` [PATCH v9 02/15] xen: introduce generic non-atomic test_*bit() Oleksii Kurochko
2024-05-15  8:52   ` Jan Beulich
2024-05-15 15:29     ` Oleksii K.
2024-05-15 15:41       ` Jan Beulich
2024-05-15 17:03         ` Oleksii K.
2024-05-16  7:04           ` Jan Beulich
2024-05-16 10:34             ` Oleksii K.
2024-05-16 10:49               ` Jan Beulich
2024-05-17  8:42                 ` Oleksii K.
2024-05-06 10:15 ` [PATCH v9 03/15] xen/bitops: implement fls{l}() in common logic Oleksii Kurochko
2024-05-15  9:09   ` Jan Beulich
2024-05-15 13:55     ` Oleksii K.
2024-05-15 14:07       ` Jan Beulich
2024-05-15 15:31         ` Oleksii K.
2024-05-17  9:06     ` Oleksii K.
2024-05-17  9:52       ` Jan Beulich
2024-05-06 10:15 ` [PATCH v9 04/15] xen/bitops: put __ffs() into linux compatible header Oleksii Kurochko
2024-05-15 14:34   ` Michal Orzel
2024-05-15 15:08   ` Rahul Singh
2024-05-06 10:15 ` [PATCH v9 05/15] xen/riscv: introduce bitops.h Oleksii Kurochko
2024-05-15  9:29   ` Jan Beulich
2024-05-06 10:15 ` [PATCH v9 06/15] xen/riscv: introduce cmpxchg.h Oleksii Kurochko
2024-05-15  9:38   ` Jan Beulich
2024-05-06 10:15 ` [PATCH v9 07/15] xen/riscv: introduce atomic.h Oleksii Kurochko
2024-05-15  9:49   ` Jan Beulich
2024-05-15 13:59     ` Oleksii K.
2024-05-06 10:15 ` [PATCH v9 08/15] xen/riscv: introduce monitor.h Oleksii Kurochko
2024-05-06 10:15 ` [PATCH v9 09/15] xen/riscv: add definition of __read_mostly Oleksii Kurochko
2024-05-06 10:15 ` [PATCH v9 10/15] xen/riscv: add required things to current.h Oleksii Kurochko
2024-05-06 10:15 ` [PATCH v9 11/15] xen/riscv: add minimal stuff to mm.h to build full Xen Oleksii Kurochko
2024-05-06 10:15 ` [PATCH v9 12/15] xen/riscv: introduce vm_event_*() functions Oleksii Kurochko
2024-05-06 10:15 ` [PATCH v9 13/15] xen/riscv: add minimal amount of stubs to build full Xen Oleksii Kurochko
2024-05-06 10:15 ` [PATCH v9 14/15] xen/riscv: enable full Xen build Oleksii Kurochko
2024-05-06 10:15 ` [PATCH v9 15/15] xen/README: add compiler and binutils versions for RISC-V64 Oleksii Kurochko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.