* Re: [PATCH v13 12/27] drm/i915/dp: Add YCBCR444 handling for sink formats
From: Ville Syrjälä @ 2026-04-13 19:08 UTC (permalink / raw)
To: Nicolas Frattaroli
Cc: Harry Wentland, Leo Li, Rodrigo Siqueira, Alex Deucher,
Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
Jonas Karlman, Jernej Skrabec, Sandy Huang, Heiko Stübner,
Andy Yan, Jani Nikula, Rodrigo Vivi, Joonas Lahtinen,
Tvrtko Ursulin, Dmitry Baryshkov, Sascha Hauer, Rob Herring,
Jonathan Corbet, Shuah Khan, kernel, amd-gfx, dri-devel,
linux-kernel, linux-arm-kernel, linux-rockchip, intel-gfx,
intel-xe, linux-doc
In-Reply-To: <20260413-color-format-v13-12-ab37d4dfba48@collabora.com>
On Mon, Apr 13, 2026 at 12:07:26PM +0200, Nicolas Frattaroli wrote:
> In anticipation of userspace being able to explicitly select supported
> sink formats, add handling of the YCBCR444 sink format. The AUTO path
> does not choose this format, but with explicit format selection added to
> the driver, it becomes a possibility.
>
> Check for sink support of YCBCR444 to intel_dp_sink_format_valid.
>
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> ---
> drivers/gpu/drm/i915/display/intel_dp.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
> index 35b8fb5740aa..47bd3d59ea93 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> @@ -1364,6 +1364,11 @@ intel_dp_sink_format_valid(struct intel_connector *connector,
>
> return MODE_OK;
> case INTEL_OUTPUT_FORMAT_RGB:
> + return MODE_OK;
> + case INTEL_OUTPUT_FORMAT_YCBCR444:
> + if (!(info->color_formats & BIT(DRM_OUTPUT_COLOR_FORMAT_YCBCR444)))
> + return MODE_BAD;
The DP situation is a lot more more fuzzy than the HDMI situation
due to the PCON stuff. So I'm not quite sure what we should do here.
At the very least I think we want the equivalent of
intel_dp_can_ycbcr420() for 444, and the same intel_dp_has_hdmi_sink()
check that we have for 420.
> +
> return MODE_OK;
> default:
> MISSING_CASE(sink_format);
>
> --
> 2.53.0
--
Ville Syrjälä
Intel
^ permalink raw reply
* Re: [PATCH v13 13/27] drm/i915: Implement the "color format" DRM property
From: Ville Syrjälä @ 2026-04-13 19:21 UTC (permalink / raw)
To: Nicolas Frattaroli
Cc: Harry Wentland, Leo Li, Rodrigo Siqueira, Alex Deucher,
Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
Jonas Karlman, Jernej Skrabec, Sandy Huang, Heiko Stübner,
Andy Yan, Jani Nikula, Rodrigo Vivi, Joonas Lahtinen,
Tvrtko Ursulin, Dmitry Baryshkov, Sascha Hauer, Rob Herring,
Jonathan Corbet, Shuah Khan, kernel, amd-gfx, dri-devel,
linux-kernel, linux-arm-kernel, linux-rockchip, intel-gfx,
intel-xe, linux-doc
In-Reply-To: <20260413-color-format-v13-13-ab37d4dfba48@collabora.com>
On Mon, Apr 13, 2026 at 12:07:27PM +0200, Nicolas Frattaroli wrote:
> Implement the "color format" DRM property for both DP and HDMI. The
> values of the property include RGB, YCbCr420, YCbCr444 and Auto. Auto
> will pick RGB, with a fallback to YCbCr420.
>
> The mask of supported formats by the source exposed by the property is
> an optimistic scenario, as specific DFP-related caveats can't be
> established before an EDID is present.
>
> Should the explicitly requested color format not be supported by the
> sink (or by the source in combination with the sink), then an error is
> returned to userspace, so that it can make a better choice.
>
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> ---
> drivers/gpu/drm/i915/display/intel_connector.c | 10 +++++++
> drivers/gpu/drm/i915/display/intel_connector.h | 1 +
> drivers/gpu/drm/i915/display/intel_dp.c | 38 +++++++++++++++++++++++---
> drivers/gpu/drm/i915/display/intel_hdmi.c | 38 +++++++++++++++++++++++---
> 4 files changed, 79 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_connector.c b/drivers/gpu/drm/i915/display/intel_connector.c
> index 7ef9338d67ab..b1a21dd77af6 100644
> --- a/drivers/gpu/drm/i915/display/intel_connector.c
> +++ b/drivers/gpu/drm/i915/display/intel_connector.c
> @@ -338,3 +338,13 @@ intel_attach_scaling_mode_property(struct drm_connector *connector)
>
> connector->state->scaling_mode = DRM_MODE_SCALE_ASPECT;
> }
> +
> +void
> +intel_attach_color_format_property(struct drm_connector *connector)
> +{
> + const unsigned long fmts = BIT(DRM_OUTPUT_COLOR_FORMAT_RGB444) |
> + BIT(DRM_OUTPUT_COLOR_FORMAT_YCBCR444) |
> + BIT(DRM_OUTPUT_COLOR_FORMAT_YCBCR420);
We're going to need different formats for different platforms, and
for DP vs. HDMI.
For HDMI it should be fairly simple if we have the
ycbcr420_allowed and ycbcr444_allowed things to consult.
For DP I'm not sure if we want to advertise YCbCr output support
for platforms that can't produce it without help from the PCON.
If we know there is an on board PCON that can do it, then the answer
is probably yes. But without that it might be best to not advertise
the relevant formats unless source_can_output() tells us that it can
be directly output. We could at least start with that, and revisit
it later if some situations arise where eg. having explicit 4:2:0
output on older platforms is beneficial.
I think you want to split this to separate DP vs. HDMI patches since
the two require quite different logic.
> +
> + drm_connector_attach_color_format_property(connector, fmts);
> +}
> diff --git a/drivers/gpu/drm/i915/display/intel_connector.h b/drivers/gpu/drm/i915/display/intel_connector.h
> index 0aa86626e646..c77b7aac02cb 100644
> --- a/drivers/gpu/drm/i915/display/intel_connector.h
> +++ b/drivers/gpu/drm/i915/display/intel_connector.h
> @@ -34,5 +34,6 @@ void intel_attach_dp_colorspace_property(struct drm_connector *connector);
> void intel_attach_scaling_mode_property(struct drm_connector *connector);
> void intel_connector_queue_modeset_retry_work(struct intel_connector *connector);
> void intel_connector_cancel_modeset_retry_work(struct intel_connector *connector);
> +void intel_attach_color_format_property(struct drm_connector *connector);
>
> #endif /* __INTEL_CONNECTOR_H__ */
> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
> index 47bd3d59ea93..3b2293415b55 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> @@ -3398,10 +3398,10 @@ intel_dp_compute_output_format(struct intel_encoder *encoder,
> }
>
> static int
> -intel_dp_compute_formats(struct intel_encoder *encoder,
> - struct intel_crtc_state *crtc_state,
> - struct drm_connector_state *conn_state,
> - bool respect_downstream_limits)
> +intel_dp_compute_formats_auto(struct intel_encoder *encoder,
> + struct intel_crtc_state *crtc_state,
> + struct drm_connector_state *conn_state,
> + bool respect_downstream_limits)
> {
> struct intel_display *display = to_intel_display(encoder);
> struct intel_dp *intel_dp = enc_to_intel_dp(encoder);
> @@ -3437,6 +3437,34 @@ intel_dp_compute_formats(struct intel_encoder *encoder,
> return ret;
> }
>
> +static int
> +intel_dp_compute_formats(struct intel_encoder *encoder,
> + struct intel_crtc_state *crtc_state,
> + struct drm_connector_state *conn_state,
> + bool respect_downstream_limits)
> +{
> + switch (conn_state->color_format) {
> + case DRM_CONNECTOR_COLOR_FORMAT_RGB444:
> + return intel_dp_compute_output_format(encoder, crtc_state, conn_state,
> + respect_downstream_limits,
> + INTEL_OUTPUT_FORMAT_RGB);
> + case DRM_CONNECTOR_COLOR_FORMAT_YCBCR444:
> + return intel_dp_compute_output_format(encoder, crtc_state, conn_state,
> + respect_downstream_limits,
> + INTEL_OUTPUT_FORMAT_YCBCR444);
> + case DRM_CONNECTOR_COLOR_FORMAT_YCBCR420:
> + return intel_dp_compute_output_format(encoder, crtc_state, conn_state,
> + respect_downstream_limits,
> + INTEL_OUTPUT_FORMAT_YCBCR420);
> + case DRM_CONNECTOR_COLOR_FORMAT_AUTO:
> + return intel_dp_compute_formats_auto(encoder, crtc_state, conn_state,
> + respect_downstream_limits);
> + default:
> + MISSING_CASE(conn_state->color_format);
> + return -EINVAL;
> + }
> +}
> +
> void
> intel_dp_audio_compute_config(struct intel_encoder *encoder,
> struct intel_crtc_state *pipe_config,
> @@ -7025,6 +7053,8 @@ intel_dp_add_properties(struct intel_dp *intel_dp, struct drm_connector *_connec
>
> if (HAS_VRR(display))
> drm_connector_attach_vrr_capable_property(&connector->base);
> +
> + intel_attach_color_format_property(&connector->base);
> }
>
> static void
> diff --git a/drivers/gpu/drm/i915/display/intel_hdmi.c b/drivers/gpu/drm/i915/display/intel_hdmi.c
> index 5ab5b5f85cde..632498e3702b 100644
> --- a/drivers/gpu/drm/i915/display/intel_hdmi.c
> +++ b/drivers/gpu/drm/i915/display/intel_hdmi.c
> @@ -2307,10 +2307,10 @@ static int intel_hdmi_compute_output_format(struct intel_encoder *encoder,
> return intel_hdmi_compute_clock(encoder, crtc_state, respect_downstream_limits);
> }
>
> -static int intel_hdmi_compute_formats(struct intel_encoder *encoder,
> - struct intel_crtc_state *crtc_state,
> - const struct drm_connector_state *conn_state,
> - bool respect_downstream_limits)
> +static int intel_hdmi_compute_formats_auto(struct intel_encoder *encoder,
> + struct intel_crtc_state *crtc_state,
> + const struct drm_connector_state *conn_state,
> + bool respect_downstream_limits)
> {
> struct intel_display *display = to_intel_display(encoder);
> struct intel_connector *connector = to_intel_connector(conn_state->connector);
> @@ -2345,6 +2345,35 @@ static int intel_hdmi_compute_formats(struct intel_encoder *encoder,
> return ret;
> }
>
> +static int intel_hdmi_compute_formats(struct intel_encoder *encoder,
> + struct intel_crtc_state *crtc_state,
> + const struct drm_connector_state *conn_state,
> + bool respect_downstream_limits)
> +{
> + struct intel_connector *connector = to_intel_connector(conn_state->connector);
> +
> + switch (conn_state->color_format) {
> + case DRM_CONNECTOR_COLOR_FORMAT_RGB444:
> + return intel_hdmi_compute_output_format(encoder, crtc_state, connector,
> + respect_downstream_limits,
> + INTEL_OUTPUT_FORMAT_RGB);
> + case DRM_CONNECTOR_COLOR_FORMAT_YCBCR444:
> + return intel_hdmi_compute_output_format(encoder, crtc_state, connector,
> + respect_downstream_limits,
> + INTEL_OUTPUT_FORMAT_YCBCR444);
> + case DRM_CONNECTOR_COLOR_FORMAT_YCBCR420:
> + return intel_hdmi_compute_output_format(encoder, crtc_state, connector,
> + respect_downstream_limits,
> + INTEL_OUTPUT_FORMAT_YCBCR420);
> + case DRM_CONNECTOR_COLOR_FORMAT_AUTO:
> + return intel_hdmi_compute_formats_auto(encoder, crtc_state, conn_state,
> + respect_downstream_limits);
> + default:
> + MISSING_CASE(conn_state->color_format);
> + return -EINVAL;
> + }
> +}
> +
> static bool intel_hdmi_is_cloned(const struct intel_crtc_state *crtc_state)
> {
> return crtc_state->uapi.encoder_mask &&
> @@ -2729,6 +2758,7 @@ intel_hdmi_add_properties(struct intel_hdmi *intel_hdmi, struct drm_connector *_
>
> intel_attach_hdmi_colorspace_property(&connector->base);
> drm_connector_attach_content_type_property(&connector->base);
> + intel_attach_color_format_property(&connector->base);
>
> if (DISPLAY_VER(display) >= 10)
> drm_connector_attach_hdr_output_metadata_property(&connector->base);
>
> --
> 2.53.0
--
Ville Syrjälä
Intel
^ permalink raw reply
* Re: [PATCH v2 01/12] sched/isolation: Separate housekeeping types in enum hk_type
From: Waiman Long @ 2026-04-13 19:25 UTC (permalink / raw)
To: Qiliang Yuan, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, Paul E. McKenney,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Anna-Maria Behnsen, Ingo Molnar,
Thomas Gleixner, Tejun Heo, Andrew Morton, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Chen Ridong, Michal Koutný,
Jonathan Corbet, Shuah Khan, Shuah Khan
Cc: linux-kernel, rcu, linux-mm, cgroups, linux-doc, linux-kselftest
In-Reply-To: <20260413-wujing-dhm-v2-1-06df21caba5d@gmail.com>
On 4/13/26 3:43 AM, Qiliang Yuan wrote:
> Most kernel noise types (TICK, TIMER, RCU, etc.) are currently aliased
> to a single HK_TYPE_KERNEL_NOISE enum value. This prevents fine-grained
> runtime isolation control as all masks are forced to be identical.
>
> Un-alias service-specific housekeeping types in enum hk_type. This
> separation provides the necessary granularity for DHM subsystems to
> subscribe to and maintain independent affinity masks.
Usually, if we want to run a latency sensitive workload like DPDK, we
try to minimize all sorts of kernel noises or interference as much as
possible. Do you have a good use case where it is advantageous to remove
some types of kernel noises from a given set of CPUs but not the others?
Cheers,
Longman
> Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
> ---
> include/linux/sched/isolation.h | 20 ++++++++------------
> kernel/sched/isolation.c | 10 +++++++++-
> 2 files changed, 17 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index dc3975ff1b2e1..b9a041247565c 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -17,21 +17,17 @@ enum hk_type {
> /* Inverse of boot-time isolcpus=managed_irq argument */
> HK_TYPE_MANAGED_IRQ,
> /* Inverse of boot-time nohz_full= or isolcpus=nohz arguments */
> - HK_TYPE_KERNEL_NOISE,
> + HK_TYPE_TICK,
> + HK_TYPE_TIMER,
> + HK_TYPE_RCU,
> + HK_TYPE_MISC,
> + HK_TYPE_WQ,
> + HK_TYPE_KTHREAD,
> HK_TYPE_MAX,
> -
> - /*
> - * The following housekeeping types are only set by the nohz_full
> - * boot commandline option. So they can share the same value.
> - */
> - HK_TYPE_TICK = HK_TYPE_KERNEL_NOISE,
> - HK_TYPE_TIMER = HK_TYPE_KERNEL_NOISE,
> - HK_TYPE_RCU = HK_TYPE_KERNEL_NOISE,
> - HK_TYPE_MISC = HK_TYPE_KERNEL_NOISE,
> - HK_TYPE_WQ = HK_TYPE_KERNEL_NOISE,
> - HK_TYPE_KTHREAD = HK_TYPE_KERNEL_NOISE
> };
>
> +#define HK_TYPE_KERNEL_NOISE HK_TYPE_TICK
> +
> #ifdef CONFIG_CPU_ISOLATION
> DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
> extern int housekeeping_any_cpu(enum hk_type type);
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index ef152d401fe20..e05ed5118e651 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -15,9 +15,17 @@ enum hk_flags {
> HK_FLAG_DOMAIN_BOOT = BIT(HK_TYPE_DOMAIN_BOOT),
> HK_FLAG_DOMAIN = BIT(HK_TYPE_DOMAIN),
> HK_FLAG_MANAGED_IRQ = BIT(HK_TYPE_MANAGED_IRQ),
> - HK_FLAG_KERNEL_NOISE = BIT(HK_TYPE_KERNEL_NOISE),
> + HK_FLAG_TICK = BIT(HK_TYPE_TICK),
> + HK_FLAG_TIMER = BIT(HK_TYPE_TIMER),
> + HK_FLAG_RCU = BIT(HK_TYPE_RCU),
> + HK_FLAG_MISC = BIT(HK_TYPE_MISC),
> + HK_FLAG_WQ = BIT(HK_TYPE_WQ),
> + HK_FLAG_KTHREAD = BIT(HK_TYPE_KTHREAD),
> };
>
> +#define HK_FLAG_KERNEL_NOISE (HK_FLAG_TICK | HK_FLAG_TIMER | HK_FLAG_RCU | \
> + HK_FLAG_MISC | HK_FLAG_WQ | HK_FLAG_KTHREAD)
> +
> DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
> EXPORT_SYMBOL_GPL(housekeeping_overridden);
>
>
^ permalink raw reply
* [syzbot ci] Re: veth: add Byte Queue Limits (BQL) support
From: syzbot ci @ 2026-04-13 19:49 UTC (permalink / raw)
To: andrew, ast, bpf, corbet, daniel, davem, edumazet, frederic, hawk,
horms, j.koeppeler, jhs, jiri, john.fastabend, kernel-team,
krikku, kuba, kuniyu, linux-doc, linux-kernel, linux-kselftest,
netdev, pabeni, sdf, shuah, skhan, yajun.deng
Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260413094442.1376022-1-hawk@kernel.org>
syzbot ci has tested the following series
[v2] veth: add Byte Queue Limits (BQL) support
https://lore.kernel.org/all/20260413094442.1376022-1-hawk@kernel.org
* [PATCH net-next v2 1/5] net: add dev->bql flag to allow BQL sysfs for IFF_NO_QUEUE devices
* [PATCH net-next v2 2/5] veth: implement Byte Queue Limits (BQL) for latency reduction
* [PATCH net-next v2 3/5] veth: add tx_timeout watchdog as BQL safety net
* [PATCH net-next v2 4/5] net: sched: add timeout count to NETDEV WATCHDOG message
* [PATCH net-next v2 5/5] selftests: net: add veth BQL stress test
and found the following issue:
WARNING in veth_napi_del_range
Full report is available here:
https://ci.syzbot.org/series/ee732006-8545-4abd-a105-b4b1592a7baf
***
WARNING in veth_napi_del_range
tree: net-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
base: 8806d502e0a7e7d895b74afbd24e8550a65a2b17
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/90743a26-f003-44cf-abcc-5991c47588b2/config
syz repro: https://ci.syzbot.org/findings/d068bfb2-9f8b-466a-95b4-cd7e7b00006c/syz_repro
------------[ cut here ]------------
index >= dev->num_tx_queues
WARNING: ./include/linux/netdevice.h:2672 at netdev_get_tx_queue include/linux/netdevice.h:2672 [inline], CPU#0: syz.1.27/6002
WARNING: ./include/linux/netdevice.h:2672 at veth_napi_del_range+0x3b7/0x4e0 drivers/net/veth.c:1142, CPU#0: syz.1.27/6002
Modules linked in:
CPU: 0 UID: 0 PID: 6002 Comm: syz.1.27 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:netdev_get_tx_queue include/linux/netdevice.h:2672 [inline]
RIP: 0010:veth_napi_del_range+0x3b7/0x4e0 drivers/net/veth.c:1142
Code: 00 e8 ad 96 69 fe 44 39 6c 24 10 74 5e e8 41 61 44 fb 41 ff c5 49 bc 00 00 00 00 00 fc ff df e9 6d ff ff ff e8 2a 61 44 fb 90 <0f> 0b 90 42 80 3c 23 00 75 8e eb 94 48 8b 0c 24 80 e1 07 80 c1 03
RSP: 0018:ffffc90003adf918 EFLAGS: 00010293
RAX: ffffffff86814ec6 RBX: 1ffff110227a6c03 RCX: ffff888103a857c0
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000002
RBP: 1ffff110227a6c9a R08: ffff888113f01ab7 R09: 0000000000000000
R10: ffff888113f01a98 R11: ffffed10227e0357 R12: dffffc0000000000
R13: 0000000000000002 R14: 0000000000000002 R15: ffff888113d36018
FS: 000055555ea16500(0000) GS:ffff88818de4a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efc287456b8 CR3: 000000010cdd0000 CR4: 00000000000006f0
Call Trace:
<TASK>
veth_napi_del drivers/net/veth.c:1153 [inline]
veth_disable_xdp+0x1b0/0x310 drivers/net/veth.c:1255
veth_xdp_set drivers/net/veth.c:1693 [inline]
veth_xdp+0x48e/0x730 drivers/net/veth.c:1717
dev_xdp_propagate+0x125/0x260 net/core/dev_api.c:348
bond_xdp_set drivers/net/bonding/bond_main.c:5715 [inline]
bond_xdp+0x3ca/0x830 drivers/net/bonding/bond_main.c:5761
dev_xdp_install+0x42c/0x600 net/core/dev.c:10387
dev_xdp_detach_link net/core/dev.c:10579 [inline]
bpf_xdp_link_release+0x362/0x540 net/core/dev.c:10595
bpf_link_free+0x103/0x480 kernel/bpf/syscall.c:3292
bpf_link_put_direct kernel/bpf/syscall.c:3344 [inline]
bpf_link_release+0x6b/0x80 kernel/bpf/syscall.c:3351
__fput+0x44f/0xa70 fs/file_table.c:469
task_work_run+0x1d9/0x270 kernel/task_work.c:233
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
__exit_to_user_mode_loop kernel/entry/common.c:67 [inline]
exit_to_user_mode_loop+0xed/0x480 kernel/entry/common.c:98
__exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:325 [inline]
do_syscall_64+0x32d/0xf80 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5bda39c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffdca2969e8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
RAX: 0000000000000000 RBX: 00007f5bda617da0 RCX: 00007f5bda39c819
RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
RBP: 00007f5bda617da0 R08: 00007f5bda616128 R09: 0000000000000000
R10: 000000000003fd78 R11: 0000000000000246 R12: 0000000000010fb8
R13: 00007f5bda61609c R14: 0000000000010cdd R15: 00007ffdca296af0
</TASK>
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).
The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.
^ permalink raw reply
* Re: [PATCH v7 21/22] x86/virt/tdx: Document TDX module update
From: Edgecombe, Rick P @ 2026-04-13 19:54 UTC (permalink / raw)
To: kvm@vger.kernel.org, linux-coco@lists.linux.dev,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
Gao, Chao
Cc: corbet@lwn.net, Li, Xiaoyao, Huang, Kai, Zhao, Yan Y,
dave.hansen@linux.intel.com, kas@kernel.org, seanjc@google.com,
binbin.wu@linux.intel.com, pbonzini@redhat.com, Chatre, Reinette,
Verma, Vishal L, nik.borisov@suse.com, mingo@redhat.com,
Weiny, Ira, skhan@linuxfoundation.org,
tony.lindgren@linux.intel.com, Annapurve, Vishal,
sagis@google.com, hpa@zytor.com, tglx@kernel.org,
paulmck@kernel.org, bp@alien8.de, yilun.xu@linux.intel.com,
dan.j.williams@intel.com, x86@kernel.org
In-Reply-To: <20260331124214.117808-22-chao.gao@intel.com>
On Tue, 2026-03-31 at 05:41 -0700, Chao Gao wrote:
> Document TDX module update as a subsection of "TDX Host Kernel Support" to
> provide background information and cover key points that developers and
> users may need to know, for example:
>
> - update is done in stop_machine() context
> - update instructions and results
> - update policy and tooling
>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> ---
> v5:
> - use "update" when refer to the update feature/concept [Kai]
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
^ permalink raw reply
* htmldocs: Documentation/admin-guide/device-mapper/dm-inlinecrypt.rst:1: WARNING: Title overline too short.
From: kernel test robot @ 2026-04-13 19:56 UTC (permalink / raw)
To: Linlin Zhang; +Cc: oe-kbuild-all, 0day robot, linux-doc
tree: https://github.com/intel-lab-lkp/linux/commits/Linlin-Zhang/block-export-blk-crypto-symbols-required-by-dm-inlinecrypt/20260413-212518
head: 9b0494c109a48c2e2a286f44e61f2f5dbf35b31d
commit: 9b0494c109a48c2e2a286f44e61f2f5dbf35b31d dm: add documentation for dm-inlinecrypt target
date: 6 hours ago
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
docutils: docutils (Docutils 0.21.2, Python 3.13.5, on linux)
reproduce: (https://download.01.org/0day-ci/archive/20260413/202604132146.Iz4XtlRm-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604132146.Iz4XtlRm-lkp@intel.com/
All warnings (new ones prefixed by >>):
Warning: tools/docs/documentation-file-ref-check references a file that doesn't exist: m,\b(\S*)(Documentation/[A-Za-z0-9
Warning: tools/docs/documentation-file-ref-check references a file that doesn't exist: Documentation/devicetree/dt-object-internal.txt
Warning: tools/docs/documentation-file-ref-check references a file that doesn't exist: m,^Documentation/scheduler/sched-pelt
Warning: tools/docs/documentation-file-ref-check references a file that doesn't exist: m,(Documentation/translations/[
Using alabaster theme
>> Documentation/admin-guide/device-mapper/dm-inlinecrypt.rst:1: WARNING: Title overline too short.
--
Documentation/userspace-api/landlock:526: ./include/uapi/linux/landlock.h:45: ERROR: Unknown target name: "network flags". [docutils]
Documentation/userspace-api/landlock:526: ./include/uapi/linux/landlock.h:50: ERROR: Unknown target name: "scope flags". [docutils]
Documentation/userspace-api/landlock:526: ./include/uapi/linux/landlock.h:24: ERROR: Unknown target name: "filesystem flags". [docutils]
Documentation/userspace-api/landlock:535: ./include/uapi/linux/landlock.h:166: ERROR: Unknown target name: "filesystem flags". [docutils]
Documentation/userspace-api/landlock:535: ./include/uapi/linux/landlock.h:189: ERROR: Unknown target name: "network flags". [docutils]
>> Documentation/admin-guide/device-mapper/dm-inlinecrypt.rst: WARNING: document isn't included in any toctree [toc.not_included]
Documentation/networking/skbuff:36: ./include/linux/skbuff.h:181: WARNING: Failed to create a cross reference. A title or caption not found: 'crc' [ref.ref]
vim +1 Documentation/admin-guide/device-mapper/dm-inlinecrypt.rst
> 1 ========
2 dm-inlinecrypt
3 ========
4
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply
* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: Joel Fernandes @ 2026-04-13 20:04 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Eliot Courtney, Danilo Krummrich, linux-kernel, Miguel Ojeda,
Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Dave Airlie,
Daniel Almeida, Koen Koning, dri-devel, rust-for-linux,
Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
Alex Gaynor, Boqun Feng, John Hubbard, Alistair Popple,
Timur Tabi, Edwin Peer, Andrea Righi, Andy Ritger, Zhi Wang,
Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi, joel,
linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <DHOKJ3MJNO5P.SXKOAYKX13JL@nvidia.com>
Hi Alex,
On 4/9/2026 6:56 AM, Alexandre Courbot wrote:
> On Thu Apr 9, 2026 at 5:19 AM JST, Joel Fernandes wrote:
>> Hi Alex, Eliot, Danilo,
>>
>> Thanks for taking a look. Let me respond to the specific points below.
>>
>> On Wed, 08 Apr 2026, Alexandre Courbot wrote:
>>> After a quick look I'd say that having a trait here would actually be
>>> *good* for correctness and maintainability.
>>>
>>> The current design implies that every operation on a page table (most
>>> likely using the walker) goes through a branching point. Just looking at
>>> `PtWalk::read_pte_at_level`, there are already at least 6
>>> `if version == 2 { } else { }` branches that all resolve to the same
>>> result. Include walking down the PDEs and you have at least a dozen of
>>> these just to resolve a virtual address. I know CPUs are fast, but this
>>> is still wasted cycles for no good reason.
>>
>> I did some measurements and there is no notieceable difference in both
>> approaches. I ran perf and loaded nova with self-tests running. The extra
>> potential branching is lost in the noise. In both cases, loading nova and
>> running the self-tests has ~119.7M branch instructions on my Ampere. The total
>> instruction count is also identical (~615M).
>
> That's expected - as I said, CPUs are fast - and that's also not my
> point. My issue is that we are doing countless tests that all resolve to
> the code path, a code path that is already known during probe time.
> That's a huge code smell.
>
> When we create the GPU, we know whether we will be using v2 or v3 page
> tables. That we need to test that again 12 times per address resolution
> is a design issue, irrespective of performance. There are 24 version
> match sites in patch 12 alone.
>
> And that's precisely a good justification for using monomorphization. v2
> and v3 are technically two different page table implementations (they
> even have their own distinct module in your series), we just use
> generics to factorize the (source) code a bit.
To repeat my point though, the extra tests don't complicate the code or change
performance. The bottleneck is here is MMIO, not CPU cycles (and arguably
testing shows no change in CPU cycles either). So I tend to prefer cleaner code
(which in my view is the few matches in pagetable.rs versus templatizing lots of
code). I do see your point of view though and I Ok with (trying out) the
alternative as well, and it seems to be what mostly everyone wants. It is a
little more boiler plate in higher layers which are not exposed to version
matching, but not the end of the world :).
>> I measured like this:
>> perf stat -e
>> branches,branch-misses,cache-references,cache-misses,instructions,cycles --
>> modprobe nova_core
>>
>> So I think the branching argument is not a strong one. I also did more
>> measurements and the dominant time taken is MMIO. During the map prep and
>> execute, page table walks are done. A TLB flush alone costs ~1.4 microseconds.
>> And PRAMIN BAR0 writes to write the PTE is also about 1 microsecond. Considering
>> this, I don't think the extra branching argument holds (even without branch
>> prediction and speculation).
>>
>> Also some branches cannot be eliminated even with parameterization:
>>
>> if level == self.mmu_version.dual_pde_level() {
>> // 128-bit dual PDE read
>> } else {
>> // Regular 64-bit PDE read
>> }
>>
>> This isn't really a version branch -- it's a structural branch that
>> distinguishes between 64-bit PDE and 128-bit dual PDE entries. Any MMU
>> version with a dual PDE level would need this same distinction.
>
> The dual PDE level should be an associated constant - you still need to
> do the test, but note that you would also do it if there was only a
> single page table version. It's orthogonal to whether we use a trait or
> not here.
Sure, fair enough.
>
>>
>> I also did code-generation size analysis (see diff of code used below):
>>
>> Code generation analysis:
>>
>> Module .ko size: Before: 511,792 bytes After: 524,464 bytes (+2.5%)
>> .text section: Before: 112,620 bytes After: 116,628 bytes (+4,008 bytes)
>>
>> The +4K .text growth is the monomorphization cost: every generic function
>> is compiled twice (once for MmuV2, once for MmuV3).
>
> I would say this is working as intended then.
2.5% is not much for 1 feature but overtime IMO it can add up and become
significant, I think we should keep an eye on size.
>>> If you use a trait here, and make `PtWalk` generic against it, you can
>>> optimize this away. We had a similar situation when we introduced Turing
>>> support and the v2 ucode header, and tried both approaches: the
>>> trait-based one was slightly shorter, and arguably more readable.
>>
>> Actually I was the one who suggested traits for Falcon ucode descriptor if you
>> see this thread [1]. So basically you and Eliot are telling me to do what I
>> suggested in [1]. :-) However, I disagree that it is the right choice for this code.
>>
>> [1] https://lore.kernel.org/all/20251117231028.GA1095236@joelbox2/
>>
>> I think the two cases are quite different in complexity:
>
> Exactly. The complexity is different (this one involves multiple traits
> and associated types) but the pattern is the same - and that's a pattern
> traits are designed to address. If we were supposed to stop applying it
> when things go beyond a certain level of complexity, the conceptors of
> Rust would not have bothered addings things like associated types.
>
> These traits are nothing new, they simply formalize a reality that
> already exists in your code, which is that each version of the page
> table needs to implement a given set of methods. It's already there with
> the version doing dispatches, only it is not articulated clearly to the
> reader. So in that respect, having traits make the code *more* readable
> imho.
Makes the lower layers more readable and the higher layers more complex. Me, I'd
have preferred to keep the version matching stuff to the lower layers /
centralized. Maybe there is still a way to do that, while still using traits.
>>
>> The falcon ucode descriptor is essentially a set of flat field accessors
>> and a few params (imem_sec_load_params, dmem_load_params).
>> The trait has ~10 simple getter methods. There's no multi-level hierarchy,
>> no walker, and no generic propagation.
>>
>> The MMU page table case is structurally different. Making PtWalk generic
>> over an Mmu trait would require:
>>
>> - PtWalk<M: Mmu> (the walker)
>> - Plus all the associated types: M::Pte, M::Pde, M::DualPde each
>> needing their own trait bounds
>>
>> And we would also need:
>> - Vmm<M: Mmu> (which creates PtWalk)
>> - BarUser<M: Mmu> (which creates Vmm)
>>
>> I am also against making Vmm an enum as Eliot suggested:
>> enum Vmm {
>> V2(VmmInner<MmuV2>),
>> V3(VmmInner<MmuV3>),
>> }
>>
>> That moves the version complexity up to the reader. Code complexity IMO should
>> decrease as we go up abstractions, making it easier for users (Vmm/Bar).
>>
>> If you look at the the changes in vmm.rs to handle version dispatch there [2]:
>> Added: +109
>> Removed: -28
>>
>> [2]
>> https://github.com/Edgeworth/linux/commit/3627af550b61256184d589e7ec666c1108971f0e
>>
>> The main benefit of my approach is version-specific dispatch complexity is
>> completely isolated inside MmuVersion thus making the code outside of
>> pagetable.rs much more readable, without having to parametrize anything, and
>> without code size increase. I think that is worth considering.
>>
>>> But the main argument to use a trait here IMO is that it enables
>>> associated types and constants. That's particularly critical since some
>>> equivalent fields have different lengths between v2 and v3. An
>>> associated `Bounded` type for these would force the caller to validate
>>> the length of these fields before calling a non-fallible operation,
>>> which is exactly the level of caution that we want when dealing with
>>> page tables.
>>
>> I think Bounded validation is orthogonal to the dispatch model.
>> We can add Bounded to the current design without restructuring
>> into traits. For example:
>>
>> // In ver2::Pte
>> pub fn new_vram(pfn: Bounded<Pfn, 25>, writable: bool) -> Self { ... }
>>
>> // In ver3::Pte
>> pub fn new_vram(pfn: Bounded<Pfn, 40>, writable: bool) -> Self { ... }
>>
>> The unified Pte enum wrapper already dispatches to the correct
>> version-specific constructor, which would enforce the correct Bounded
>> constraint for that version.
>
> But then what type does the `new_vram` dispatch method take? Generic
> code lets us expose the expected `Bounded` type to the caller, which can
> do the proper validation. This is a small example, but I expect this
> pattern to come up in other parts of the code as well.
Maybe I didn't follow your point, but the caller would do a version match and
pass the correct bounded number of bits, but granted that's uglier if/once we
switched to bounded types here.
>>
>>> In order to fully benefit from it, we will need the bitfield macro from
>>> the `kernel` crate so the PDE/PTE fields can be `Bounded`, I will try to
>>> make it available quickly in a patch that you can depend on.
>>
>> That would be great, and I'd be happy to integrate Bounded validation once
>> the macro is available. I just don't think we need to restructure the
>> dispatch model in order to benefit from it.
>
> I'll finish the series and hopefully send it a bit later today. That's
> another significant rework for the series (sorry about that) but it
> should be worth the effort for the added correctness.
Sure!
thanks,
--
Joel Fernandes
^ permalink raw reply
* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: Joel Fernandes @ 2026-04-13 20:10 UTC (permalink / raw)
To: Danilo Krummrich
Cc: John Hubbard, Eliot Courtney, linux-kernel, Miguel Ojeda,
Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Dave Airlie,
Daniel Almeida, Koen Koning, dri-devel, rust-for-linux,
Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
Vivi Rodrigo, Tvrtko Ursulin, Rui Huang, Matthew Auld,
Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
Alex Gaynor, Boqun Feng, Alistair Popple, Timur Tabi, Edwin Peer,
Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
Balbir Singh, Philipp Stanner, Elle Rhumsaa, Alexey Ivanov,
linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <DHOKLTRRIX2Z.1YA9X0D0X21K@kernel.org>
Hi Danilo,
On 4/9/2026 7:00 AM, Danilo Krummrich wrote:
> On Thu Apr 9, 2026 at 12:33 PM CEST, Joel Fernandes wrote:
>> Since it is 3 against 1 here, I rest my case :-).
>
> That's not how I'd view it. :)
>
> Anyways, in case I'm included in "3", that's not my position. My point was to
> ensure we keep discussing advantages and disadvantages on their merits, as I
> think you both have good points.
Heh, yes I actually *did not* include you in the 3 since you sounded to be open
to both. ;-)
>
>> I am still in disagreement since I do not see much benefit (that is why I said
>> pointless above).
>
> That is fair -- in this case please explain why the advantages pointed out by
> others are not worth it, propose something that picks up the best of both
> worlds, etc.
>
> You can also turn it around and ask people whether they can tweak their counter
> proposal to get rid of specific parts you dislike for a reason.
>
> IOW, keep the ball rolling, so we can come up with the best possible solution.
Good advice, thanks! I will try to come up with something that is acceptable to
everyone and we can further debate pros/cons on v11.
There are some merits on the alternative proposal from Eliot/Alex that I'd like
to explore while seeing if I can keep some of the merits in mine as well.
thanks,
--
Joel Fernandes
^ permalink raw reply
* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: Joel Fernandes @ 2026-04-13 20:25 UTC (permalink / raw)
To: Gary Guo, Alexandre Courbot, Eliot Courtney, Danilo Krummrich
Cc: linux-kernel, Miguel Ojeda, Boqun Feng, Bjorn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
Rodrigo Vivi, Tvrtko Ursulin, Huang Rui, Matthew Auld,
Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
Alex Gaynor, Boqun Feng, John Hubbard, Alistair Popple,
Timur Tabi, Edwin Peer, Andrea Righi, Andy Ritger, Zhi Wang,
Balbir Singh, Philipp Stanner, Elle Rhumsaa, alexeyi, joel,
linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <DHOKN9XVOTIB.1A4JY7CDJFWPS@garyguo.net>
On 4/9/2026 7:02 AM, Gary Guo wrote:
> On Wed Apr 8, 2026 at 9:19 PM BST, Joel Fernandes wrote:
>> Hi Alex, Eliot, Danilo,
>>
>> Thanks for taking a look. Let me respond to the specific points below.
>>
>> On Wed, 08 Apr 2026, Alexandre Courbot wrote:
>>> After a quick look I'd say that having a trait here would actually be
>>> *good* for correctness and maintainability.
>>>
>>> The current design implies that every operation on a page table (most
>>> likely using the walker) goes through a branching point. Just looking at
>>> `PtWalk::read_pte_at_level`, there are already at least 6
>>> `if version == 2 { } else { }` branches that all resolve to the same
>>> result. Include walking down the PDEs and you have at least a dozen of
>>> these just to resolve a virtual address. I know CPUs are fast, but this
>>> is still wasted cycles for no good reason.
>>
>> I did some measurements and there is no notieceable difference in both
>> approaches. I ran perf and loaded nova with self-tests running. The extra
>> potential branching is lost in the noise. In both cases, loading nova and
>> running the self-tests has ~119.7M branch instructions on my Ampere. The total
>> instruction count is also identical (~615M).
>>
>> I measured like this:
>> perf stat -e
>> branches,branch-misses,cache-references,cache-misses,instructions,cycles --
>> modprobe nova_core
>>
>> So I think the branching argument is not a strong one. I also did more
>> measurements and the dominant time taken is MMIO. During the map prep and
>> execute, page table walks are done. A TLB flush alone costs ~1.4 microseconds.
>> And PRAMIN BAR0 writes to write the PTE is also about 1 microsecond. Considering
>> this, I don't think the extra branching argument holds (even without branch
>> prediction and speculation).
>>
>> Also some branches cannot be eliminated even with parameterization:
>>
>> if level == self.mmu_version.dual_pde_level() {
>> // 128-bit dual PDE read
>> } else {
>> // Regular 64-bit PDE read
>> }
>>
>> This isn't really a version branch -- it's a structural branch that
>> distinguishes between 64-bit PDE and 128-bit dual PDE entries. Any MMU
>> version with a dual PDE level would need this same distinction.
>>
>> I also did code-generation size analysis (see diff of code used below):
>>
>> Code generation analysis:
>>
>> Module .ko size: Before: 511,792 bytes After: 524,464 bytes (+2.5%)
>> .text section: Before: 112,620 bytes After: 116,628 bytes (+4,008 bytes)
>>
>> The +4K .text growth is the monomorphization cost: every generic function
>> is compiled twice (once for MmuV2, once for MmuV3).
>>
>>> If you use a trait here, and make `PtWalk` generic against it, you can
>>> optimize this away. We had a similar situation when we introduced Turing
>>> support and the v2 ucode header, and tried both approaches: the
>>> trait-based one was slightly shorter, and arguably more readable.
>>
>> Actually I was the one who suggested traits for Falcon ucode descriptor if you
>> see this thread [1]. So basically you and Eliot are telling me to do what I
>> suggested in [1]. :-) However, I disagree that it is the right choice for this code.
>>
>> [1] https://lore.kernel.org/all/20251117231028.GA1095236@joelbox2/
>>
>> I think the two cases are quite different in complexity:
>>
>> The falcon ucode descriptor is essentially a set of flat field accessors
>> and a few params (imem_sec_load_params, dmem_load_params).
>> The trait has ~10 simple getter methods. There's no multi-level hierarchy,
>> no walker, and no generic propagation.
>>
>> The MMU page table case is structurally different. Making PtWalk generic
>> over an Mmu trait would require:
>>
>> - PtWalk<M: Mmu> (the walker)
>> - Plus all the associated types: M::Pte, M::Pde, M::DualPde each
>> needing their own trait bounds
>>
>> And we would also need:
>> - Vmm<M: Mmu> (which creates PtWalk)
>> - BarUser<M: Mmu> (which creates Vmm)
>>
>> I am also against making Vmm an enum as Eliot suggested:
>> enum Vmm {
>> V2(VmmInner<MmuV2>),
>> V3(VmmInner<MmuV3>),
>> }
>>
>> That moves the version complexity up to the reader. Code complexity IMO should
>> decrease as we go up abstractions, making it easier for users (Vmm/Bar).
>>
>> If you look at the the changes in vmm.rs to handle version dispatch there [2]:
>> Added: +109
>> Removed: -28
>>
>> [2]
>> https://github.com/Edgeworth/linux/commit/3627af550b61256184d589e7ec666c1108971f0e
>>
>> The main benefit of my approach is version-specific dispatch complexity is
>> completely isolated inside MmuVersion thus making the code outside of
>> pagetable.rs much more readable, without having to parametrize anything, and
>> without code size increase. I think that is worth considering.
>>
>>> But the main argument to use a trait here IMO is that it enables
>>> associated types and constants. That's particularly critical since some
>>> equivalent fields have different lengths between v2 and v3. An
>>> associated `Bounded` type for these would force the caller to validate
>>> the length of these fields before calling a non-fallible operation,
>>> which is exactly the level of caution that we want when dealing with
>>> page tables.
>>
>> I think Bounded validation is orthogonal to the dispatch model.
>> We can add Bounded to the current design without restructuring
>> into traits. For example:
>>
>> // In ver2::Pte
>> pub fn new_vram(pfn: Bounded<Pfn, 25>, writable: bool) -> Self { ... }
>>
>> // In ver3::Pte
>> pub fn new_vram(pfn: Bounded<Pfn, 40>, writable: bool) -> Self { ... }
>>
>> The unified Pte enum wrapper already dispatches to the correct
>> version-specific constructor, which would enforce the correct Bounded
>> constraint for that version.
>>
>>> In order to fully benefit from it, we will need the bitfield macro from
>>> the `kernel` crate so the PDE/PTE fields can be `Bounded`, I will try to
>>> make it available quickly in a patch that you can depend on.
>>
>> That would be great, and I'd be happy to integrate Bounded validation once
>> the macro is available. I just don't think we need to restructure the
>> dispatch model in order to benefit from it.
>>
>>> But long story short, and although I need to dive deeper into the code,
>>> this looks like a good candidate for using a trait and associated types.
>>
>> The walker code (walk.rs) is already version-agnostic and reads cleanly.
>> The version dispatch is encapsulated behind method calls, not exposed as
>> inline if/else blocks.
>>
>> Generic propagation (or version-specific dispatch at higher levels) adds more
>> complexity at higher layers.
>>
>> Enclosed below [3] is the diff I used for my testing with the data, I don't
>> really see a net readability win there (IMO, it is a net-loss in readability).
>>
>> [3]
>> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=trait-pt-dispatch&id=5eb0e98af11ba608ff4d0f7a06065ee863f5066a
>
> IMO this diff is quite has got me quite in favour of trait approach.
>
> I wanted about to purpose something similar (or maybe I had already?) trait
> approach some versions ago but didn't due to the eventual need of `match` like
> dispatch (like you had with `vmm_dispatch`), but your code made that looks not
> as bad as I thought it would be.
>
That's the drawback right, now vmm_dispatch has to deal with version difference
where as before, the lower layers would. Maybe we can keep the vmm layer the way
it is now, but do the dispatch itself at the lower layers, while still using
traits like in the diff. I'll try that as well. :)
thanks,
--
Joel Fernandes
^ permalink raw reply
* Re: [PATCH V10 0/8] dax: prepare for famfs
From: Ira Weiny @ 2026-04-13 20:51 UTC (permalink / raw)
To: Alison Schofield, John Groves
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
John Groves, Jonathan Corbet, Shuah Khan, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
In-Reply-To: <acrpbBt5UsWEiEbm@aschofie-mobl2.lan>
Alison Schofield wrote:
> On Fri, Mar 27, 2026 at 09:03:26PM +0000, John Groves wrote:
> > From: John Groves <john@groves.net>
> >
> > This patch series along with the bundled patches to fuse are available
> > as a git tag at [0].
> >
> > Dropped the "bundle" thread. If this submission goes smoothly, I'll update
> > the fuse patches to v10 (very little change there as yet).
> >
> > Changes v9 -> v10
> > - Minor modernizations per comments from (mostly) Jonathan
> > - Minor Kconfig simplification
> > - bus.c:dax_match_type(): don't make fsdev_dax eligible for automatic binding
> > where devdax would otherwise bind
> > - dax-private.h: add missing kerneldoc comment for field cached_size in
> > struct dev_dax_range (thanks Dave)
> > - fsdev_write_dax(): s/pmem_addr/addr/ (thanks Dave)
> > - include/linux/dax.h: remove a spuriously-added declaration of inode_dax()
> > (thanks Jonathan)
> >
> > Description:
> >
> > This patch series introduces the required dax support for famfs.
> > Previous versions of the famfs series included both dax and fuse patches.
> > This series separates them into separate patch series' (and the fuse
> > series dependends on this dax series).
> >
> > The famfs user space code can be found at [1]
> >
> > Dax Overview:
> >
> > This series introduces a new "famfs mode" of devdax, whose driver is
> > drivers/dax/fsdev.c. This driver supports dax_iomap_rw() and
> > dax_iomap_fault() calls against a character dax instance. A dax device
> > now can be converted among three modes: 'system-ram', 'devdax' and
> > 'famfs' via daxctl or sysfs (e.g. unbind devdax and bind famfs instead).
> >
> > In famfs mode, a dax device initializes its pages consistent with the
> > fsdaxmode of pmem. Raw read/write/mmap are not supported in this mode,
> > but famfs is happy in this mode - using dax_iomap_rw() for read/write and
> > dax_iomap_fault() for mmap faults.
> >
>
> Here's what I found:
>
> famfs-v10 on 7.0-rc5 + ndctl v84:
> dax suite all pass 13/13, so no regression appears
>
> famfs-v10 on 7.0-rc5 +
> (ndctl v84 w https://github.com/jagalactic/ndctl/tree/famfs
> top 3 patches + edit daxctl-famfs.sh to use cxl-test:
>
> existing dax suite keeps passing
> daxctl-famfs.sh oops w the new test at # Restore original mode"
> seems easy to repoduce, maybe cannot go back to system-ram???
John have you been able to reproduce this?
Ira
>
> Let me know if you need more info.
>
> -- Alison
>
>
^ permalink raw reply
* Re: [PATCH 3/6] hugetlb: make hugetlb_fault_mutex_hash() take PAGE_SIZE index
From: jane.chu @ 2026-04-13 21:32 UTC (permalink / raw)
To: Oscar Salvador
Cc: akpm, david, muchun.song, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, corbet, skhan, hughd, baolin.wang, peterx,
linux-mm, linux-doc, linux-kernel
In-Reply-To: <ad0rUB4FuNUOJ1pN@localhost.localdomain>
On 4/13/2026 10:43 AM, Oscar Salvador wrote:
> On Thu, Apr 09, 2026 at 05:41:54PM -0600, Jane Chu wrote:
>> hugetlb_fault_mutex_hash() is used to serialize faults and page cache
>> operations on the same hugetlb file offset. The helper currently expects
>> its index argument in hugetlb page granularity, so callers have to
>> open-code conversions from the PAGE_SIZE-based indices commonly used
>> in the rest of MM helpers.
>>
>> Change hugetlb_fault_mutex_hash() to take a PAGE_SIZE-based index
>> instead, and perform the hugetlb-granularity conversion inside the helper.
>> Update all callers accordingly.
>>
>> This makes the helper interface consistent with filemap_get_folio(),
>> and linear_page_index(), while preserving the same lock selection for
>> a given hugetlb file offset.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
>> fs/hugetlbfs/inode.c | 19 ++++++++++---------
>> mm/hugetlb.c | 28 +++++++++++++++++++---------
>> mm/memfd.c | 11 ++++++-----
>> mm/userfaultfd.c | 7 +++----
>> 4 files changed, 38 insertions(+), 27 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index cf79fb830377..e24e9bf54e14 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -575,7 +575,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
>> struct address_space *mapping = &inode->i_data;
>> const pgoff_t end = lend >> PAGE_SHIFT;
>> struct folio_batch fbatch;
>> - pgoff_t next, index;
>> + pgoff_t next, idx;
>> int i, freed = 0;
>> bool truncate_op = (lend == LLONG_MAX);
>>
>> @@ -586,15 +586,15 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
>> struct folio *folio = fbatch.folios[i];
>> u32 hash = 0;
>>
>> - index = folio->index >> huge_page_order(h);
>> - hash = hugetlb_fault_mutex_hash(mapping, index);
>> + hash = hugetlb_fault_mutex_hash(mapping, folio->index);
>> mutex_lock(&hugetlb_fault_mutex_table[hash]);
>>
>> /*
>> * Remove folio that was part of folio_batch.
>> */
>> + idx = folio->index >> huge_page_order(h);
>> remove_inode_single_folio(h, inode, mapping, folio,
>> - index, truncate_op);
>> + idx, truncate_op);
>
> Since this is the only place we call remove_inode_single_folio(), and that we do not
> the index (at least index >> huge_page_order()) directly in this function, would it not be
> better to make remove_inode_single_folio do the conversion itself?
In PATCH 6/6, remove_inode_hugepages() is changed to call
remove_inode_single_folio() passing "folio->index" directly,
thus eliminating the above conversion altogether.
I apologize for dividing up the patches this way, function by function,
for my convenience, introduced some temporary changes. The overall
resulted code hopefully is clearer.
>
> Also, I am thinking out loud here but we do have a few places where we
> go: idx = index >> huge_page_order() to convert it into hugepage units, but the casual
> reader might be a bit puzzled about that.
> So, would it be worth to have implement an inline helper with an accurate name
> to do that? It might help whoever reads that?
>
Indeed, will add below inline helpers -
pgoff_t huge_to_base(pgoff_t idx);
pgoff_t base_to_huge(pgoff_t index);
thanks!
-jane
^ permalink raw reply
* Re: [PATCH V10 0/8] dax: prepare for famfs
From: Ira Weiny @ 2026-04-13 21:37 UTC (permalink / raw)
To: Ira Weiny, Alison Schofield, John Groves
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
John Groves, Jonathan Corbet, Shuah Khan, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
In-Reply-To: <69dd576924b0f_24f910029@iweiny-mobl.notmuch>
Ira Weiny wrote:
> Alison Schofield wrote:
> > On Fri, Mar 27, 2026 at 09:03:26PM +0000, John Groves wrote:
> > > From: John Groves <john@groves.net>
> > >
[snip]
> > >
> > > Description:
> > >
> > > This patch series introduces the required dax support for famfs.
> > > Previous versions of the famfs series included both dax and fuse patches.
> > > This series separates them into separate patch series' (and the fuse
> > > series dependends on this dax series).
> > >
> > > The famfs user space code can be found at [1]
> > >
> > > Dax Overview:
> > >
> > > This series introduces a new "famfs mode" of devdax, whose driver is
> > > drivers/dax/fsdev.c. This driver supports dax_iomap_rw() and
> > > dax_iomap_fault() calls against a character dax instance. A dax device
> > > now can be converted among three modes: 'system-ram', 'devdax' and
> > > 'famfs' via daxctl or sysfs (e.g. unbind devdax and bind famfs instead).
> > >
> > > In famfs mode, a dax device initializes its pages consistent with the
> > > fsdaxmode of pmem. Raw read/write/mmap are not supported in this mode,
> > > but famfs is happy in this mode - using dax_iomap_rw() for read/write and
> > > dax_iomap_fault() for mmap faults.
> > >
> >
> > Here's what I found:
> >
> > famfs-v10 on 7.0-rc5 + ndctl v84:
> > dax suite all pass 13/13, so no regression appears
> >
> > famfs-v10 on 7.0-rc5 +
> > (ndctl v84 w https://github.com/jagalactic/ndctl/tree/famfs
> > top 3 patches + edit daxctl-famfs.sh to use cxl-test:
> >
> > existing dax suite keeps passing
> > daxctl-famfs.sh oops w the new test at # Restore original mode"
> > seems easy to repoduce, maybe cannot go back to system-ram???
>
> John have you been able to reproduce this?
>
> Ira
^ permalink raw reply
* Re: [PATCH V10 0/8] dax: prepare for famfs
From: Ira Weiny @ 2026-04-13 21:40 UTC (permalink / raw)
To: Ira Weiny, Alison Schofield, John Groves
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
John Groves, Jonathan Corbet, Shuah Khan, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
In-Reply-To: <69dd576924b0f_24f910029@iweiny-mobl.notmuch>
Ira Weiny wrote:
> Alison Schofield wrote:
> > On Fri, Mar 27, 2026 at 09:03:26PM +0000, John Groves wrote:
> > > From: John Groves <john@groves.net>
> > >
[snip]
> > >
> > > Description:
> > >
> > > This patch series introduces the required dax support for famfs.
> > > Previous versions of the famfs series included both dax and fuse patches.
> > > This series separates them into separate patch series' (and the fuse
> > > series dependends on this dax series).
> > >
> > > The famfs user space code can be found at [1]
> > >
> > > Dax Overview:
> > >
> > > This series introduces a new "famfs mode" of devdax, whose driver is
> > > drivers/dax/fsdev.c. This driver supports dax_iomap_rw() and
> > > dax_iomap_fault() calls against a character dax instance. A dax device
> > > now can be converted among three modes: 'system-ram', 'devdax' and
> > > 'famfs' via daxctl or sysfs (e.g. unbind devdax and bind famfs instead).
> > >
> > > In famfs mode, a dax device initializes its pages consistent with the
> > > fsdaxmode of pmem. Raw read/write/mmap are not supported in this mode,
> > > but famfs is happy in this mode - using dax_iomap_rw() for read/write and
> > > dax_iomap_fault() for mmap faults.
> > >
> >
> > Here's what I found:
> >
> > famfs-v10 on 7.0-rc5 + ndctl v84:
> > dax suite all pass 13/13, so no regression appears
> >
> > famfs-v10 on 7.0-rc5 +
> > (ndctl v84 w https://github.com/jagalactic/ndctl/tree/famfs
> > top 3 patches + edit daxctl-famfs.sh to use cxl-test:
> >
> > existing dax suite keeps passing
> > daxctl-famfs.sh oops w the new test at # Restore original mode"
> > seems easy to repoduce, maybe cannot go back to system-ram???
>
> John have you been able to reproduce this?
>
> Ira
John I've found a different crash with the daxctl-famfs.sh test. See
below.
I got the ndctl repo with the test from Alison.
I'm not at all clear what is happening at this point...
Ira
<crash>
[ 519.007691] BUG: TASK stack guard page was hit at ffffc90001767fc8 (stack is ffffc90001768000..ffffc9000176c000)
[ 519.007694] Oops: stack guard page: 0000 [#1] SMP NOPTI
[ 519.007697] CPU: 0 UID: 0 PID: 1465 Comm: daxctl Tainted: G O 7.0.0-rc6ira+ #68 PREEMPT(full)
[ 519.007699] Tainted: [O]=OOT_MODULE
[ 519.007700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20250812-19.fc42 08/12/2025
[ 519.007701] RIP: 0010:sprintf+0xc/0x50
[ 519.007709] Code: 24 10 e8 37 f8 ff ff c9 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 48 83 ec 48 48 8d 45 10 <48>
89 54 24 28 48 89 f2 be ff ff ff 7f 48 89 4c 24 30 48 89 e1 48
[ 519.007710] RSP: 0018:ffffc90001767fd0 EFLAGS: 00010282
[ 519.007712] RAX: ffffc90001768028 RBX: ffffc90001768068 RCX: 0000000000001e08
[ 519.007712] RDX: 0000000000000207 RSI: ffffffff82abab1c RDI: ffffc90001768068
[ 519.007713] RBP: ffffc90001768018 R08: 0000000000000000 R09: 0000000000000001
[ 519.007713] R10: ffffc90001768110 R11: 0000000000000002 R12: 0000000000000800
[ 519.007714] R13: ffffc90001768068 R14: 0000000000000000 R15: ffffffff839c71c0
[ 519.007715] FS: 00007fb94b807c80(0000) GS:ffff8880f9e9c000(0000) knlGS:0000000000000000
[ 519.007717] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 519.007717] CR2: ffffc90001767fc8 CR3: 0000000077d2e005 CR4: 0000000000770ef0
[ 519.007720] PKRU: 55555554
[ 519.007721] Call Trace:
[ 519.007722] <TASK>
[ 519.007723] info_print_prefix+0xc0/0xe0
[ 519.007728] record_print_text+0x58/0x2d0
[ 519.007730] printk_get_next_message+0xd8/0x220
[ 519.007733] console_flush_one_record+0x1a5/0x390
[ 519.007735] console_unlock+0x5a/0xe0
[ 519.007737] vprintk_emit+0x2e8/0x340
[ 519.007738] _printk+0x48/0x50
[ 519.007741] ? printk_get_next_message+0x70/0x220
[ 519.007743] __dump_page.cold+0x3c/0x331
[ 519.007746] ? dump_page+0x1b/0x30
[ 519.007748] dump_page+0x1b/0x30
[ 519.007749] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007751] get_pfnblock_migratetype+0xa/0x20
[ 519.007753] __dump_page.cold+0x1c6/0x331
[ 519.007755] ? dump_page+0x1b/0x30
[ 519.007756] dump_page+0x1b/0x30
[ 519.007756] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007757] get_pfnblock_migratetype+0xa/0x20
[ 519.007758] __dump_page.cold+0x1c6/0x331
[ 519.007760] ? dump_page+0x1b/0x30
[ 519.007761] dump_page+0x1b/0x30
[ 519.007761] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007762] get_pfnblock_migratetype+0xa/0x20
[ 519.007763] __dump_page.cold+0x1c6/0x331
[ 519.007765] ? dump_page+0x1b/0x30
[ 519.007765] dump_page+0x1b/0x30
[ 519.007766] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007767] get_pfnblock_migratetype+0xa/0x20
[ 519.007772] __dump_page.cold+0x1c6/0x331
[ 519.007774] ? dump_page+0x1b/0x30
[ 519.007775] dump_page+0x1b/0x30
[ 519.007775] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007776] get_pfnblock_migratetype+0xa/0x20
[ 519.007777] __dump_page.cold+0x1c6/0x331
[ 519.007779] ? dump_page+0x1b/0x30
[ 519.007780] dump_page+0x1b/0x30
[ 519.007780] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007781] get_pfnblock_migratetype+0xa/0x20
[ 519.007782] __dump_page.cold+0x1c6/0x331
[ 519.007784] ? dump_page+0x1b/0x30
[ 519.007785] dump_page+0x1b/0x30
[ 519.007785] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007786] get_pfnblock_migratetype+0xa/0x20
[ 519.007787] __dump_page.cold+0x1c6/0x331
[ 519.007789] ? dump_page+0x1b/0x30
[ 519.007790] dump_page+0x1b/0x30
[ 519.007790] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007791] get_pfnblock_migratetype+0xa/0x20
[ 519.007792] __dump_page.cold+0x1c6/0x331
[ 519.007794] ? dump_page+0x1b/0x30
[ 519.007795] dump_page+0x1b/0x30
[ 519.007795] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007796] get_pfnblock_migratetype+0xa/0x20
[ 519.007797] __dump_page.cold+0x1c6/0x331
[ 519.007799] ? dump_page+0x1b/0x30
[ 519.007800] dump_page+0x1b/0x30
[ 519.007800] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007801] get_pfnblock_migratetype+0xa/0x20
[ 519.007802] __dump_page.cold+0x1c6/0x331
[ 519.007804] ? dump_page+0x1b/0x30
[ 519.007805] dump_page+0x1b/0x30
[ 519.007808] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007809] get_pfnblock_migratetype+0xa/0x20
[ 519.007810] __dump_page.cold+0x1c6/0x331
[ 519.007812] ? dump_page+0x1b/0x30
[ 519.007813] dump_page+0x1b/0x30
[ 519.007813] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007814] get_pfnblock_migratetype+0xa/0x20
[ 519.007815] __dump_page.cold+0x1c6/0x331
[ 519.007817] ? dump_page+0x1b/0x30
[ 519.007818] dump_page+0x1b/0x30
[ 519.007818] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007819] get_pfnblock_migratetype+0xa/0x20
[ 519.007820] __dump_page.cold+0x1c6/0x331
[ 519.007822] ? dump_page+0x1b/0x30
[ 519.007823] dump_page+0x1b/0x30
[ 519.007824] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007824] get_pfnblock_migratetype+0xa/0x20
[ 519.007825] __dump_page.cold+0x1c6/0x331
[ 519.007827] ? dump_page+0x1b/0x30
[ 519.007828] dump_page+0x1b/0x30
[ 519.007829] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007829] get_pfnblock_migratetype+0xa/0x20
[ 519.007830] __dump_page.cold+0x1c6/0x331
[ 519.007833] ? dump_page+0x1b/0x30
[ 519.007833] dump_page+0x1b/0x30
[ 519.007834] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007834] get_pfnblock_migratetype+0xa/0x20
[ 519.007835] __dump_page.cold+0x1c6/0x331
[ 519.007838] ? dump_page+0x1b/0x30
[ 519.007838] dump_page+0x1b/0x30
[ 519.007839] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007840] get_pfnblock_migratetype+0xa/0x20
[ 519.007841] __dump_page.cold+0x1c6/0x331
[ 519.007843] ? dump_page+0x1b/0x30
[ 519.007843] dump_page+0x1b/0x30
[ 519.007844] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007845] get_pfnblock_migratetype+0xa/0x20
[ 519.007846] __dump_page.cold+0x1c6/0x331
[ 519.007848] ? dump_page+0x1b/0x30
[ 519.007849] dump_page+0x1b/0x30
[ 519.007849] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007850] get_pfnblock_migratetype+0xa/0x20
[ 519.007851] __dump_page.cold+0x1c6/0x331
[ 519.007853] ? dump_page+0x1b/0x30
[ 519.007854] dump_page+0x1b/0x30
[ 519.007854] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007855] get_pfnblock_migratetype+0xa/0x20
[ 519.007856] __dump_page.cold+0x1c6/0x331
[ 519.007858] ? dump_page+0x1b/0x30
[ 519.007859] dump_page+0x1b/0x30
[ 519.007859] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007860] get_pfnblock_migratetype+0xa/0x20
[ 519.007861] __dump_page.cold+0x1c6/0x331
[ 519.007863] ? dump_page+0x1b/0x30
[ 519.007864] dump_page+0x1b/0x30
[ 519.007864] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007865] get_pfnblock_migratetype+0xa/0x20
[ 519.007866] __dump_page.cold+0x1c6/0x331
[ 519.007868] ? dump_page+0x1b/0x30
[ 519.007869] dump_page+0x1b/0x30
[ 519.007869] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007870] get_pfnblock_migratetype+0xa/0x20
[ 519.007871] __dump_page.cold+0x1c6/0x331
[ 519.007873] ? dump_page+0x1b/0x30
[ 519.007874] dump_page+0x1b/0x30
[ 519.007874] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007875] get_pfnblock_migratetype+0xa/0x20
[ 519.007876] __dump_page.cold+0x1c6/0x331
[ 519.007878] ? dump_page+0x1b/0x30
[ 519.007879] dump_page+0x1b/0x30
[ 519.007880] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007880] get_pfnblock_migratetype+0xa/0x20
[ 519.007881] __dump_page.cold+0x1c6/0x331
[ 519.007883] ? dump_page+0x1b/0x30
[ 519.007884] dump_page+0x1b/0x30
[ 519.007885] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007885] get_pfnblock_migratetype+0xa/0x20
[ 519.007886] __dump_page.cold+0x1c6/0x331
[ 519.007889] ? dump_page+0x1b/0x30
[ 519.007889] dump_page+0x1b/0x30
[ 519.007890] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007890] get_pfnblock_migratetype+0xa/0x20
[ 519.007891] __dump_page.cold+0x1c6/0x331
[ 519.007894] ? dump_page+0x1b/0x30
[ 519.007894] dump_page+0x1b/0x30
[ 519.007895] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007895] get_pfnblock_migratetype+0xa/0x20
[ 519.007896] __dump_page.cold+0x1c6/0x331
[ 519.007899] ? dump_page+0x1b/0x30
[ 519.007899] dump_page+0x1b/0x30
[ 519.007900] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007900] get_pfnblock_migratetype+0xa/0x20
[ 519.007901] __dump_page.cold+0x1c6/0x331
[ 519.007904] ? dump_page+0x1b/0x30
[ 519.007904] dump_page+0x1b/0x30
[ 519.007905] __get_pfnblock_flags_mask+0x6c/0xe0
[ 519.007905] get_pfnblock_migratetype+0xa/0x20
[ 519.007906] __dump_page.cold+0x1c6/0x331
[ 519.007907] ? do_file_open+0xbe/0x150
[ 519.007910] ? stack_depot_save_flags+0x24/0x910
[ 519.007918] ? dump_page+0x1b/0x30
[ 519.007919] dump_page+0x1b/0x30
[ 519.007920] memmap_init_range+0x2f6/0x310
[ 519.007922] move_pfn_range_to_zone+0xee/0x220
[ 519.007924] mhp_init_memmap_on_memory+0x23/0xb0
[ 519.007926] memory_subsys_online+0x122/0x1a0
[ 519.007929] device_online+0x49/0x80
[ 519.007931] state_store+0x8e/0xa0
[ 519.007932] kernfs_fop_write_iter+0x136/0x1f0
[ 519.007935] vfs_write+0x205/0x460
[ 519.007937] ksys_write+0x57/0xd0
[ 519.007938] do_syscall_64+0x106/0x5f0
[ 519.007940] ? irqentry_exit+0x6c/0x520
[ 519.007941] ? exc_page_fault+0x66/0x180
[ 519.007942] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 519.007944] RIP: 0033:0x7fb94ba3473e
[ 519.007946] Code: 4d 89 d8 e8 d4 bc 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9>
c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
[ 519.007946] RSP: 002b:00007fff47c8ddd0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 519.007948] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb94ba3473e
[ 519.007948] RDX: 000000000000000f RSI: 00007fb94bc21a3e RDI: 0000000000000004
[ 519.007949] RBP: 00007fff47c8dde0 R08: 0000000000000000 R09: 0000000000000000
[ 519.007949] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff47c8e3f8
[ 519.007950] R13: 0000000000000006 R14: 00007fb94bc67000 R15: 0000000000413d88
[ 519.007951] </TASK>
[ 519.007951] Modules linked in: cxl_test(O) cxl_acpi(O) device_dax(O) fsdev_dax kmem nd_pmem(O) nd_btt(O) cxl_pmu dax_cxl dax_pmem(O) cxl_pci nd_e820
(O) nfit(O) cxl_mock_mem(O) cxl_pmem(O) cxl_mem(O) cxl_port(O) cxl_mock(O) libnvdimm(O) nfit_test_iomap(O) cxl_core(O) fwctl [last unloaded: cxl_acpi(O
)]
[ 519.007962] ---[ end trace 0000000000000000 ]---
[ 519.007963] RIP: 0010:sprintf+0xc/0x50
[ 519.007964] Code: 24 10 e8 37 f8 ff ff c9 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 48 83 ec 48 48 8d 45 10 <48>
89 54 24 28 48 89 f2 be ff ff ff 7f 48 89 4c 24 30 48 89 e1 48
[ 519.007965] RSP: 0018:ffffc90001767fd0 EFLAGS: 00010282
[ 519.007966] RAX: ffffc90001768028 RBX: ffffc90001768068 RCX: 0000000000001e08
[ 519.007966] RDX: 0000000000000207 RSI: ffffffff82abab1c RDI: ffffc90001768068
[ 519.007967] RBP: ffffc90001768018 R08: 0000000000000000 R09: 0000000000000001
[ 519.007967] R10: ffffc90001768110 R11: 0000000000000002 R12: 0000000000000800
[ 519.007967] R13: ffffc90001768068 R14: 0000000000000000 R15: ffffffff839c71c0
[ 519.007968] FS: 00007fb94b807c80(0000) GS:ffff8880f9e9c000(0000) knlGS:0000000000000000
[ 519.007969] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 519.007969] CR2: ffffc90001767fc8 CR3: 0000000077d2e005 CR4: 0000000000770ef0
[ 519.007971] PKRU: 55555554
[ 519.007972] Kernel panic - not syncing: Fatal exception in interrupt
[ 519.008404] Kernel Offset: disabled
[ 519.083400] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
^ permalink raw reply
* Re: maintainer profiles
From: Dan Williams @ 2026-04-13 21:39 UTC (permalink / raw)
To: Jonathan Corbet, Randy Dunlap, Linux Documentation,
Linux Kernel Mailing List
Cc: Linux Kernel Workflows
In-Reply-To: <87wlyawum7.fsf@trenco.lwn.net>
Jonathan Corbet wrote:
> Randy Dunlap <rdunlap@infradead.org> writes:
>
> > Hi,
> >
> > Is there supposed to be a difference (or distinction) in the contents of
> >
> > Documentation/process/maintainer-handbooks.rst
> > and
> > Documentation/maintainer/maintainer-entry-profile.rst
> > ?
> >
> > Can they be combined into one location?
>
> Late to the party, sorry ... the original idea, I believe, was that
> maintainer-handbooks.rst would be for developers looking for a guidebook
> for a specific subsystem, while maintainer-entry-profile.rst was about
> how maintainers themselves should write their subsystem guide.
> Doubtless things have drifted since then... But the intended audiences
> were different, so it might be good to think about bringing them back
> into focus.
Right, I think something (roughly / hand-wavy) like the below is the
intent. However, as I write that I notice that the combined list is a
bit of a mess. I also notice that there are more "P:" entries in
MAINTAINERS than there are entries in this maintainer-handbooks.rst
list.
So this probably wants to be a script that can build Documentation links
from MAINTAINERS, or otherwise provide a script for developers to query
a kernel tree for additional submission guides. It is probably not as
important for the built docs to link all guides as it is for developers
(or their agents) to live query a tree they are developing against.
Note the problem goes both ways, there are P: entries not in the
combined handbook list, like the Security subsystem, and there are
handbook entries without a P:, like the Tip tree.
diff --git a/Documentation/maintainer/maintainer-entry-profile.rst b/Documentation/maintainer/maintainer-entry-profile.rst
index 6020d188e13d..58e2af333692 100644
--- a/Documentation/maintainer/maintainer-entry-profile.rst
+++ b/Documentation/maintainer/maintainer-entry-profile.rst
@@ -92,24 +92,8 @@ full series, or privately send a reminder email. This section might also
list how review works for this code area and methods to get feedback
that are not directly from the maintainer.
-Existing profiles
------------------
-
-For now, existing maintainer profiles are listed here; we will likely want
-to do something different in the near future.
-
-.. toctree::
- :maxdepth: 1
-
- ../doc-guide/maintainer-profile
- ../nvdimm/maintainer-entry-profile
- ../arch/riscv/patch-acceptance
- ../process/maintainer-soc
- ../process/maintainer-soc-clean-dts
- ../driver-api/media/maintainer-entry-profile
- ../process/maintainer-netdev
- ../driver-api/vfio-pci-device-specific-driver-acceptance
- ../nvme/feature-and-quirk-policy
- ../filesystems/nfs/nfsd-maintainer-entry-profile
- ../filesystems/xfs/xfs-maintainer-entry-profile
- ../mm/damon/maintainer-profile
+Maintainer Handbooks
+--------------------
+
+For examples of other subsystem handbooks see
+Documentation/process/maintainer-handbooks.rst.
diff --git a/Documentation/process/maintainer-handbooks.rst b/Documentation/process/maintainer-handbooks.rst
index 976391cec528..bc9299a04b1f 100644
--- a/Documentation/process/maintainer-handbooks.rst
+++ b/Documentation/process/maintainer-handbooks.rst
@@ -9,14 +9,33 @@ The purpose of this document is to provide subsystem specific information
which is supplementary to the general development process handbook
:ref:`Documentation/process <development_process_main>`.
+For developers, see below for all the known subsystem specific guides.
+If the subsystem you are contributing to does not have a guide listed
+here, it is fair to seek clarification of questions raised in
+Documentation/maintainer/maintainer-entry-profile.rst.
+
+For maintainers, consider documenting additional requirements and
+expectations if submissions routinely overlook specific submission
+criteria. See Documentation/maintainer/maintainer-entry-profile.rst.
+
Contents:
.. toctree::
:numbered:
:maxdepth: 2
+ maintainer-kvm-x86
maintainer-netdev
maintainer-soc
maintainer-soc-clean-dts
+ maintainer-soc-clean-dts
maintainer-tip
- maintainer-kvm-x86
+ ../arch/riscv/patch-acceptance
+ ../doc-guide/maintainer-profile
+ ../driver-api/media/maintainer-entry-profile
+ ../driver-api/vfio-pci-device-specific-driver-acceptance
+ ../filesystems/nfs/nfsd-maintainer-entry-profile
+ ../filesystems/xfs/xfs-maintainer-entry-profile
+ ../mm/damon/maintainer-profile
+ ../nvdimm/maintainer-entry-profile
+ ../nvme/feature-and-quirk-policy
^ permalink raw reply related
* Re: [PATCH V10 0/8] dax: prepare for famfs
From: John Groves @ 2026-04-13 22:22 UTC (permalink / raw)
To: Ira Weiny
Cc: Alison Schofield, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, John Groves, Jonathan Corbet, Shuah Khan,
Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Darrick J . Wong, Randy Dunlap, Jeff Layton, Amir Goldstein,
Jonathan Cameron, Stefan Hajnoczi, Joanne Koong, Josef Bacik,
Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
In-Reply-To: <69dd576924b0f_24f910029@iweiny-mobl.notmuch>
On 26/04/13 03:51PM, Ira Weiny wrote:
> Alison Schofield wrote:
> > On Fri, Mar 27, 2026 at 09:03:26PM +0000, John Groves wrote:
> > > From: John Groves <john@groves.net>
> > >
> > > This patch series along with the bundled patches to fuse are available
> > > as a git tag at [0].
> > >
> > > Dropped the "bundle" thread. If this submission goes smoothly, I'll update
> > > the fuse patches to v10 (very little change there as yet).
> > >
> > > Changes v9 -> v10
> > > - Minor modernizations per comments from (mostly) Jonathan
> > > - Minor Kconfig simplification
> > > - bus.c:dax_match_type(): don't make fsdev_dax eligible for automatic binding
> > > where devdax would otherwise bind
> > > - dax-private.h: add missing kerneldoc comment for field cached_size in
> > > struct dev_dax_range (thanks Dave)
> > > - fsdev_write_dax(): s/pmem_addr/addr/ (thanks Dave)
> > > - include/linux/dax.h: remove a spuriously-added declaration of inode_dax()
> > > (thanks Jonathan)
> > >
> > > Description:
> > >
> > > This patch series introduces the required dax support for famfs.
> > > Previous versions of the famfs series included both dax and fuse patches.
> > > This series separates them into separate patch series' (and the fuse
> > > series dependends on this dax series).
> > >
> > > The famfs user space code can be found at [1]
> > >
> > > Dax Overview:
> > >
> > > This series introduces a new "famfs mode" of devdax, whose driver is
> > > drivers/dax/fsdev.c. This driver supports dax_iomap_rw() and
> > > dax_iomap_fault() calls against a character dax instance. A dax device
> > > now can be converted among three modes: 'system-ram', 'devdax' and
> > > 'famfs' via daxctl or sysfs (e.g. unbind devdax and bind famfs instead).
> > >
> > > In famfs mode, a dax device initializes its pages consistent with the
> > > fsdaxmode of pmem. Raw read/write/mmap are not supported in this mode,
> > > but famfs is happy in this mode - using dax_iomap_rw() for read/write and
> > > dax_iomap_fault() for mmap faults.
> > >
> >
> > Here's what I found:
> >
> > famfs-v10 on 7.0-rc5 + ndctl v84:
> > dax suite all pass 13/13, so no regression appears
> >
> > famfs-v10 on 7.0-rc5 +
> > (ndctl v84 w https://github.com/jagalactic/ndctl/tree/famfs
> > top 3 patches + edit daxctl-famfs.sh to use cxl-test:
> >
> > existing dax suite keeps passing
> > daxctl-famfs.sh oops w the new test at # Restore original mode"
> > seems easy to repoduce, maybe cannot go back to system-ram???
>
> John have you been able to reproduce this?
>
> Ira
>
Not yet, but I'm getting ready to try again.
John
^ permalink raw reply
* Re: [PATCH V10 0/8] dax: prepare for famfs
From: John Groves @ 2026-04-13 22:26 UTC (permalink / raw)
To: Ira Weiny
Cc: Alison Schofield, John Groves, Miklos Szeredi, Dan Williams,
Bernd Schubert, John Groves, Jonathan Corbet, Shuah Khan,
Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Darrick J . Wong, Randy Dunlap, Jeff Layton, Amir Goldstein,
Jonathan Cameron, Stefan Hajnoczi, Joanne Koong, Josef Bacik,
Bagas Sanjaya, Chen Linxuan, James Morse, Fuad Tabba,
Sean Christopherson, Shivank Garg, Ackerley Tng, Gregory Price,
Aravind Ramesh, Ajay Joshi, venkataravis@micron.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
linux-fsdevel@vger.kernel.org
In-Reply-To: <69dd62eba432e_20039100b5@iweiny-mobl.notmuch>
On 26/04/13 04:40PM, Ira Weiny wrote:
> Ira Weiny wrote:
> > Alison Schofield wrote:
> > > On Fri, Mar 27, 2026 at 09:03:26PM +0000, John Groves wrote:
> > > > From: John Groves <john@groves.net>
> > > >
>
> [snip]
>
> > > >
> > > > Description:
> > > >
> > > > This patch series introduces the required dax support for famfs.
> > > > Previous versions of the famfs series included both dax and fuse patches.
> > > > This series separates them into separate patch series' (and the fuse
> > > > series dependends on this dax series).
> > > >
> > > > The famfs user space code can be found at [1]
> > > >
> > > > Dax Overview:
> > > >
> > > > This series introduces a new "famfs mode" of devdax, whose driver is
> > > > drivers/dax/fsdev.c. This driver supports dax_iomap_rw() and
> > > > dax_iomap_fault() calls against a character dax instance. A dax device
> > > > now can be converted among three modes: 'system-ram', 'devdax' and
> > > > 'famfs' via daxctl or sysfs (e.g. unbind devdax and bind famfs instead).
> > > >
> > > > In famfs mode, a dax device initializes its pages consistent with the
> > > > fsdaxmode of pmem. Raw read/write/mmap are not supported in this mode,
> > > > but famfs is happy in this mode - using dax_iomap_rw() for read/write and
> > > > dax_iomap_fault() for mmap faults.
> > > >
> > >
> > > Here's what I found:
> > >
> > > famfs-v10 on 7.0-rc5 + ndctl v84:
> > > dax suite all pass 13/13, so no regression appears
> > >
> > > famfs-v10 on 7.0-rc5 +
> > > (ndctl v84 w https://github.com/jagalactic/ndctl/tree/famfs
> > > top 3 patches + edit daxctl-famfs.sh to use cxl-test:
> > >
> > > existing dax suite keeps passing
> > > daxctl-famfs.sh oops w the new test at # Restore original mode"
> > > seems easy to repoduce, maybe cannot go back to system-ram???
> >
> > John have you been able to reproduce this?
> >
> > Ira
>
> John I've found a different crash with the daxctl-famfs.sh test. See
> below.
>
> I got the ndctl repo with the test from Alison.
>
> I'm not at all clear what is happening at this point...
>
> Ira
>
> <crash>
>
> [ 519.007691] BUG: TASK stack guard page was hit at ffffc90001767fc8 (stack is ffffc90001768000..ffffc9000176c000)
> [ 519.007694] Oops: stack guard page: 0000 [#1] SMP NOPTI
> [ 519.007697] CPU: 0 UID: 0 PID: 1465 Comm: daxctl Tainted: G O 7.0.0-rc6ira+ #68 PREEMPT(full)
> [ 519.007699] Tainted: [O]=OOT_MODULE
> [ 519.007700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20250812-19.fc42 08/12/2025
> [ 519.007701] RIP: 0010:sprintf+0xc/0x50
> [ 519.007709] Code: 24 10 e8 37 f8 ff ff c9 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 48 83 ec 48 48 8d 45 10 <48>
> 89 54 24 28 48 89 f2 be ff ff ff 7f 48 89 4c 24 30 48 89 e1 48
> [ 519.007710] RSP: 0018:ffffc90001767fd0 EFLAGS: 00010282
> [ 519.007712] RAX: ffffc90001768028 RBX: ffffc90001768068 RCX: 0000000000001e08
> [ 519.007712] RDX: 0000000000000207 RSI: ffffffff82abab1c RDI: ffffc90001768068
> [ 519.007713] RBP: ffffc90001768018 R08: 0000000000000000 R09: 0000000000000001
> [ 519.007713] R10: ffffc90001768110 R11: 0000000000000002 R12: 0000000000000800
> [ 519.007714] R13: ffffc90001768068 R14: 0000000000000000 R15: ffffffff839c71c0
> [ 519.007715] FS: 00007fb94b807c80(0000) GS:ffff8880f9e9c000(0000) knlGS:0000000000000000
> [ 519.007717] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 519.007717] CR2: ffffc90001767fc8 CR3: 0000000077d2e005 CR4: 0000000000770ef0
> [ 519.007720] PKRU: 55555554
> [ 519.007721] Call Trace:
> [ 519.007722] <TASK>
> [ 519.007723] info_print_prefix+0xc0/0xe0
> [ 519.007728] record_print_text+0x58/0x2d0
> [ 519.007730] printk_get_next_message+0xd8/0x220
> [ 519.007733] console_flush_one_record+0x1a5/0x390
> [ 519.007735] console_unlock+0x5a/0xe0
> [ 519.007737] vprintk_emit+0x2e8/0x340
> [ 519.007738] _printk+0x48/0x50
> [ 519.007741] ? printk_get_next_message+0x70/0x220
> [ 519.007743] __dump_page.cold+0x3c/0x331
> [ 519.007746] ? dump_page+0x1b/0x30
> [ 519.007748] dump_page+0x1b/0x30
> [ 519.007749] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007751] get_pfnblock_migratetype+0xa/0x20
> [ 519.007753] __dump_page.cold+0x1c6/0x331
> [ 519.007755] ? dump_page+0x1b/0x30
> [ 519.007756] dump_page+0x1b/0x30
> [ 519.007756] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007757] get_pfnblock_migratetype+0xa/0x20
> [ 519.007758] __dump_page.cold+0x1c6/0x331
> [ 519.007760] ? dump_page+0x1b/0x30
> [ 519.007761] dump_page+0x1b/0x30
> [ 519.007761] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007762] get_pfnblock_migratetype+0xa/0x20
> [ 519.007763] __dump_page.cold+0x1c6/0x331
> [ 519.007765] ? dump_page+0x1b/0x30
> [ 519.007765] dump_page+0x1b/0x30
> [ 519.007766] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007767] get_pfnblock_migratetype+0xa/0x20
> [ 519.007772] __dump_page.cold+0x1c6/0x331
> [ 519.007774] ? dump_page+0x1b/0x30
> [ 519.007775] dump_page+0x1b/0x30
> [ 519.007775] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007776] get_pfnblock_migratetype+0xa/0x20
> [ 519.007777] __dump_page.cold+0x1c6/0x331
> [ 519.007779] ? dump_page+0x1b/0x30
> [ 519.007780] dump_page+0x1b/0x30
> [ 519.007780] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007781] get_pfnblock_migratetype+0xa/0x20
> [ 519.007782] __dump_page.cold+0x1c6/0x331
> [ 519.007784] ? dump_page+0x1b/0x30
> [ 519.007785] dump_page+0x1b/0x30
> [ 519.007785] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007786] get_pfnblock_migratetype+0xa/0x20
> [ 519.007787] __dump_page.cold+0x1c6/0x331
> [ 519.007789] ? dump_page+0x1b/0x30
> [ 519.007790] dump_page+0x1b/0x30
> [ 519.007790] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007791] get_pfnblock_migratetype+0xa/0x20
> [ 519.007792] __dump_page.cold+0x1c6/0x331
> [ 519.007794] ? dump_page+0x1b/0x30
> [ 519.007795] dump_page+0x1b/0x30
> [ 519.007795] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007796] get_pfnblock_migratetype+0xa/0x20
> [ 519.007797] __dump_page.cold+0x1c6/0x331
> [ 519.007799] ? dump_page+0x1b/0x30
> [ 519.007800] dump_page+0x1b/0x30
> [ 519.007800] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007801] get_pfnblock_migratetype+0xa/0x20
> [ 519.007802] __dump_page.cold+0x1c6/0x331
> [ 519.007804] ? dump_page+0x1b/0x30
> [ 519.007805] dump_page+0x1b/0x30
> [ 519.007808] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007809] get_pfnblock_migratetype+0xa/0x20
> [ 519.007810] __dump_page.cold+0x1c6/0x331
> [ 519.007812] ? dump_page+0x1b/0x30
> [ 519.007813] dump_page+0x1b/0x30
> [ 519.007813] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007814] get_pfnblock_migratetype+0xa/0x20
> [ 519.007815] __dump_page.cold+0x1c6/0x331
> [ 519.007817] ? dump_page+0x1b/0x30
> [ 519.007818] dump_page+0x1b/0x30
> [ 519.007818] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007819] get_pfnblock_migratetype+0xa/0x20
> [ 519.007820] __dump_page.cold+0x1c6/0x331
> [ 519.007822] ? dump_page+0x1b/0x30
> [ 519.007823] dump_page+0x1b/0x30
> [ 519.007824] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007824] get_pfnblock_migratetype+0xa/0x20
> [ 519.007825] __dump_page.cold+0x1c6/0x331
> [ 519.007827] ? dump_page+0x1b/0x30
> [ 519.007828] dump_page+0x1b/0x30
> [ 519.007829] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007829] get_pfnblock_migratetype+0xa/0x20
> [ 519.007830] __dump_page.cold+0x1c6/0x331
> [ 519.007833] ? dump_page+0x1b/0x30
> [ 519.007833] dump_page+0x1b/0x30
> [ 519.007834] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007834] get_pfnblock_migratetype+0xa/0x20
> [ 519.007835] __dump_page.cold+0x1c6/0x331
> [ 519.007838] ? dump_page+0x1b/0x30
> [ 519.007838] dump_page+0x1b/0x30
> [ 519.007839] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007840] get_pfnblock_migratetype+0xa/0x20
> [ 519.007841] __dump_page.cold+0x1c6/0x331
> [ 519.007843] ? dump_page+0x1b/0x30
> [ 519.007843] dump_page+0x1b/0x30
> [ 519.007844] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007845] get_pfnblock_migratetype+0xa/0x20
> [ 519.007846] __dump_page.cold+0x1c6/0x331
> [ 519.007848] ? dump_page+0x1b/0x30
> [ 519.007849] dump_page+0x1b/0x30
> [ 519.007849] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007850] get_pfnblock_migratetype+0xa/0x20
> [ 519.007851] __dump_page.cold+0x1c6/0x331
> [ 519.007853] ? dump_page+0x1b/0x30
> [ 519.007854] dump_page+0x1b/0x30
> [ 519.007854] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007855] get_pfnblock_migratetype+0xa/0x20
> [ 519.007856] __dump_page.cold+0x1c6/0x331
> [ 519.007858] ? dump_page+0x1b/0x30
> [ 519.007859] dump_page+0x1b/0x30
> [ 519.007859] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007860] get_pfnblock_migratetype+0xa/0x20
> [ 519.007861] __dump_page.cold+0x1c6/0x331
> [ 519.007863] ? dump_page+0x1b/0x30
> [ 519.007864] dump_page+0x1b/0x30
> [ 519.007864] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007865] get_pfnblock_migratetype+0xa/0x20
> [ 519.007866] __dump_page.cold+0x1c6/0x331
> [ 519.007868] ? dump_page+0x1b/0x30
> [ 519.007869] dump_page+0x1b/0x30
> [ 519.007869] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007870] get_pfnblock_migratetype+0xa/0x20
> [ 519.007871] __dump_page.cold+0x1c6/0x331
> [ 519.007873] ? dump_page+0x1b/0x30
> [ 519.007874] dump_page+0x1b/0x30
> [ 519.007874] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007875] get_pfnblock_migratetype+0xa/0x20
> [ 519.007876] __dump_page.cold+0x1c6/0x331
> [ 519.007878] ? dump_page+0x1b/0x30
> [ 519.007879] dump_page+0x1b/0x30
> [ 519.007880] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007880] get_pfnblock_migratetype+0xa/0x20
> [ 519.007881] __dump_page.cold+0x1c6/0x331
> [ 519.007883] ? dump_page+0x1b/0x30
> [ 519.007884] dump_page+0x1b/0x30
> [ 519.007885] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007885] get_pfnblock_migratetype+0xa/0x20
> [ 519.007886] __dump_page.cold+0x1c6/0x331
> [ 519.007889] ? dump_page+0x1b/0x30
> [ 519.007889] dump_page+0x1b/0x30
> [ 519.007890] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007890] get_pfnblock_migratetype+0xa/0x20
> [ 519.007891] __dump_page.cold+0x1c6/0x331
> [ 519.007894] ? dump_page+0x1b/0x30
> [ 519.007894] dump_page+0x1b/0x30
> [ 519.007895] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007895] get_pfnblock_migratetype+0xa/0x20
> [ 519.007896] __dump_page.cold+0x1c6/0x331
> [ 519.007899] ? dump_page+0x1b/0x30
> [ 519.007899] dump_page+0x1b/0x30
> [ 519.007900] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007900] get_pfnblock_migratetype+0xa/0x20
> [ 519.007901] __dump_page.cold+0x1c6/0x331
> [ 519.007904] ? dump_page+0x1b/0x30
> [ 519.007904] dump_page+0x1b/0x30
> [ 519.007905] __get_pfnblock_flags_mask+0x6c/0xe0
> [ 519.007905] get_pfnblock_migratetype+0xa/0x20
> [ 519.007906] __dump_page.cold+0x1c6/0x331
> [ 519.007907] ? do_file_open+0xbe/0x150
> [ 519.007910] ? stack_depot_save_flags+0x24/0x910
> [ 519.007918] ? dump_page+0x1b/0x30
> [ 519.007919] dump_page+0x1b/0x30
> [ 519.007920] memmap_init_range+0x2f6/0x310
> [ 519.007922] move_pfn_range_to_zone+0xee/0x220
> [ 519.007924] mhp_init_memmap_on_memory+0x23/0xb0
> [ 519.007926] memory_subsys_online+0x122/0x1a0
> [ 519.007929] device_online+0x49/0x80
> [ 519.007931] state_store+0x8e/0xa0
> [ 519.007932] kernfs_fop_write_iter+0x136/0x1f0
> [ 519.007935] vfs_write+0x205/0x460
> [ 519.007937] ksys_write+0x57/0xd0
> [ 519.007938] do_syscall_64+0x106/0x5f0
> [ 519.007940] ? irqentry_exit+0x6c/0x520
> [ 519.007941] ? exc_page_fault+0x66/0x180
> [ 519.007942] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 519.007944] RIP: 0033:0x7fb94ba3473e
> [ 519.007946] Code: 4d 89 d8 e8 d4 bc 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9>
> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
> [ 519.007946] RSP: 002b:00007fff47c8ddd0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> [ 519.007948] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb94ba3473e
> [ 519.007948] RDX: 000000000000000f RSI: 00007fb94bc21a3e RDI: 0000000000000004
> [ 519.007949] RBP: 00007fff47c8dde0 R08: 0000000000000000 R09: 0000000000000000
> [ 519.007949] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff47c8e3f8
> [ 519.007950] R13: 0000000000000006 R14: 00007fb94bc67000 R15: 0000000000413d88
> [ 519.007951] </TASK>
> [ 519.007951] Modules linked in: cxl_test(O) cxl_acpi(O) device_dax(O) fsdev_dax kmem nd_pmem(O) nd_btt(O) cxl_pmu dax_cxl dax_pmem(O) cxl_pci nd_e820
> (O) nfit(O) cxl_mock_mem(O) cxl_pmem(O) cxl_mem(O) cxl_port(O) cxl_mock(O) libnvdimm(O) nfit_test_iomap(O) cxl_core(O) fwctl [last unloaded: cxl_acpi(O
> )]
> [ 519.007962] ---[ end trace 0000000000000000 ]---
> [ 519.007963] RIP: 0010:sprintf+0xc/0x50
> [ 519.007964] Code: 24 10 e8 37 f8 ff ff c9 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 48 83 ec 48 48 8d 45 10 <48>
> 89 54 24 28 48 89 f2 be ff ff ff 7f 48 89 4c 24 30 48 89 e1 48
> [ 519.007965] RSP: 0018:ffffc90001767fd0 EFLAGS: 00010282
> [ 519.007966] RAX: ffffc90001768028 RBX: ffffc90001768068 RCX: 0000000000001e08
> [ 519.007966] RDX: 0000000000000207 RSI: ffffffff82abab1c RDI: ffffc90001768068
> [ 519.007967] RBP: ffffc90001768018 R08: 0000000000000000 R09: 0000000000000001
> [ 519.007967] R10: ffffc90001768110 R11: 0000000000000002 R12: 0000000000000800
> [ 519.007967] R13: ffffc90001768068 R14: 0000000000000000 R15: ffffffff839c71c0
> [ 519.007968] FS: 00007fb94b807c80(0000) GS:ffff8880f9e9c000(0000) knlGS:0000000000000000
> [ 519.007969] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 519.007969] CR2: ffffc90001767fc8 CR3: 0000000077d2e005 CR4: 0000000000770ef0
> [ 519.007971] PKRU: 55555554
> [ 519.007972] Kernel panic - not syncing: Fatal exception in interrupt
> [ 519.008404] Kernel Offset: disabled
> [ 519.083400] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Dang. Obviously runaway recursion; I don't recognize anything in
the stack, but will start trying to reproduce it.
John
^ permalink raw reply
* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: Joel Fernandes @ 2026-04-13 22:27 UTC (permalink / raw)
To: Danilo Krummrich
Cc: John Hubbard, Eliot Courtney, linux-kernel, Miguel Ojeda,
Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Dave Airlie,
Daniel Almeida, Koen Koning, dri-devel, rust-for-linux,
Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
Vivi Rodrigo, Tvrtko Ursulin, Rui Huang, Matthew Auld,
Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
Alex Gaynor, Boqun Feng, Alistair Popple, Timur Tabi, Edwin Peer,
Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
Balbir Singh, Philipp Stanner, Elle Rhumsaa, Alexey Ivanov,
linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <acd38f51-3acc-4dbf-9929-50187dccec82@nvidia.com>
On 4/13/2026 4:10 PM, Joel Fernandes wrote:
> Hi Danilo,
>
> On 4/9/2026 7:00 AM, Danilo Krummrich wrote:
>> On Thu Apr 9, 2026 at 12:33 PM CEST, Joel Fernandes wrote:
>>> Since it is 3 against 1 here, I rest my case :-).
>>
>> That's not how I'd view it. :)
>>
>> Anyways, in case I'm included in "3", that's not my position. My point was to
>> ensure we keep discussing advantages and disadvantages on their merits, as I
>> think you both have good points.
>
> Heh, yes I actually *did not* include you in the 3 since you sounded to be open
> to both. ;-)
>
>>
>>> I am still in disagreement since I do not see much benefit (that is why I said
>>> pointless above).
>>
>> That is fair -- in this case please explain why the advantages pointed out by
>> others are not worth it, propose something that picks up the best of both
>> worlds, etc.
>>
>> You can also turn it around and ask people whether they can tweak their counter
>> proposal to get rid of specific parts you dislike for a reason.
>>
>> IOW, keep the ball rolling, so we can come up with the best possible solution.
>
> Good advice, thanks! I will try to come up with something that is acceptable to
> everyone and we can further debate pros/cons on v11.
>
> There are some merits on the alternative proposal from Eliot/Alex that I'd like
> to explore while seeing if I can keep some of the merits in mine as well.
I think I found a nice approach. IMO the MMU version dispatch does not belong in
Vmm/BarUser layers. Those are version-independent code. However I agree that
doing version dispatch at every low-level page table operation is a bit heavy on
matches (if we put the MMIO overhead counter-argument aside).
So how about the following approach?
PtWalk, PtMap and everything below it are monomorphized. Vmm and BarUser are
not. Version dispatch is handled on PtWalk and PtMap entry points.
I think it makes it cleaner and splits the code up better too and the
organizations makes sense because the version differences are related to page
tables, not to generic concepts like Vmm and Bar.
Thoughts? Here is a preview:
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=pt-traits-v2&id=ff22ba64f729f9f73258777231763a7b9804123b
thanks,
--
Joel Fernandes
^ permalink raw reply
* Re: [PATCH v3 09/11] dt-bindings: input: Document hid-over-spi DT schema
From: Rob Herring @ 2026-04-13 22:34 UTC (permalink / raw)
To: Conor Dooley, Dmitry Torokhov, Jingyuan Liang
Cc: Jiri Kosina, Benjamin Tissoires, Jonathan Corbet, Mark Brown,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Krzysztof Kozlowski, Conor Dooley, linux-input, linux-doc,
linux-kernel, linux-spi, linux-trace-kernel, devicetree, hbarnor,
tfiga, Dmitry Antipov, Jarrett Schultz
In-Reply-To: <20260410-sake-dollop-9f253ddb0749@spud>
On Fri, Apr 10, 2026 at 06:35:00PM +0100, Conor Dooley wrote:
> On Thu, Apr 09, 2026 at 10:16:46AM -0700, Dmitry Torokhov wrote:
> > On Thu, Apr 09, 2026 at 05:02:11PM +0100, Conor Dooley wrote:
> > > On Thu, Apr 02, 2026 at 01:59:46AM +0000, Jingyuan Liang wrote:
> > > > Documentation describes the required and optional properties for
> > > > implementing Device Tree for a Microsoft G6 Touch Digitizer that
> > > > supports HID over SPI Protocol 1.0 specification.
> > > >
> > > > The properties are common to HID over SPI.
> > > >
> > > > Signed-off-by: Dmitry Antipov <dmanti@microsoft.com>
> > > > Signed-off-by: Jarrett Schultz <jaschultz@microsoft.com>
> > > > Signed-off-by: Jingyuan Liang <jingyliang@chromium.org>
> > > > ---
> > > > .../devicetree/bindings/input/hid-over-spi.yaml | 126 +++++++++++++++++++++
> > > > 1 file changed, 126 insertions(+)
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/input/hid-over-spi.yaml b/Documentation/devicetree/bindings/input/hid-over-spi.yaml
> > > > new file mode 100644
> > > > index 000000000000..d1b0a2e26c32
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/input/hid-over-spi.yaml
> > > > @@ -0,0 +1,126 @@
> > > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > +%YAML 1.2
> > > > +---
> > > > +$id: http://devicetree.org/schemas/input/hid-over-spi.yaml#
> > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > +
> > > > +title: HID over SPI Devices
> > > > +
> > > > +maintainers:
> > > > + - Benjamin Tissoires <benjamin.tissoires@redhat.com>
> > > > + - Jiri Kosina <jkosina@suse.cz>
> > >
> > > Why them and not you, the developers of the series?
> > >
> > > > +
> > > > +description: |+
> > > > + HID over SPI provides support for various Human Interface Devices over the
> > > > + SPI bus. These devices can be for example touchpads, keyboards, touch screens
> > > > + or sensors.
> > > > +
> > > > + The specification has been written by Microsoft and is currently available
> > > > + here: https://www.microsoft.com/en-us/download/details.aspx?id=103325
> > > > +
> > > > + If this binding is used, the kernel module spi-hid will handle the
> > > > + communication with the device and the generic hid core layer will handle the
> > > > + protocol.
> > >
> > > This is not relevant to the binding, please remove it.
> > >
> > > > +
> > > > +allOf:
> > > > + - $ref: /schemas/input/touchscreen/touchscreen.yaml#
> > > > +
> > > > +properties:
> > > > + compatible:
> > > > + oneOf:
> > > > + - items:
> > > > + - enum:
> > > > + - microsoft,g6-touch-digitizer
> > > > + - const: hid-over-spi
> > > > + - description: Just "hid-over-spi" alone is allowed, but not recommended.
> > > > + const: hid-over-spi
> > >
> > > Why is it allowed but not recommended? Seems to me like we should
> > > require device-specific compatibles.
> >
> > Why would we want to change the driver code to add a new compatible each
> > time a vendor decides to create a chip that is fully hid-spi-protocol
> > compliant? Or is the plan to still allow "hid-over-spi" fallback but
> > require device-specific compatible that will be ignored unless there is
> > device-specific quirk needed?
The plan is the latter case (the 1st entry up above). The comment is
remove the 2nd entry (with 'Just "hid-over-spi" alone is allowed, but
not recommended.').
> This has nothing to do with the driver, just the oddity of having a
> comment saying that not having a device specific compatible was
> permitted by not recommended in a binding. Requiring device-specific
> compatibles is the norm after all and a comment like this makes draws
> more attention to the fact that this is abnormal. Regardless of what the
> driver does, device-specific compatibles should be required.
>
> > > > +
> > > > + reg:
> > > > + maxItems: 1
> > > > +
> > > > + interrupts:
> > > > + maxItems: 1
> > > > +
> > > > + reset-gpios:
> > > > + maxItems: 1
> > > > + description:
> > > > + GPIO specifier for the digitizer's reset pin (active low). The line must
> > > > + be flagged with GPIO_ACTIVE_LOW.
> > > > +
> > > > + vdd-supply:
> > > > + description:
> > > > + Regulator for the VDD supply voltage.
> > > > +
> > > > + input-report-header-address:
> > > > + $ref: /schemas/types.yaml#/definitions/uint32
> > > > + minimum: 0
> > > > + maximum: 0xffffff
> > > > + description:
> > > > + A value to be included in the Read Approval packet, listing an address of
> > > > + the input report header to be put on the SPI bus. This address has 24
> > > > + bits.
> > > > +
> > > > + input-report-body-address:
> > > > + $ref: /schemas/types.yaml#/definitions/uint32
> > > > + minimum: 0
> > > > + maximum: 0xffffff
> > > > + description:
> > > > + A value to be included in the Read Approval packet, listing an address of
> > > > + the input report body to be put on the SPI bus. This address has 24 bits.
> > > > +
> > > > + output-report-address:
> > > > + $ref: /schemas/types.yaml#/definitions/uint32
> > > > + minimum: 0
> > > > + maximum: 0xffffff
> > > > + description:
> > > > + A value to be included in the Output Report sent by the host, listing an
> > > > + address where the output report on the SPI bus is to be written to. This
> > > > + address has 24 bits.
> > > > +
> > > > + read-opcode:
> > > > + $ref: /schemas/types.yaml#/definitions/uint8
> > > > + description:
> > > > + Value to be used in Read Approval packets. 1 byte.
> > > > +
> > > > + write-opcode:
> > > > + $ref: /schemas/types.yaml#/definitions/uint8
> > > > + description:
> > > > + Value to be used in Write Approval packets. 1 byte.
> > >
> > > Why can none of these things be determined from the device's compatible?
> > > On the surface, they like the kinds of things that could/should be.
> >
> > Why would we want to keep tables of these values in the kernel and again
> > have to update the driver for each new chip?
>
> That's pretty normal though innit? It's what match data does.
> If someone wants to have properties that communicate data that
> can be determined from the compatible, they need to provide
> justification why it is being done.
IIRC, it was explained in prior versions the spec itself says these
values vary by device. If we expect variation, then I think these
properties are fine. But please capture the reasoning for them in this
patch or we will just keep asking the same questions over and over.
Rob
^ permalink raw reply
* Re: [PATCH V10 0/8] dax: prepare for famfs
From: Alison Schofield @ 2026-04-13 22:41 UTC (permalink / raw)
To: John Groves
Cc: John Groves, Miklos Szeredi, Dan Williams, Bernd Schubert,
John Groves, Jonathan Corbet, Shuah Khan, Vishal Verma,
Dave Jiang, Matthew Wilcox, Jan Kara, Alexander Viro,
David Hildenbrand, Christian Brauner, Darrick J . Wong,
Randy Dunlap, Jeff Layton, Amir Goldstein, Jonathan Cameron,
Stefan Hajnoczi, Joanne Koong, Josef Bacik, Bagas Sanjaya,
Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org
In-Reply-To: <acrpbBt5UsWEiEbm@aschofie-mobl2.lan>
On Mon, Mar 30, 2026 at 02:21:48PM -0700, Alison Schofield wrote:
> On Fri, Mar 27, 2026 at 09:03:26PM +0000, John Groves wrote:
> > From: John Groves <john@groves.net>
> >
> > This patch series along with the bundled patches to fuse are available
> > as a git tag at [0].
> >
> > Dropped the "bundle" thread. If this submission goes smoothly, I'll update
> > the fuse patches to v10 (very little change there as yet).
> >
> > Changes v9 -> v10
> > - Minor modernizations per comments from (mostly) Jonathan
> > - Minor Kconfig simplification
> > - bus.c:dax_match_type(): don't make fsdev_dax eligible for automatic binding
> > where devdax would otherwise bind
> > - dax-private.h: add missing kerneldoc comment for field cached_size in
> > struct dev_dax_range (thanks Dave)
> > - fsdev_write_dax(): s/pmem_addr/addr/ (thanks Dave)
> > - include/linux/dax.h: remove a spuriously-added declaration of inode_dax()
> > (thanks Jonathan)
> >
> > Description:
> >
> > This patch series introduces the required dax support for famfs.
> > Previous versions of the famfs series included both dax and fuse patches.
> > This series separates them into separate patch series' (and the fuse
> > series dependends on this dax series).
> >
> > The famfs user space code can be found at [1]
> >
> > Dax Overview:
> >
> > This series introduces a new "famfs mode" of devdax, whose driver is
> > drivers/dax/fsdev.c. This driver supports dax_iomap_rw() and
> > dax_iomap_fault() calls against a character dax instance. A dax device
> > now can be converted among three modes: 'system-ram', 'devdax' and
> > 'famfs' via daxctl or sysfs (e.g. unbind devdax and bind famfs instead).
> >
> > In famfs mode, a dax device initializes its pages consistent with the
> > fsdaxmode of pmem. Raw read/write/mmap are not supported in this mode,
> > but famfs is happy in this mode - using dax_iomap_rw() for read/write and
> > dax_iomap_fault() for mmap faults.
> >
>
> Here's what I found:
>
> famfs-v10 on 7.0-rc5 + ndctl v84:
> dax suite all pass 13/13, so no regression appears
>
> famfs-v10 on 7.0-rc5 +
> (ndctl v84 w https://github.com/jagalactic/ndctl/tree/famfs
> top 3 patches + edit daxctl-famfs.sh to use cxl-test:
>
> existing dax suite keeps passing
> daxctl-famfs.sh oops w the new test at # Restore original mode"
> seems easy to repoduce, maybe cannot go back to system-ram???
My stack trace differed from Ira's. I hit:
[ 88.991865] probe of dax0.0 returned 0 after 2371506 usecs
[ 88.996717] page: refcount:0 mapcount:1 mapping:0000000000000000 index:0x0 pfn:0x3ff028000
[ 88.997592] BUG: unable to handle page fault for address: ffffc9000f4c8033
[ 88.998256] #PF: supervisor read access in kernel mode
[ 88.998728] #PF: error_code(0x0000) - not-present page
[ 88.999254] PGD 80a067 P4D 80a067 PUD 193e067 PMD 79baf067 PTE 0
[ 88.999799] Oops: Oops: 0000 [#1] SMP NOPTI
[ 89.000253] CPU: 5 UID: 0 PID: 1476 Comm: daxctl Tainted: G O 7.0.0-rc5+ #182 PREEMPT(full)
[ 89.001092] Tainted: [O]=OOT_MODULE
[ 89.001630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 89.002345] RIP: 0010:is_free_buddy_page+0x39/0x60
[ 89.002816] Code: 00 00 00 48 c1 fe 06 eb 0a 48 83 c1 01 48 83 f9 0b 74 30 44 89 c0 48 89 fa d3 e0 83 e8 01 48 98 48 21 f0 48 c1 e0 06 48 29 c2 <80> 7a 33 f0 75 d9 48 8b 42 28 48 39 c8 72 d0 b8 01 00 00 00 e9 ce
[ 89.004504] RSP: 0018:ffffc9000f4cf828 EFLAGS: 00010286
[ 89.005039] RAX: 0000000000007a80 RBX: ffffc9000f4cf8a0 RCX: 0000000000000009
[ 89.005674] RDX: ffffc9000f4c8000 RSI: ffffff7c003d33ea RDI: ffffc9000f4cfa80
[ 89.006350] RBP: ffffc9000f4cf838 R08: 0000000000000001 R09: 00000000ffefffff
[ 89.007000] R10: ffffc9000f4cfa38 R11: ffff888376ffe000 R12: ffffc9000f4cfa80
[ 89.007673] R13: ffffc9000f4cf9a0 R14: 0000000000000006 R15: 0000000000000001
[ 89.008395] FS: 00007f3fbca2e7c0(0000) GS:ffff8881fa75f000(0000) knlGS:0000000000000000
[ 89.009156] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 89.009715] CR2: ffffc9000f4c8033 CR3: 000000012f638003 CR4: 0000000000370ef0
[ 89.010447] Call Trace:
[ 89.010767] <TASK>
[ 89.011083] ? set_ps_flags.constprop.0+0x3c/0x70
[ 89.011559] snapshot_page+0x2ca/0x330
[ 89.011974] __dump_page+0x2e/0x380
[ 89.012362] ? up+0x5a/0x90
[ 89.012704] dump_page+0x16/0x50
[ 89.013108] ? dump_page+0x16/0x50
[ 89.013489] __get_pfnblock_flags_mask+0x6f/0xd0
[ 89.013958] get_pfnblock_migratetype+0xe/0x30
[ 89.014412] __dump_page+0x15b/0x380
[ 89.014816] dump_page+0x16/0x50
[ 89.015210] ? dump_page+0x16/0x50
[ 89.015587] __set_pfnblock_flags_mask.constprop.0+0x6f/0xf0
[ 89.016195] init_pageblock_migratetype+0x39/0x60
[ 89.016692] memmap_init_range+0x165/0x290
[ 89.017205] move_pfn_range_to_zone+0xed/0x200
[ 89.017688] mhp_init_memmap_on_memory+0x23/0xb0
[ 89.018223] memory_subsys_online+0x127/0x1a0
[ 89.018693] device_online+0x4d/0x90
[ 89.019149] state_store+0x96/0xa0
[ 89.019552] dev_attr_store+0x12/0x30
[ 89.019975] sysfs_kf_write+0x48/0x70
[ 89.020381] kernfs_fop_write_iter+0x160/0x210
[ 89.020876] vfs_write+0x261/0x500
[ 89.021311] ksys_write+0x5c/0xf0
[ 89.021701] __x64_sys_write+0x14/0x20
[ 89.022180] x64_sys_call+0x1cb7/0x2010
[ 89.022640] do_syscall_64+0xb1/0x560
[ 89.023096] entry_SYSCALL_64_after_hwframe+0x71/0x79
[ 89.023615] RIP: 0033:0x7f3fbc901c37
[ 89.024050] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 89.025768] RSP: 002b:00007ffdbdf63c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 89.026517] RAX: ffffffffffffffda RBX: 00007ffdbdf64228 RCX: 00007f3fbc901c37
[ 89.027280] RDX: 000000000000000f RSI: 00007f3fbcb554de RDI: 0000000000000004
[ 89.027934] RBP: 00007ffdbdf63ca0 R08: 0000000000000000 R09: 0000000000000073
[ 89.028610] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 89.029337] R13: 00007ffdbdf64260 R14: 0000000000414da0 R15: 00007f3fbcb9b000
[ 89.030051] </TASK>
[ 89.030364] Modules linked in: cxl_test(O) cxl_acpi(O) cxl_pmem(O) device_dax(O) fsdev_dax kmem dax_pmem(O) nd_pmem(O) dax_cxl nd_btt(O) nd_e820(O) nfit(O) cxl_mock_mem(O) cxl_mem(O) cxl_port(O) cxl_mock(O) libnvdimm(O) nfit_test_iomap(O) cxl_core(O) fwctl [last unloaded: cxl_pmem(O)]
[ 89.032575] CR2: ffffc9000f4c8033
[ 89.032960] ---[ end trace 0000000000000000 ]---
[ 89.033460] RIP: 0010:is_free_buddy_page+0x39/0x60
[ 89.033948] Code: 00 00 00 48 c1 fe 06 eb 0a 48 83 c1 01 48 83 f9 0b 74 30 44 89 c0 48 89 fa d3 e0 83 e8 01 48 98 48 21 f0 48 c1 e0 06 48 29 c2 <80> 7a 33 f0 75 d9 48 8b 42 28 48 39 c8 72 d0 b8 01 00 00 00 e9 ce
[ 89.035645] RSP: 0018:ffffc9000f4cf828 EFLAGS: 00010286
[ 89.036235] RAX: 0000000000007a80 RBX: ffffc9000f4cf8a0 RCX: 0000000000000009
[ 89.036910] RDX: ffffc9000f4c8000 RSI: ffffff7c003d33ea RDI: ffffc9000f4cfa80
[ 89.037588] RBP: ffffc9000f4cf838 R08: 0000000000000001 R09: 00000000ffefffff
[ 89.038310] R10: ffffc9000f4cfa38 R11: ffff888376ffe000 R12: ffffc9000f4cfa80
[ 89.039008] R13: ffffc9000f4cf9a0 R14: 0000000000000006 R15: 0000000000000001
[ 89.039710] FS: 00007f3fbca2e7c0(0000) GS:ffff8881fa75f000(0000) knlGS:0000000000000000
[ 89.040506] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 89.041129] CR2: ffffc9000f4c8033 CR3: 000000012f638003 CR4: 0000000000370ef0
[ 89.041836] note: daxctl[1476] exited with irqs disabled
>
> Let me know if you need more info.
>
> -- Alison
>
>
^ permalink raw reply
* Re: [PATCH v10 12/21] gpu: nova-core: mm: Add unified page table entry wrapper enums
From: John Hubbard @ 2026-04-13 22:50 UTC (permalink / raw)
To: Joel Fernandes, Danilo Krummrich
Cc: Eliot Courtney, linux-kernel, Miguel Ojeda, Boqun Feng, Gary Guo,
Bjorn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Dave Airlie, Daniel Almeida, Koen Koning, dri-devel,
rust-for-linux, Nikola Djukic, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Jonathan Corbet,
Alex Deucher, Christian Koenig, Jani Nikula, Joonas Lahtinen,
Vivi Rodrigo, Tvrtko Ursulin, Rui Huang, Matthew Auld,
Matthew Brost, Lucas De Marchi, Thomas Hellstrom, Helge Deller,
Alex Gaynor, Boqun Feng, Alistair Popple, Timur Tabi, Edwin Peer,
Alexandre Courbot, Andrea Righi, Andy Ritger, Zhi Wang,
Balbir Singh, Philipp Stanner, Elle Rhumsaa, Alexey Ivanov,
linux-doc, amd-gfx, intel-gfx, intel-xe, linux-fbdev
In-Reply-To: <56526b93-72a3-4b07-9aa7-7822bd561cd5@nvidia.com>
On 4/13/26 3:27 PM, Joel Fernandes wrote:
>
>
> On 4/13/2026 4:10 PM, Joel Fernandes wrote:
>> Hi Danilo,
>>
>> On 4/9/2026 7:00 AM, Danilo Krummrich wrote:
>>> On Thu Apr 9, 2026 at 12:33 PM CEST, Joel Fernandes wrote:
>>>> Since it is 3 against 1 here, I rest my case :-).
>>>
>>> That's not how I'd view it. :)
>>>
>>> Anyways, in case I'm included in "3", that's not my position. My point was to
>>> ensure we keep discussing advantages and disadvantages on their merits, as I
>>> think you both have good points.
>>
>> Heh, yes I actually *did not* include you in the 3 since you sounded to be open
>> to both. ;-)
>>
>>>
>>>> I am still in disagreement since I do not see much benefit (that is why I said
>>>> pointless above).
>>>
>>> That is fair -- in this case please explain why the advantages pointed out by
>>> others are not worth it, propose something that picks up the best of both
>>> worlds, etc.
>>>
>>> You can also turn it around and ask people whether they can tweak their counter
>>> proposal to get rid of specific parts you dislike for a reason.
>>>
>>> IOW, keep the ball rolling, so we can come up with the best possible solution.
>>
>> Good advice, thanks! I will try to come up with something that is acceptable to
>> everyone and we can further debate pros/cons on v11.
>>
>> There are some merits on the alternative proposal from Eliot/Alex that I'd like
>> to explore while seeing if I can keep some of the merits in mine as well.
> I think I found a nice approach. IMO the MMU version dispatch does not belong in
> Vmm/BarUser layers. Those are version-independent code. However I agree that
> doing version dispatch at every low-level page table operation is a bit heavy on
> matches (if we put the MMIO overhead counter-argument aside).
>
> So how about the following approach?
>
> PtWalk, PtMap and everything below it are monomorphized. Vmm and BarUser are
> not. Version dispatch is handled on PtWalk and PtMap entry points.
Conceptually it sounds pretty good.
>
> I think it makes it cleaner and splits the code up better too and the
> organizations makes sense because the version differences are related to page
> tables, not to generic concepts like Vmm and Bar.
>
> Thoughts? Here is a preview:
> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=pt-traits-v2&id=ff22ba64f729f9f73258777231763a7b9804123b
>
Probably impractical to review that here, so let's do the review in
a v11 posting, I think.
thanks,
--
John Hubbard
^ permalink raw reply
* Re: maintainer profiles
From: Randy Dunlap @ 2026-04-13 23:08 UTC (permalink / raw)
To: Dan Williams, Jonathan Corbet, Linux Documentation,
Linux Kernel Mailing List
Cc: Linux Kernel Workflows
In-Reply-To: <69dd6299440be_147c801005b@djbw-dev.notmuch>
Hi,
On 4/13/26 2:39 PM, Dan Williams wrote:
> Jonathan Corbet wrote:
>> Randy Dunlap <rdunlap@infradead.org> writes:
>>
>>> Hi,
>>>
>>> Is there supposed to be a difference (or distinction) in the contents of
>>>
>>> Documentation/process/maintainer-handbooks.rst
>>> and
>>> Documentation/maintainer/maintainer-entry-profile.rst
>>> ?
>>>
>>> Can they be combined into one location?
>>
>> Late to the party, sorry ... the original idea, I believe, was that
>> maintainer-handbooks.rst would be for developers looking for a guidebook
>> for a specific subsystem, while maintainer-entry-profile.rst was about
>> how maintainers themselves should write their subsystem guide.
>> Doubtless things have drifted since then... But the intended audiences
>> were different, so it might be good to think about bringing them back
>> into focus.
>
> Right, I think something (roughly / hand-wavy) like the below is the
> intent. However, as I write that I notice that the combined list is a
> bit of a mess. I also notice that there are more "P:" entries in
> MAINTAINERS than there are entries in this maintainer-handbooks.rst
> list.
>
> So this probably wants to be a script that can build Documentation links
> from MAINTAINERS, or otherwise provide a script for developers to query
> a kernel tree for additional submission guides. It is probably not as
> important for the built docs to link all guides as it is for developers
> (or their agents) to live query a tree they are developing against.
>
> Note the problem goes both ways, there are P: entries not in the
> combined handbook list, like the Security subsystem, and there are
> handbook entries without a P:, like the Tip tree.
I had not (and have not) checked on the P: entries.
However, this patch is close to where I already was, but it (and my
patch) causes some problems. (I dropped the duplicate
maintainer-soc-clean-dts entry.)
E.g., maintainer-handbooks uses :numbered:, but the Media and XFS
entries are already numbered, so Sphinx complains about that.
I think that numbering isn't needed, so I tried dropping that,
but the Media and XFS entries are still numbered, so it looks
messy, but that may be OK (better) than 2 mixed lists.
I'm not finding a satisfactory answer here (yet).
diff --git a/Documentation/maintainer/maintainer-entry-profile.rst b/Documentation/maintainer/maintainer-entry-profile.rst> index 6020d188e13d..58e2af333692 100644
> --- a/Documentation/maintainer/maintainer-entry-profile.rst
> +++ b/Documentation/maintainer/maintainer-entry-profile.rst
> @@ -92,24 +92,8 @@ full series, or privately send a reminder email. This section might also
> list how review works for this code area and methods to get feedback
> that are not directly from the maintainer.
>
> -Existing profiles
> ------------------
> -
> -For now, existing maintainer profiles are listed here; we will likely want
> -to do something different in the near future.
> -
> -.. toctree::
> - :maxdepth: 1
> -
> - ../doc-guide/maintainer-profile
> - ../nvdimm/maintainer-entry-profile
> - ../arch/riscv/patch-acceptance
> - ../process/maintainer-soc
> - ../process/maintainer-soc-clean-dts
> - ../driver-api/media/maintainer-entry-profile
> - ../process/maintainer-netdev
> - ../driver-api/vfio-pci-device-specific-driver-acceptance
> - ../nvme/feature-and-quirk-policy
> - ../filesystems/nfs/nfsd-maintainer-entry-profile
> - ../filesystems/xfs/xfs-maintainer-entry-profile
> - ../mm/damon/maintainer-profile
> +Maintainer Handbooks
> +--------------------
> +
> +For examples of other subsystem handbooks see
> +Documentation/process/maintainer-handbooks.rst.
> diff --git a/Documentation/process/maintainer-handbooks.rst b/Documentation/process/maintainer-handbooks.rst
> index 976391cec528..bc9299a04b1f 100644
> --- a/Documentation/process/maintainer-handbooks.rst
> +++ b/Documentation/process/maintainer-handbooks.rst
> @@ -9,14 +9,33 @@ The purpose of this document is to provide subsystem specific information
> which is supplementary to the general development process handbook
> :ref:`Documentation/process <development_process_main>`.
>
> +For developers, see below for all the known subsystem specific guides.
> +If the subsystem you are contributing to does not have a guide listed
> +here, it is fair to seek clarification of questions raised in
> +Documentation/maintainer/maintainer-entry-profile.rst.
> +
> +For maintainers, consider documenting additional requirements and
> +expectations if submissions routinely overlook specific submission
> +criteria. See Documentation/maintainer/maintainer-entry-profile.rst.
> +
> Contents:
>
> .. toctree::
> :numbered:
> :maxdepth: 2
>
> + maintainer-kvm-x86
> maintainer-netdev
> maintainer-soc
> maintainer-soc-clean-dts
> + maintainer-soc-clean-dts
> maintainer-tip
> - maintainer-kvm-x86
> + ../arch/riscv/patch-acceptance
> + ../doc-guide/maintainer-profile
> + ../driver-api/media/maintainer-entry-profile
> + ../driver-api/vfio-pci-device-specific-driver-acceptance
> + ../filesystems/nfs/nfsd-maintainer-entry-profile
> + ../filesystems/xfs/xfs-maintainer-entry-profile
> + ../mm/damon/maintainer-profile
> + ../nvdimm/maintainer-entry-profile
> + ../nvme/feature-and-quirk-policy
>
>
--
~Randy
^ permalink raw reply
* Re: [PATCH] docs: xforms_lists: allow __maybe_unused in func parameters
From: Bart Van Assche @ 2026-04-13 23:36 UTC (permalink / raw)
To: Randy Dunlap, linux-kernel
Cc: kernel test robot, Jonathan Corbet, Shuah Khan, linux-doc,
Mauro Carvalho Chehab
In-Reply-To: <20260411233526.3909303-1-rdunlap@infradead.org>
On 4/11/26 4:35 PM, Randy Dunlap wrote:
> --- linext-2026-0410.orig/tools/lib/python/kdoc/xforms_lists.py
> +++ linext-2026-0410/tools/lib/python/kdoc/xforms_lists.py
> @@ -93,6 +93,7 @@ class CTransforms:
> (CMatch("__weak"), ""),
> (CMatch("__sched"), ""),
> (CMatch("__always_unused"), ""),
> + (CMatch("__maybe_unused"), ""),
> (CMatch("__printf"), ""),
> (CMatch("__(?:re)?alloc_size"), ""),
> (CMatch("__diagnose_as"), ""),
Thanks!
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
^ permalink raw reply
* Re: [PATCH net-next v05 1/6] hinic3: Add ethtool queue ops
From: Jakub Kicinski @ 2026-04-14 0:18 UTC (permalink / raw)
To: Fan Gong
Cc: Zhu Yikai, netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Andrew Lunn, Ioana Ciornei, Mohsin Bashir,
linux-kernel, linux-doc, luosifu, Xin Guo, Zhou Shuai, Wu Like,
Shi Jing, Zheng Jiezhen, Maxime Chevallier
In-Reply-To: <157d5cc6e757ffa77eee01dfdc3f2159dc97905f.1775711066.git.zhuyikai1@h-partners.com>
On Sat, 11 Apr 2026 11:36:59 +0800 Fan Gong wrote:
> Implement following ethtool callback function:
> .get_ringparam
> .set_ringparam
>
> These callbacks allow users to utilize ethtool for detailed
> queue depth configuration and monitoring.
> +static int hinic3_check_ringparam_valid(struct net_device *netdev,
> + const struct ethtool_ringparam *ring)
> +{
> + if (ring->rx_jumbo_pending || ring->rx_mini_pending) {
Can driver actually be called with non-zero values if max is not set?
> + netdev_err(netdev, "Unsupported rx_jumbo_pending/rx_mini_pending\n");
> + return -EINVAL;
> + }
> + if (ring->tx_pending > HINIC3_MAX_TX_QUEUE_DEPTH ||
> + ring->tx_pending < HINIC3_MIN_QUEUE_DEPTH ||
> + ring->rx_pending > HINIC3_MAX_RX_QUEUE_DEPTH ||
> + ring->rx_pending < HINIC3_MIN_QUEUE_DEPTH) {
similar question - do you need to check the upper bound?
kernel should check the input against max returned by .get
> + netdev_err(netdev,
please use extack for errors
> + "Queue depth out of range tx[%d-%d] rx[%d-%d]\n",
> + HINIC3_MIN_QUEUE_DEPTH, HINIC3_MAX_TX_QUEUE_DEPTH,
> + HINIC3_MIN_QUEUE_DEPTH, HINIC3_MAX_RX_QUEUE_DEPTH);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int hinic3_set_ringparam(struct net_device *netdev,
> + struct ethtool_ringparam *ring,
> + struct kernel_ethtool_ringparam *kernel_ring,
> + struct netlink_ext_ack *extack)
> +{
> + struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> + struct hinic3_dyna_txrxq_params q_params = {};
> + u32 new_sq_depth, new_rq_depth;
> + int err;
> +
> + err = hinic3_check_ringparam_valid(netdev, ring);
> + if (err)
> + return err;
> +
> + new_sq_depth = 1U << ilog2(ring->tx_pending);
> + new_rq_depth = 1U << ilog2(ring->rx_pending);
> + if (new_sq_depth == nic_dev->q_params.sq_depth &&
> + new_rq_depth == nic_dev->q_params.rq_depth)
> + return 0;
> +
> + if (new_sq_depth != ring->tx_pending)
> + netdev_info(netdev, "Requested Tx depth trimmed to %d\n",
> + new_sq_depth);
please use extack for warnings
> + if (new_rq_depth != ring->rx_pending)
> + netdev_info(netdev, "Requested Rx depth trimmed to %d\n",
> + new_rq_depth);
> +
> + netdev_info(netdev, "Change Tx/Rx ring depth from %u/%u to %u/%u\n",
> + nic_dev->q_params.sq_depth, nic_dev->q_params.rq_depth,
> + new_sq_depth, new_rq_depth);
> +
> + if (!netif_running(netdev)) {
> + hinic3_update_qp_depth(netdev, new_sq_depth, new_rq_depth);
> + } else {
> + q_params = nic_dev->q_params;
> + q_params.sq_depth = new_sq_depth;
> + q_params.rq_depth = new_rq_depth;
> +
> + err = hinic3_change_channel_settings(netdev, &q_params);
> + if (err) {
> + netdev_err(netdev, "Failed to change channel settings\n");
> + return err;
> + }
> + }
> +
> + return 0;
> +}
> +
> static const struct ethtool_ops hinic3_ethtool_ops = {
> .supported_coalesce_params = ETHTOOL_COALESCE_USECS |
> ETHTOOL_COALESCE_PKT_RATE_RX_USECS,
> @@ -417,6 +516,8 @@ static const struct ethtool_ops hinic3_ethtool_ops = {
> .get_msglevel = hinic3_get_msglevel,
> .set_msglevel = hinic3_set_msglevel,
> .get_link = ethtool_op_get_link,
> + .get_ringparam = hinic3_get_ringparam,
> + .set_ringparam = hinic3_set_ringparam,
> };
> diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_main.c b/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
> index 0a888fe4c975..3b470978714a 100644
> --- a/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
> +++ b/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
> @@ -179,6 +179,8 @@ static int hinic3_sw_init(struct net_device *netdev)
> int err;
>
> mutex_init(&nic_dev->port_state_mutex);
> + mutex_init(&nic_dev->channel_cfg_lock);
Why do you need this mutex?
Aren't all the places you take it under rtnl_lock anyway?
> + spin_lock_init(&nic_dev->channel_res_lock);
>
> nic_dev->q_params.sq_depth = HINIC3_SQ_DEPTH;
> nic_dev->q_params.rq_depth = HINIC3_RQ_DEPTH;
> @@ -314,6 +316,15 @@ static void hinic3_link_status_change(struct net_device *netdev,
> bool link_status_up)
> {
> struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> + unsigned long flags;
> + bool valid;
> +
> + spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> + valid = HINIC3_CHANNEL_RES_VALID(nic_dev);
> + spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> + if (!valid)
Why are you checking valid here? What if the state changes immediately
after unlocking?
> + return;
>
> if (link_status_up) {
> if (netif_carrier_ok(netdev))
> diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_netdev_ops.c b/drivers/net/ethernet/huawei/hinic3/hinic3_netdev_ops.c
> index da73811641a9..d652a5ffdc2c 100644
> --- a/drivers/net/ethernet/huawei/hinic3/hinic3_netdev_ops.c
> +++ b/drivers/net/ethernet/huawei/hinic3/hinic3_netdev_ops.c
> @@ -428,6 +428,85 @@ static void hinic3_vport_down(struct net_device *netdev)
> }
> }
>
> +int
> +hinic3_change_channel_settings(struct net_device *netdev,
> + struct hinic3_dyna_txrxq_params *trxq_params)
> +{
> + struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> + struct hinic3_dyna_txrxq_params old_qp_params = {};
> + struct hinic3_dyna_qp_params new_qp_params = {};
> + struct hinic3_dyna_qp_params cur_qp_params = {};
> + bool need_teardown = false;
> + unsigned long flags;
> + int err;
> +
> + mutex_lock(&nic_dev->channel_cfg_lock);
> +
> + hinic3_config_num_qps(netdev, trxq_params);
> +
> + err = hinic3_alloc_channel_resources(netdev, &new_qp_params,
> + trxq_params);
> + if (err) {
> + netdev_err(netdev, "Failed to alloc channel resources\n");
> + mutex_unlock(&nic_dev->channel_cfg_lock);
> + return err;
> + }
> +
> + spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> + if (!test_and_set_bit(HINIC3_CHANGE_RES_INVALID, &nic_dev->flags))
> + need_teardown = true;
> + spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> + if (need_teardown) {
> + hinic3_vport_down(netdev);
> + hinic3_close_channel(netdev);
> + hinic3_uninit_qps(nic_dev, &cur_qp_params);
> + hinic3_free_channel_resources(netdev, &cur_qp_params,
> + &nic_dev->q_params);
> + }
> +
> + if (nic_dev->num_qp_irq > trxq_params->num_qps)
> + hinic3_qp_irq_change(netdev, trxq_params->num_qps);
> +
> + spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> + old_qp_params = nic_dev->q_params;
> + nic_dev->q_params = *trxq_params;
> + spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> + hinic3_init_qps(nic_dev, &new_qp_params);
> +
> + err = hinic3_open_channel(netdev);
This "open" function allocates Rx buffers, and fails if it couldn't get
even one. That's no good.
> + if (err)
> + goto err_uninit_qps;
> +
> + err = hinic3_vport_up(netdev);
> + if (err)
> + goto err_close_channel;
> +
> + spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> + clear_bit(HINIC3_CHANGE_RES_INVALID, &nic_dev->flags);
> + spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> + mutex_unlock(&nic_dev->channel_cfg_lock);
> +
> + return 0;
> +
> +err_close_channel:
> + hinic3_close_channel(netdev);
> +err_uninit_qps:
> + spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> + nic_dev->q_params = old_qp_params;
> + clear_bit(HINIC3_CHANGE_RES_INVALID, &nic_dev->flags);
> + spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> + hinic3_uninit_qps(nic_dev, &new_qp_params);
> + hinic3_free_channel_resources(netdev, &new_qp_params, trxq_params);
> +
> + mutex_unlock(&nic_dev->channel_cfg_lock);
AI says:
Can this error path lead to memory corruption?
If need_teardown was true, the old channel resources were freed earlier in
the function. If hinic3_open_channel() or hinic3_vport_up() fails, the code
jumps to err_uninit_qps and restores nic_dev->q_params = old_qp_params.
However, it doesn't appear to re-allocate those old resources or mark the
interface as down. Could a subsequent administrative teardown or network
traffic dereference these freed pointers?
> +
> static int hinic3_open(struct net_device *netdev)
> {
> struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> @@ -487,16 +566,33 @@ static int hinic3_close(struct net_device *netdev)
> {
> struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
> struct hinic3_dyna_qp_params qp_params;
> + bool need_teardown = false;
> + unsigned long flags;
>
> if (!test_and_clear_bit(HINIC3_INTF_UP, &nic_dev->flags)) {
> netdev_dbg(netdev, "Netdev already close, do nothing\n");
> return 0;
> }
>
> - hinic3_vport_down(netdev);
> - hinic3_close_channel(netdev);
> - hinic3_uninit_qps(nic_dev, &qp_params);
> - hinic3_free_channel_resources(netdev, &qp_params, &nic_dev->q_params);
> + mutex_lock(&nic_dev->channel_cfg_lock);
> +
> + spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
> + if (!test_and_set_bit(HINIC3_CHANGE_RES_INVALID, &nic_dev->flags))
> + need_teardown = true;
> + spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
> +
> + if (need_teardown) {
> + hinic3_vport_down(netdev);
> + hinic3_close_channel(netdev);
> + hinic3_uninit_qps(nic_dev, &qp_params);
> + hinic3_free_channel_resources(netdev, &qp_params,
> + &nic_dev->q_params);
> + }
> +
> + hinic3_free_nicio_res(nic_dev);
> + hinic3_destroy_num_qps(netdev);
> +
> + mutex_unlock(&nic_dev->channel_cfg_lock);
>
> return 0;
> }
> diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h
> index 9502293ff710..55b280888ad8 100644
> --- a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h
> +++ b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h
> @@ -10,6 +10,9 @@
> #include "hinic3_hw_cfg.h"
> #include "hinic3_hwdev.h"
> #include "hinic3_mgmt_interface.h"
> +#include "hinic3_nic_io.h"
> +#include "hinic3_tx.h"
> +#include "hinic3_rx.h"
>
> #define HINIC3_VLAN_BITMAP_BYTE_SIZE(nic_dev) (sizeof(*(nic_dev)->vlan_bitmap))
> #define HINIC3_VLAN_BITMAP_SIZE(nic_dev) \
> @@ -20,8 +23,13 @@ enum hinic3_flags {
> HINIC3_MAC_FILTER_CHANGED,
> HINIC3_RSS_ENABLE,
> HINIC3_UPDATE_MAC_FILTER,
> + HINIC3_CHANGE_RES_INVALID,
> };
>
> +#define HINIC3_CHANNEL_RES_VALID(nic_dev) \
> + (test_bit(HINIC3_INTF_UP, &(nic_dev)->flags) && \
> + !test_bit(HINIC3_CHANGE_RES_INVALID, &(nic_dev)->flags))
I don't get why you need to check both of these bits.
Can't there be one bit for "resources valid" ?
And it will only be set while device is up (of course) so no need to
also check UP (this way checking can be atomic without the spin lock).
> enum hinic3_event_work_flags {
> HINIC3_EVENT_WORK_TX_TIMEOUT,
> };
> @@ -129,6 +137,10 @@ struct hinic3_nic_dev {
> struct work_struct rx_mode_work;
> /* lock for enable/disable port */
> struct mutex port_state_mutex;
> + /* lock for channel configuration */
> + struct mutex channel_cfg_lock;
> + /* lock for channel resources */
> + spinlock_t channel_res_lock;
>
> struct list_head uc_filter_list;
> struct list_head mc_filter_list;
> @@ -143,6 +155,10 @@ struct hinic3_nic_dev {
>
> void hinic3_set_netdev_ops(struct net_device *netdev);
> int hinic3_set_hw_features(struct net_device *netdev);
> +int
> +hinic3_change_channel_settings(struct net_device *netdev,
> + struct hinic3_dyna_txrxq_params *trxq_params);
> +
> int hinic3_qps_irq_init(struct net_device *netdev);
> void hinic3_qps_irq_uninit(struct net_device *netdev);
>
> diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_io.h b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_io.h
> index 12eefabcf1db..3791b9bc865b 100644
> --- a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_io.h
> +++ b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_io.h
> @@ -14,6 +14,10 @@ struct hinic3_nic_dev;
> #define HINIC3_RQ_WQEBB_SHIFT 3
> #define HINIC3_SQ_WQEBB_SIZE BIT(HINIC3_SQ_WQEBB_SHIFT)
>
> +#define HINIC3_MAX_TX_QUEUE_DEPTH 65536
> +#define HINIC3_MAX_RX_QUEUE_DEPTH 16384
> +#define HINIC3_MIN_QUEUE_DEPTH 128
> +
> /* ******************** RQ_CTRL ******************** */
> enum hinic3_rq_wqe_type {
> HINIC3_NORMAL_RQ_WQE = 1,
^ permalink raw reply
* Re: [PATCH v5 00/21] Virtual Swap Space
From: YoungJun Park @ 2026-04-14 2:50 UTC (permalink / raw)
To: Nhat Pham
Cc: kasong, Liam.Howlett, akpm, apopple, axelrasmussen, baohua,
baolin.wang, bhe, byungchul, cgroups, chengming.zhou, chrisl,
corbet, david, dev.jain, gourry, hannes, hughd, jannh,
joshua.hahnjy, lance.yang, lenb, linux-doc, linux-kernel,
linux-mm, linux-pm, lorenzo.stoakes, matthew.brost, mhocko,
muchun.song, npache, pavel, peterx, peterz, pfalcato, rafael,
rakie.kim, roman.gushchin, rppt, ryan.roberts, shakeel.butt,
shikemeng, surenb, tglx, vbabka, weixugc, ying.huang, yosry.ahmed,
yuanchu, zhengqi.arch, ziy, kernel-team, riel
In-Reply-To: <CAKEwX=NnHxpQKp9qBg2=r_euyjgxw2nHXjbgof3MymHTgJmRAQ@mail.gmail.com>
On Sat, Apr 11, 2026 at 06:40:44PM -0700, Nhat Pham wrote:
Hello Nhat!
> > 1. Modularization
> >
> > You removed CONFIG_* and went with a unified approach. I recall
> > you were also considering a module-based structure at some point.
> > What are your thoughts on that direction?
> >
>
> The CONFIG-based approach was a huge mess. It makes me not want to
> look at the code, and I'm the author :)
>
> > If we take that approach, we could extend the recent swap ops
> > patchset (https://lore.kernel.org/linux-mm/20260302104016.163542-1-bhe@redhat.com/)
> > as follows:
> > - Make vswap a swap module
> > - Have cluster allocation functions reside in swapops
> > - Enable vswap through swapon
>
> Hmmmmm.
I think this would be a happy world, but I wonder what others think.
Anyway, I'm looking forward to the future direction.
> > 2. Flash-friendly swap integration (for my use case)
> >
> > I've been thinking about the flash-friendly swap concept that
> > I mentioned before and recently proposed:
> > (https://lore.kernel.org/linux-mm/aZW0voL4MmnMQlaR@yjaykim-PowerEdge-T330/)
> >
> > One of its core functions requires buffering RAM-swapped pages
> > and writing them sequentially at an appropriate time -- not
> > immediately, but in proper block-sized units, sequentially.
> >
> > This means allocated offsets must essentially be virtual, and
> > physical offsets need to be managed separately at the actual
> > write time.
> >
> > If we integrate this into the current vswap, we would either
> > need vswap itself to handle the sequential writes (bypassing
> > the physical device and receiving pages directly), or swapon
> > a swap device and have vswap obtain physical offsets from it.
> > But since those offsets cannot be used directly (due to
> > buffering and sequential write requirements), they become
> > virtual too, resulting in:
> >
> > virtual -> virtual -> physical
> >
> > This triple indirection is not ideal.
> >
> > However, if the modularization from point 1 is achieved and
> > vswap acts as a swap device itself, then we can cleanly
> > establish a:
> >
> > virtual -> physical
>
> I read that thread sometimes ago. Some remarks:
>
> 1. I think Christoph has a point. Seems like some of your ideas ( are
> broadly applicable to swap in general. Maybe fixing swap infra
> generally would make a lot of sense?
Broadly speaking, there are two main ideas:
1. Swap I/O buffering (which is also tied to cluster management issues)
2. Deduplication
Are you leaning towards the view that these two should be placed in a
higher layer?
> 2. Why do we need to do two virtual layers here? For example, If you
> want to buffer multiple swap outs and turn them into a sequential
> request, you can:
>
> a. Allocate virtual swap space for them as you wish. They don't even
> need to be sequential.
>
> b. At swap_writeout() time, don't allocate physical swap space for
> them right away. Instead, accumulate them into a buffer. You can add a
> new virtual swap entry type to flag it if necessary.
>
> c. Once that buffer reaches a certain size, you can now allocate
> contiguous physical swap space for them. Then flush etc. You can flush
> at swap_writeout() time, or use a dedicated threads etc.
I initially thought implementing this in vswap would be complicated
(due to the ripple effects of altering behavior at swap_writeout timing),
but it seems entirely possible!
1. We could change the behavior (e.g., buffering) at vswap_alloc_swap_slot
timing by checking things like the si type.
2. Additionally, if we can handle the cluster data structures and
mechanisms in the swap_info_struct privately, a virtual-to-physical
one-direction approach seems feasible.
(Come to think of it, it might be better to refactor the infra to let
other modules handle this, potentially removing the swap_info_struct
mechanism entirely. Just imagination ;) )
> Deduplication sounds like something that should live at a lower layer
> - I was thinking about it for zswap/zsmalloc back then. I mean, I
> assume you don't want content sharing across different swap media? :)
> Something along the line of:
>
> 1. Maintain an content index for swapped out pages.
>
> 2. For the swap media that support deduplication, you'll need to add
> some sort of reference count (more overhead ew).
>
> 3. Each time we swapped out, we can content-check to see if the same
> piece of conent has been swapped out before. If so, set the vswap
> backend to the physical location of the data, increment some sort of
> reference count (perhaps we can use swap count) of the older entry,
> and have the swap type point to it.
As for reference count management, applying it loosely might be a good
approach. Instead of strictly managing the lifecycle of the dedup contents
with refcounts, we could just periodically clean up the hash. This also
has the benefit of reducing I/O for the same swap content compared to
deleting it immediately.
> But have you considered the implications of sharing swap data like
> this? I need to read the paper you cite - seems like a potential fun
> read. But what happen when these two pages that share the content
> belong to two different cgroups? How does the
> charging/uncharging/charge transferring story work? That's one of the
> things that made me pause when I wanted to implement deduplication for
> zswap/zsmalloc. Zram does not charge memory towards cgroup, but zswap
> does, so we'll need to handle this somehow, and at that point all the
> complexity might no longer be worth it.
Since our private swap device is similar to ZRAM, I hadn't considered
the charging aspect. It is indeed a complex issue.
If it goes into ZSWAP, there would definitely be a clear advantage of
seeing dedup benefits across all swap devices. It's a technically
interesting area, and I'd like to discuss it in a separate thread if
I have more ideas or thoughts.
Just a thought that comes to mind here: if vswap becomes modularized,
how about doing memcg charging for this entire area?
(Come to think of it, to fully benefit from vswap modularization,
zswap should also be applied within its scope.)
Best regards,
Youngjun Park
^ permalink raw reply
* Re: [PATCH v5 00/21] Virtual Swap Space
From: YoungJun Park @ 2026-04-14 3:09 UTC (permalink / raw)
To: Nhat Pham
Cc: Kairui Song, Liam.Howlett, akpm, apopple, axelrasmussen, baohua,
baolin.wang, bhe, byungchul, cgroups, chengming.zhou, chrisl,
corbet, david, dev.jain, gourry, hannes, hughd, jannh,
joshua.hahnjy, lance.yang, lenb, linux-doc, linux-kernel,
linux-mm, linux-pm, lorenzo.stoakes, matthew.brost, mhocko,
muchun.song, npache, pavel, peterx, peterz, pfalcato, rafael,
rakie.kim, roman.gushchin, rppt, ryan.roberts, shakeel.butt,
shikemeng, surenb, tglx, vbabka, weixugc, ying.huang, yosry.ahmed,
yuanchu, zhengqi.arch, ziy, kernel-team, riel
In-Reply-To: <CAKEwX=Pt04pYfhYOwmtXJKU5OqcxBC14SAf1wpBxBo1D7rPpGw@mail.gmail.com>
On Sat, Apr 11, 2026 at 06:03:04PM -0700, Nhat Pham wrote:
> On Wed, Mar 25, 2026 at 11:53 AM YoungJun Park <youngjun.park@lge.com> wrote:
> >
> > On Mon, Mar 23, 2026 at 11:32:57AM -0400, Nhat Pham wrote:
> >
> > > Interesting. Normally "lots of zero-filled page" is a very beneficial
> > > case for vswap. You don't need a swapfile, or any zram/zswap metadata
> > > overhead - it's a native swap backend. If production workload has this
> > > many zero-filled pages, I think the numbers of vswap would be much
> > > less alarming - perhaps even matching memory overhead because you
> > > don't need to maintain a zram entry metadata (it's at least 2 words
> > > per zram entry right?), while there's no reverse map overhead induced
> > > (so it's 24 bytes on both side), and no need to do zram-side locking
> > > :)
> > >
> > > So I was surprised to see that it's not working out very well here. I
> > > checked the implementation of memhog - let me know if this is wrong
> > > place to look:
> > >
> > > https://man7.org/linux/man-pages/man8/memhog.8.html
> > > https://github.com/numactl/numactl/blob/master/memhog.c#L52
> > >
> > > I think this is what happened here: memhog was populating the memory
> > > 0xff, which triggers the full overhead of a swapfile-backed swap entry
> > > because even though it's "same-filled" it's not zero-filled! I was
> > > following Usama's observation - "less than 1% of the same-filled pages
> > > were non-zero" - and so I only handled the zero-filled case here:
> > >
> > > https://lore.kernel.org/all/20240530102126.357438-1-usamaarif642@gmail.com/
> > >
> > > This sounds a bit artificial IMHO - as Usama pointed out above, I
> > > think most samefilled pages are zero pages, in real production
> > > workloads. However, if you think there are real use cases with a lot
> > > of non-zero samefilled pages, please let me know I can fix this real
> > > quick. We can support this in vswap with zero extra metadata overhead
> > > - change the VSWAP_ZERO swap entry type to VSWAP_SAME_FILLED, then use
> > > the backend field to store that value. I can send you a patch if
> > > you're interested.
> >
> > This brings back memories -- I'm pretty sure we talked about
> > exactly this at LPC. Our custom swap device already handles both
> > zero-filled and same-filled pages on its own, so what we really
> > wanted was a way to tell the swap layer "just skip the detection
> > and let it through."
> >
> > I looked at two approaches back then but never submitted either:
> >
> > - A per-swap_info flag to opt out of zero/same-filled handling.
> > But this felt wrong from vswap's perspective -- if even one
> > device opts out of the zeromap, the model gets messy.
> >
> > - Revisiting Usama's patch 2 approach.
> > Sounded good in theory, but as you said,
> > it's not as simple to verify in practice. And it is more clean design
> > swapout time zero check as I see. So, I gave up on it.
> >
> > Seeing this come up again is actually kind of nice :)
> >
> > One thought -- maybe a compile-time CONFIG or a boot param to
> > control the scope? e.g. zero-only, same-filled, or disabled.
> > That way vendors like us just turn it off, and setups like
> > Kairui's can opt into broader detection. Just an idea though --
> > open to other approaches if you have something in mind.
>
> Yeah for vswap it's probably going to be a CONFIG or boot param.
>
> But in the status quo, we can always add a swapfile flag. That one
> should work already, right?
I'm a bit hesitant about the swapfile flag approach. If vswap gets merged,
handling devices with this flag set might complicate the vswap design.
Moreover, exposing a new swap flag to the user interface (e.g., at swapon)
raises concerns about backward compatibility. Do you think that would be safe?
Since our use case isn't very common, we just need a simple knob to tune it.
That's why I still prefer a boot param or CONFIG approach.
Thanks :D
Youngjun Park
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox