* Re: [PATCH v13 11/27] drm/i915/hdmi: Add YCBCR444 handling for sink formats
From: Ville Syrjälä @ 2026-04-13 19:04 UTC
To: Nicolas Frattaroli
Cc: Harry Wentland, Leo Li, Rodrigo Siqueira, Alex Deucher,
Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
Jonas Karlman, Jernej Skrabec, Sandy Huang, Heiko Stübner,
Andy Yan, Jani Nikula, Rodrigo Vivi, Joonas Lahtinen,
Tvrtko Ursulin, Dmitry Baryshkov, Sascha Hauer, Rob Herring,
Jonathan Corbet, Shuah Khan, kernel, amd-gfx, dri-devel,
linux-kernel, linux-arm-kernel, linux-rockchip, intel-gfx,
intel-xe, linux-doc
In-Reply-To: <ad08zqpKbyF--Br3@intel.com>
On Mon, Apr 13, 2026 at 09:58:22PM +0300, Ville Syrjälä wrote:
> On Mon, Apr 13, 2026 at 12:07:25PM +0200, Nicolas Frattaroli wrote:
> > In anticipation of userspace being able to explicitly select supported
> > sink formats, add handling of the YCBCR444 sink format. The AUTO path
> > does not choose this format, but with explicit format selection added to
> > the driver, it becomes a possibility.
> >
> > Check for YCBCR444 support on the sink in both sink_bpc_possible, and
> > sink_format_valid.
> >
> > Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> > ---
> > drivers/gpu/drm/i915/display/intel_hdmi.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_hdmi.c b/drivers/gpu/drm/i915/display/intel_hdmi.c
> > index 874076a29da4..5ab5b5f85cde 100644
> > --- a/drivers/gpu/drm/i915/display/intel_hdmi.c
> > +++ b/drivers/gpu/drm/i915/display/intel_hdmi.c
> > @@ -1966,6 +1966,8 @@ static bool intel_hdmi_sink_bpc_possible(struct drm_connector *_connector,
> >
> > if (sink_format == INTEL_OUTPUT_FORMAT_YCBCR420)
> > return hdmi->y420_dc_modes & DRM_EDID_YCBCR420_DC_36;
> > + else if (sink_format == INTEL_OUTPUT_FORMAT_YCBCR444)
> > + return info->edid_hdmi_ycbcr444_dc_modes & DRM_EDID_HDMI_DC_36;
> > else
> > return info->edid_hdmi_rgb444_dc_modes & DRM_EDID_HDMI_DC_36;
> > case 10:
> > @@ -1974,6 +1976,8 @@ static bool intel_hdmi_sink_bpc_possible(struct drm_connector *_connector,
> >
> > if (sink_format == INTEL_OUTPUT_FORMAT_YCBCR420)
> > return hdmi->y420_dc_modes & DRM_EDID_YCBCR420_DC_30;
> > + else if (sink_format == INTEL_OUTPUT_FORMAT_YCBCR444)
> > + return info->edid_hdmi_ycbcr444_dc_modes & DRM_EDID_HDMI_DC_30;
> > else
> > return info->edid_hdmi_rgb444_dc_modes & DRM_EDID_HDMI_DC_30;
> > case 8:
> > @@ -2038,6 +2042,11 @@ intel_hdmi_sink_format_valid(struct intel_connector *connector,
> >
> > return MODE_OK;
> > case INTEL_OUTPUT_FORMAT_RGB:
> > + return MODE_OK;
> > + case INTEL_OUTPUT_FORMAT_YCBCR444:
>
> We'll also want the !has_hdmi_sink check here like for 4:2:0.
>
> And I think we also want something to mirror the ycbcr_420_allowed
> flag. I guess you could just make it something like:
>
> intel_hdmi_ycbcr_444_allowed(display)
> {
> return DISPLAY_VER(display) >= 5 && !HAS_GMCH(display);
Actually the display version check is redundant there.
!HAS_GMCH alone is sufficient.
> }
>
> That can also be reused when setting up the allowed property values.
>
> > + if (!(info->color_formats & BIT(DRM_OUTPUT_COLOR_FORMAT_YCBCR444)))
> > + return MODE_BAD;
> > +
> > return MODE_OK;
> > default:
> > MISSING_CASE(sink_format);
> >
> > --
> > 2.53.0
>
> --
> Ville Syrjälä
> Intel
--
Ville Syrjälä
Intel
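Taken together, the two review comments above might look roughly like the sketch below. This is a standalone illustration, not i915 code: every `sketch_*` name is hypothetical and merely stands in for `HAS_GMCH(display)`, `has_hdmi_sink`, the `info->color_formats` test and the `MODE_OK`/`MODE_BAD` results. Where the "allowed" check would actually be called in the driver is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver structures; not the real i915 types. */
struct sketch_display {
	bool has_gmch;		/* what HAS_GMCH(display) would report */
};

struct sketch_sink {
	bool has_hdmi_sink;	/* false for DVI sinks */
	bool ycbcr444;		/* sink advertises YCbCr 4:4:4 */
};

enum sketch_mode_status { SKETCH_MODE_OK, SKETCH_MODE_BAD };

/* Per the follow-up, the !HAS_GMCH test alone is sufficient. */
static bool sketch_ycbcr_444_allowed(const struct sketch_display *display)
{
	return !display->has_gmch;
}

/* YCbCr 4:4:4 needs a capable source, an HDMI sink, and sink support. */
static enum sketch_mode_status
sketch_ycbcr444_valid(const struct sketch_display *display,
		      const struct sketch_sink *sink)
{
	if (!sketch_ycbcr_444_allowed(display))
		return SKETCH_MODE_BAD;
	if (!sink->has_hdmi_sink)
		return SKETCH_MODE_BAD;
	if (!sink->ycbcr444)
		return SKETCH_MODE_BAD;
	return SKETCH_MODE_OK;
}
```

The same `sketch_ycbcr_444_allowed()` predicate could then be consulted when building the list of allowed property values, which is the reuse suggested in the review.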
* Re: maintainer profiles
From: Jonathan Corbet @ 2026-04-13 19:03 UTC
To: Randy Dunlap, Linux Documentation, Linux Kernel Mailing List
Cc: Linux Kernel Workflows
In-Reply-To: <b7775383-da94-4098-8af9-2f672c4f1a71@infradead.org>
Randy Dunlap <rdunlap@infradead.org> writes:
> Hi,
>
> Is there supposed to be a difference (or distinction) in the contents of
>
> Documentation/process/maintainer-handbooks.rst
> and
> Documentation/maintainer/maintainer-entry-profile.rst
> ?
>
> Can they be combined into one location?
Late to the party, sorry ... the original idea, I believe, was that
maintainer-handbooks.rst would be for developers looking for a guidebook
for a specific subsystem, while maintainer-entry-profile.rst was about
how maintainers themselves should write their subsystem guide.
Doubtless things have drifted since then... But the intended audiences
were different, so it might be good to think about bringing them back
into focus.
jon
* Re: [PATCH v13 11/27] drm/i915/hdmi: Add YCBCR444 handling for sink formats
From: Ville Syrjälä @ 2026-04-13 18:58 UTC
To: Nicolas Frattaroli
Cc: Harry Wentland, Leo Li, Rodrigo Siqueira, Alex Deucher,
Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
Jonas Karlman, Jernej Skrabec, Sandy Huang, Heiko Stübner,
Andy Yan, Jani Nikula, Rodrigo Vivi, Joonas Lahtinen,
Tvrtko Ursulin, Dmitry Baryshkov, Sascha Hauer, Rob Herring,
Jonathan Corbet, Shuah Khan, kernel, amd-gfx, dri-devel,
linux-kernel, linux-arm-kernel, linux-rockchip, intel-gfx,
intel-xe, linux-doc
In-Reply-To: <20260413-color-format-v13-11-ab37d4dfba48@collabora.com>
On Mon, Apr 13, 2026 at 12:07:25PM +0200, Nicolas Frattaroli wrote:
> In anticipation of userspace being able to explicitly select supported
> sink formats, add handling of the YCBCR444 sink format. The AUTO path
> does not choose this format, but with explicit format selection added to
> the driver, it becomes a possibility.
>
> Check for YCBCR444 support on the sink in both sink_bpc_possible, and
> sink_format_valid.
>
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> ---
> drivers/gpu/drm/i915/display/intel_hdmi.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_hdmi.c b/drivers/gpu/drm/i915/display/intel_hdmi.c
> index 874076a29da4..5ab5b5f85cde 100644
> --- a/drivers/gpu/drm/i915/display/intel_hdmi.c
> +++ b/drivers/gpu/drm/i915/display/intel_hdmi.c
> @@ -1966,6 +1966,8 @@ static bool intel_hdmi_sink_bpc_possible(struct drm_connector *_connector,
>
> if (sink_format == INTEL_OUTPUT_FORMAT_YCBCR420)
> return hdmi->y420_dc_modes & DRM_EDID_YCBCR420_DC_36;
> + else if (sink_format == INTEL_OUTPUT_FORMAT_YCBCR444)
> + return info->edid_hdmi_ycbcr444_dc_modes & DRM_EDID_HDMI_DC_36;
> else
> return info->edid_hdmi_rgb444_dc_modes & DRM_EDID_HDMI_DC_36;
> case 10:
> @@ -1974,6 +1976,8 @@ static bool intel_hdmi_sink_bpc_possible(struct drm_connector *_connector,
>
> if (sink_format == INTEL_OUTPUT_FORMAT_YCBCR420)
> return hdmi->y420_dc_modes & DRM_EDID_YCBCR420_DC_30;
> + else if (sink_format == INTEL_OUTPUT_FORMAT_YCBCR444)
> + return info->edid_hdmi_ycbcr444_dc_modes & DRM_EDID_HDMI_DC_30;
> else
> return info->edid_hdmi_rgb444_dc_modes & DRM_EDID_HDMI_DC_30;
> case 8:
> @@ -2038,6 +2042,11 @@ intel_hdmi_sink_format_valid(struct intel_connector *connector,
>
> return MODE_OK;
> case INTEL_OUTPUT_FORMAT_RGB:
> + return MODE_OK;
> + case INTEL_OUTPUT_FORMAT_YCBCR444:
We'll also want the !has_hdmi_sink check here like for 4:2:0.

And I think we also want something to mirror the ycbcr_420_allowed
flag. I guess you could just make it something like:

intel_hdmi_ycbcr_444_allowed(display)
{
	return DISPLAY_VER(display) >= 5 && !HAS_GMCH(display);
}

That can also be reused when setting up the allowed property values.

> + if (!(info->color_formats & BIT(DRM_OUTPUT_COLOR_FORMAT_YCBCR444)))
> + return MODE_BAD;
> +
> return MODE_OK;
> default:
> MISSING_CASE(sink_format);
>
> --
> 2.53.0
--
Ville Syrjälä
Intel
* Re: [PATCH v13 00/27] Add new general DRM property "color format"
From: Ville Syrjälä @ 2026-04-13 18:45 UTC
To: Nicolas Frattaroli
Cc: Harry Wentland, Leo Li, Rodrigo Siqueira, Alex Deucher,
Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
Jonas Karlman, Jernej Skrabec, Sandy Huang, Heiko Stübner,
Andy Yan, Jani Nikula, Rodrigo Vivi, Joonas Lahtinen,
Tvrtko Ursulin, Dmitry Baryshkov, Sascha Hauer, Rob Herring,
Jonathan Corbet, Shuah Khan, kernel, amd-gfx, dri-devel,
linux-kernel, linux-arm-kernel, linux-rockchip, intel-gfx,
intel-xe, linux-doc, Werner Sembach, Andri Yngvason,
Cristian Ciocaltea, Marius Vlad, Dmitry Baryshkov, Andy Yan
In-Reply-To: <20260413-color-format-v13-0-ab37d4dfba48@collabora.com>
On Mon, Apr 13, 2026 at 12:07:14PM +0200, Nicolas Frattaroli wrote:
> Hello,
>
> this is a follow-up to
> https://lore.kernel.org/all/20250911130739.4936-1-marius.vlad@collabora.com/
> which in and of itself is a follow-up to
> https://lore.kernel.org/dri-devel/20240115160554.720247-1-andri@yngvason.is/ where
> a new DRM connector property has been added allowing users to
> force a particular color format.
Looks like we're still missing the wayland folks in the cc. But I was
told that everyone should just cc wayland-devel@lists.freedesktop.org
on all relevant uapi stuff. So please add that on the next version.
The i915 rework is now merged so you should even get a buildable
series next time.
I'll go read the i915 parts now...
--
Ville Syrjälä
Intel
* Re: [PATCH 3/6] hugetlb: make hugetlb_fault_mutex_hash() take PAGE_SIZE index
From: Oscar Salvador @ 2026-04-13 17:43 UTC
To: Jane Chu
Cc: akpm, david, muchun.song, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, corbet, skhan, hughd, baolin.wang, peterx,
linux-mm, linux-doc, linux-kernel
In-Reply-To: <20260409234158.837786-4-jane.chu@oracle.com>
On Thu, Apr 09, 2026 at 05:41:54PM -0600, Jane Chu wrote:
> hugetlb_fault_mutex_hash() is used to serialize faults and page cache
> operations on the same hugetlb file offset. The helper currently expects
> its index argument in hugetlb page granularity, so callers have to
> open-code conversions from the PAGE_SIZE-based indices commonly used
> in the rest of MM helpers.
>
> Change hugetlb_fault_mutex_hash() to take a PAGE_SIZE-based index
> instead, and perform the hugetlb-granularity conversion inside the helper.
> Update all callers accordingly.
>
> This makes the helper interface consistent with filemap_get_folio(),
> and linear_page_index(), while preserving the same lock selection for
> a given hugetlb file offset.
>
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
> fs/hugetlbfs/inode.c | 19 ++++++++++---------
> mm/hugetlb.c | 28 +++++++++++++++++++---------
> mm/memfd.c | 11 ++++++-----
> mm/userfaultfd.c | 7 +++----
> 4 files changed, 38 insertions(+), 27 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index cf79fb830377..e24e9bf54e14 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -575,7 +575,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
> struct address_space *mapping = &inode->i_data;
> const pgoff_t end = lend >> PAGE_SHIFT;
> struct folio_batch fbatch;
> - pgoff_t next, index;
> + pgoff_t next, idx;
> int i, freed = 0;
> bool truncate_op = (lend == LLONG_MAX);
>
> @@ -586,15 +586,15 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
> struct folio *folio = fbatch.folios[i];
> u32 hash = 0;
>
> - index = folio->index >> huge_page_order(h);
> - hash = hugetlb_fault_mutex_hash(mapping, index);
> + hash = hugetlb_fault_mutex_hash(mapping, folio->index);
> mutex_lock(&hugetlb_fault_mutex_table[hash]);
>
> /*
> * Remove folio that was part of folio_batch.
> */
> + idx = folio->index >> huge_page_order(h);
> remove_inode_single_folio(h, inode, mapping, folio,
> - index, truncate_op);
> + idx, truncate_op);
Since this is the only place we call remove_inode_single_folio(), and we do not
use the index (at least index >> huge_page_order()) directly in this function, would it not be
better to make remove_inode_single_folio() do the conversion itself?

Also, I am thinking out loud here, but we do have a few places where we
do idx = index >> huge_page_order() to convert it into hugepage units, and the casual
reader might be a bit puzzled by that.
So, would it be worth implementing an inline helper with an accurate name
to do that? It might help whoever reads the code.
--
Oscar Salvador
SUSE Labs
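A minimal sketch of the kind of inline helper suggested above, written as standalone C so it can be compiled outside the kernel. The name `sketch_hugetlb_idx` and the explicit `order` parameter are hypothetical; in the kernel the order would presumably come from `huge_page_order(h)`.

```c
#include <assert.h>

/*
 * Hypothetical helper: convert a PAGE_SIZE-based page cache index into
 * hugepage units. 'order' stands in for huge_page_order(h).
 */
static inline unsigned long sketch_hugetlb_idx(unsigned long index,
					       unsigned int order)
{
	return index >> order;
}
```

With 2 MiB hugepages on 4 KiB base pages (order 9), base-page indices 512 through 1023 all map to hugepage index 1, so a descriptive name at the call site tells the reader which unit a variable carries.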
* [PATCH 5.10 110/491] time: add kernel-doc in time.c
From: Greg Kroah-Hartman @ 2026-04-13 15:55 UTC
To: stable
Cc: Greg Kroah-Hartman, patches, Randy Dunlap, John Stultz,
Thomas Gleixner, Stephen Boyd, Jonathan Corbet, linux-doc,
Sasha Levin
In-Reply-To: <20260413155819.042779211@linuxfoundation.org>
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Randy Dunlap <rdunlap@infradead.org>
[ Upstream commit 67b3f564cb1e769ef8e45835129a4866152fcfdb ]
Add kernel-doc for all APIs that do not already have it.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: John Stultz <jstultz@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Acked-by: John Stultz <jstultz@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20230704052405.5089-3-rdunlap@infradead.org
Stable-dep-of: 755a648e78f1 ("time/jiffies: Mark jiffies_64_to_clock_t() notrace")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/time/time.c | 169 ++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 158 insertions(+), 11 deletions(-)
diff --git a/kernel/time/time.c b/kernel/time/time.c
index 483f8a3e24d0c..6f81aead1856d 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -365,11 +365,14 @@ SYSCALL_DEFINE1(adjtimex_time32, struct old_timex32 __user *, utp)
}
#endif
-/*
- * Convert jiffies to milliseconds and back.
+/**
+ * jiffies_to_msecs - Convert jiffies to milliseconds
+ * @j: jiffies value
*
* Avoid unnecessary multiplications/divisions in the
- * two most common HZ cases:
+ * two most common HZ cases.
+ *
+ * Return: milliseconds value
*/
unsigned int jiffies_to_msecs(const unsigned long j)
{
@@ -388,6 +391,12 @@ unsigned int jiffies_to_msecs(const unsigned long j)
}
EXPORT_SYMBOL(jiffies_to_msecs);
+/**
+ * jiffies_to_usecs - Convert jiffies to microseconds
+ * @j: jiffies value
+ *
+ * Return: microseconds value
+ */
unsigned int jiffies_to_usecs(const unsigned long j)
{
/*
@@ -408,8 +417,15 @@ unsigned int jiffies_to_usecs(const unsigned long j)
}
EXPORT_SYMBOL(jiffies_to_usecs);
-/*
+/**
* mktime64 - Converts date to seconds.
+ * @year0: year to convert
+ * @mon0: month to convert
+ * @day: day to convert
+ * @hour: hour to convert
+ * @min: minute to convert
+ * @sec: second to convert
+ *
* Converts Gregorian date to seconds since 1970-01-01 00:00:00.
* Assumes input in normal date format, i.e. 1980-12-31 23:59:59
* => year=1980, mon=12, day=31, hour=23, min=59, sec=59.
@@ -427,6 +443,8 @@ EXPORT_SYMBOL(jiffies_to_usecs);
*
* An encoding of midnight at the end of the day as 24:00:00 - ie. midnight
* tomorrow - (allowable under ISO 8601) is supported.
+ *
+ * Return: seconds since the epoch time for the given input date
*/
time64_t mktime64(const unsigned int year0, const unsigned int mon0,
const unsigned int day, const unsigned int hour,
@@ -471,8 +489,7 @@ EXPORT_SYMBOL(ns_to_kernel_old_timeval);
* Set seconds and nanoseconds field of a timespec variable and
* normalize to the timespec storage format
*
- * Note: The tv_nsec part is always in the range of
- * 0 <= tv_nsec < NSEC_PER_SEC
+ * Note: The tv_nsec part is always in the range of 0 <= tv_nsec < NSEC_PER_SEC.
* For negative values only the tv_sec field is negative !
*/
void set_normalized_timespec64(struct timespec64 *ts, time64_t sec, s64 nsec)
@@ -501,7 +518,7 @@ EXPORT_SYMBOL(set_normalized_timespec64);
* ns_to_timespec64 - Convert nanoseconds to timespec64
* @nsec: the nanoseconds value to be converted
*
- * Returns the timespec64 representation of the nsec parameter.
+ * Return: the timespec64 representation of the nsec parameter.
*/
struct timespec64 ns_to_timespec64(const s64 nsec)
{
@@ -548,6 +565,8 @@ EXPORT_SYMBOL(ns_to_timespec64);
* runtime.
* the _msecs_to_jiffies helpers are the HZ dependent conversion
* routines found in include/linux/jiffies.h
+ *
+ * Return: jiffies value
*/
unsigned long __msecs_to_jiffies(const unsigned int m)
{
@@ -560,6 +579,12 @@ unsigned long __msecs_to_jiffies(const unsigned int m)
}
EXPORT_SYMBOL(__msecs_to_jiffies);
+/**
+ * __usecs_to_jiffies - convert microseconds to jiffies
+ * @u: time in microseconds
+ *
+ * Return: jiffies value
+ */
unsigned long __usecs_to_jiffies(const unsigned int u)
{
if (u > jiffies_to_usecs(MAX_JIFFY_OFFSET))
@@ -568,7 +593,10 @@ unsigned long __usecs_to_jiffies(const unsigned int u)
}
EXPORT_SYMBOL(__usecs_to_jiffies);
-/*
+/**
+ * timespec64_to_jiffies - convert a timespec64 value to jiffies
+ * @value: pointer to &struct timespec64
+ *
* The TICK_NSEC - 1 rounds up the value to the next resolution. Note
* that a remainder subtract here would not do the right thing as the
* resolution values don't fall on second boundries. I.e. the line:
@@ -582,8 +610,9 @@ EXPORT_SYMBOL(__usecs_to_jiffies);
*
* The >> (NSEC_JIFFIE_SC - SEC_JIFFIE_SC) converts the scaled nsec
* value to a scaled second value.
+ *
+ * Return: jiffies value
*/
-
unsigned long
timespec64_to_jiffies(const struct timespec64 *value)
{
@@ -601,6 +630,11 @@ timespec64_to_jiffies(const struct timespec64 *value)
}
EXPORT_SYMBOL(timespec64_to_jiffies);
+/**
+ * jiffies_to_timespec64 - convert jiffies value to &struct timespec64
+ * @jiffies: jiffies value
+ * @value: pointer to &struct timespec64
+ */
void
jiffies_to_timespec64(const unsigned long jiffies, struct timespec64 *value)
{
@@ -618,6 +652,13 @@ EXPORT_SYMBOL(jiffies_to_timespec64);
/*
* Convert jiffies/jiffies_64 to clock_t and back.
*/
+
+/**
+ * jiffies_to_clock_t - Convert jiffies to clock_t
+ * @x: jiffies value
+ *
+ * Return: jiffies converted to clock_t (CLOCKS_PER_SEC)
+ */
clock_t jiffies_to_clock_t(unsigned long x)
{
#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0
@@ -632,6 +673,12 @@ clock_t jiffies_to_clock_t(unsigned long x)
}
EXPORT_SYMBOL(jiffies_to_clock_t);
+/**
+ * clock_t_to_jiffies - Convert clock_t to jiffies
+ * @x: clock_t value
+ *
+ * Return: clock_t value converted to jiffies
+ */
unsigned long clock_t_to_jiffies(unsigned long x)
{
#if (HZ % USER_HZ)==0
@@ -649,6 +696,12 @@ unsigned long clock_t_to_jiffies(unsigned long x)
}
EXPORT_SYMBOL(clock_t_to_jiffies);
+/**
+ * jiffies_64_to_clock_t - Convert jiffies_64 to clock_t
+ * @x: jiffies_64 value
+ *
+ * Return: jiffies_64 value converted to 64-bit "clock_t" (CLOCKS_PER_SEC)
+ */
u64 jiffies_64_to_clock_t(u64 x)
{
#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0
@@ -671,6 +724,12 @@ u64 jiffies_64_to_clock_t(u64 x)
}
EXPORT_SYMBOL(jiffies_64_to_clock_t);
+/**
+ * nsec_to_clock_t - Convert nsec value to clock_t
+ * @x: nsec value
+ *
+ * Return: nsec value converted to 64-bit "clock_t" (CLOCKS_PER_SEC)
+ */
u64 nsec_to_clock_t(u64 x)
{
#if (NSEC_PER_SEC % USER_HZ) == 0
@@ -687,6 +746,12 @@ u64 nsec_to_clock_t(u64 x)
#endif
}
+/**
+ * jiffies64_to_nsecs - Convert jiffies64 to nanoseconds
+ * @j: jiffies64 value
+ *
+ * Return: nanoseconds value
+ */
u64 jiffies64_to_nsecs(u64 j)
{
#if !(NSEC_PER_SEC % HZ)
@@ -697,6 +762,12 @@ u64 jiffies64_to_nsecs(u64 j)
}
EXPORT_SYMBOL(jiffies64_to_nsecs);
+/**
+ * jiffies64_to_msecs - Convert jiffies64 to milliseconds
+ * @j: jiffies64 value
+ *
+ * Return: milliseconds value
+ */
u64 jiffies64_to_msecs(const u64 j)
{
#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
@@ -719,6 +790,8 @@ EXPORT_SYMBOL(jiffies64_to_msecs);
* note:
* NSEC_PER_SEC = 10^9 = (5^9 * 2^9) = (1953125 * 512)
* ULLONG_MAX ns = 18446744073.709551615 secs = about 584 years
+ *
+ * Return: nsecs converted to jiffies64 value
*/
u64 nsecs_to_jiffies64(u64 n)
{
@@ -750,6 +823,8 @@ EXPORT_SYMBOL(nsecs_to_jiffies64);
* note:
* NSEC_PER_SEC = 10^9 = (5^9 * 2^9) = (1953125 * 512)
* ULLONG_MAX ns = 18446744073.709551615 secs = about 584 years
+ *
+ * Return: nsecs converted to jiffies value
*/
unsigned long nsecs_to_jiffies(u64 n)
{
@@ -757,10 +832,16 @@ unsigned long nsecs_to_jiffies(u64 n)
}
EXPORT_SYMBOL_GPL(nsecs_to_jiffies);
-/*
- * Add two timespec64 values and do a safety check for overflow.
+/**
+ * timespec64_add_safe - Add two timespec64 values and do a safety check
+ * for overflow.
+ * @lhs: first (left) timespec64 to add
+ * @rhs: second (right) timespec64 to add
+ *
* It's assumed that both values are valid (>= 0).
* And, each timespec64 is in normalized form.
+ *
+ * Return: sum of @lhs + @rhs
*/
struct timespec64 timespec64_add_safe(const struct timespec64 lhs,
const struct timespec64 rhs)
@@ -778,6 +859,15 @@ struct timespec64 timespec64_add_safe(const struct timespec64 lhs,
return res;
}
+/**
+ * get_timespec64 - get user's time value into kernel space
+ * @ts: destination &struct timespec64
+ * @uts: user's time value as &struct __kernel_timespec
+ *
+ * Handles compat or 32-bit modes.
+ *
+ * Return: %0 on success or negative errno on error
+ */
int get_timespec64(struct timespec64 *ts,
const struct __kernel_timespec __user *uts)
{
@@ -801,6 +891,14 @@ int get_timespec64(struct timespec64 *ts,
}
EXPORT_SYMBOL_GPL(get_timespec64);
+/**
+ * put_timespec64 - convert timespec64 value to __kernel_timespec format and
+ * copy the latter to userspace
+ * @ts: input &struct timespec64
+ * @uts: user's &struct __kernel_timespec
+ *
+ * Return: %0 on success or negative errno on error
+ */
int put_timespec64(const struct timespec64 *ts,
struct __kernel_timespec __user *uts)
{
@@ -839,6 +937,15 @@ static int __put_old_timespec32(const struct timespec64 *ts64,
return copy_to_user(cts, &ts, sizeof(ts)) ? -EFAULT : 0;
}
+/**
+ * get_old_timespec32 - get user's old-format time value into kernel space
+ * @ts: destination &struct timespec64
+ * @uts: user's old-format time value (&struct old_timespec32)
+ *
+ * Handles X86_X32_ABI compatibility conversion.
+ *
+ * Return: %0 on success or negative errno on error
+ */
int get_old_timespec32(struct timespec64 *ts, const void __user *uts)
{
if (COMPAT_USE_64BIT_TIME)
@@ -848,6 +955,16 @@ int get_old_timespec32(struct timespec64 *ts, const void __user *uts)
}
EXPORT_SYMBOL_GPL(get_old_timespec32);
+/**
+ * put_old_timespec32 - convert timespec64 value to &struct old_timespec32 and
+ * copy the latter to userspace
+ * @ts: input &struct timespec64
+ * @uts: user's &struct old_timespec32
+ *
+ * Handles X86_X32_ABI compatibility conversion.
+ *
+ * Return: %0 on success or negative errno on error
+ */
int put_old_timespec32(const struct timespec64 *ts, void __user *uts)
{
if (COMPAT_USE_64BIT_TIME)
@@ -857,6 +974,13 @@ int put_old_timespec32(const struct timespec64 *ts, void __user *uts)
}
EXPORT_SYMBOL_GPL(put_old_timespec32);
+/**
+ * get_itimerspec64 - get user's &struct __kernel_itimerspec into kernel space
+ * @it: destination &struct itimerspec64
+ * @uit: user's &struct __kernel_itimerspec
+ *
+ * Return: %0 on success or negative errno on error
+ */
int get_itimerspec64(struct itimerspec64 *it,
const struct __kernel_itimerspec __user *uit)
{
@@ -872,6 +996,14 @@ int get_itimerspec64(struct itimerspec64 *it,
}
EXPORT_SYMBOL_GPL(get_itimerspec64);
+/**
+ * put_itimerspec64 - convert &struct itimerspec64 to __kernel_itimerspec format
+ * and copy the latter to userspace
+ * @it: input &struct itimerspec64
+ * @uit: user's &struct __kernel_itimerspec
+ *
+ * Return: %0 on success or negative errno on error
+ */
int put_itimerspec64(const struct itimerspec64 *it,
struct __kernel_itimerspec __user *uit)
{
@@ -887,6 +1019,13 @@ int put_itimerspec64(const struct itimerspec64 *it,
}
EXPORT_SYMBOL_GPL(put_itimerspec64);
+/**
+ * get_old_itimerspec32 - get user's &struct old_itimerspec32 into kernel space
+ * @its: destination &struct itimerspec64
+ * @uits: user's &struct old_itimerspec32
+ *
+ * Return: %0 on success or negative errno on error
+ */
int get_old_itimerspec32(struct itimerspec64 *its,
const struct old_itimerspec32 __user *uits)
{
@@ -898,6 +1037,14 @@ int get_old_itimerspec32(struct itimerspec64 *its,
}
EXPORT_SYMBOL_GPL(get_old_itimerspec32);
+/**
+ * put_old_itimerspec32 - convert &struct itimerspec64 to &struct
+ * old_itimerspec32 and copy the latter to userspace
+ * @its: input &struct itimerspec64
+ * @uits: user's &struct old_itimerspec32
+ *
+ * Return: %0 on success or negative errno on error
+ */
int put_old_itimerspec32(const struct itimerspec64 *its,
struct old_itimerspec32 __user *uits)
{
--
2.51.0
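For readers unfamiliar with the comment format this patch adds, here is a minimal standalone sketch of a kernel-doc block on a simplified conversion function. The function is hypothetical and is not the kernel's `jiffies_to_msecs()`: it takes the tick rate as a parameter and only handles the case where that rate divides 1000 evenly.

```c
#include <assert.h>

#define SKETCH_MSEC_PER_SEC 1000U

/**
 * sketch_jiffies_to_msecs - Convert jiffies to milliseconds
 * @j: jiffies value
 * @hz: tick rate; assumed here to divide SKETCH_MSEC_PER_SEC evenly
 *
 * Covers only the simple case where one tick is a whole number of
 * milliseconds.
 *
 * Return: milliseconds value
 */
static unsigned int sketch_jiffies_to_msecs(unsigned long j, unsigned int hz)
{
	return (SKETCH_MSEC_PER_SEC / hz) * j;
}
```

The pieces mirror what the patch adds throughout time.c: a `/**` opener, a `name - summary` line, one `@param:` line per argument, an optional description, and a `Return:` section.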
* Re: [PATCH 2/6] hugetlb: remove the hugetlb_linear_page_index() helper
From: Oscar Salvador @ 2026-04-13 16:48 UTC
To: Jane Chu
Cc: akpm, david, muchun.song, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, corbet, skhan, hughd, baolin.wang, peterx,
linux-mm, linux-doc, linux-kernel
In-Reply-To: <20260409234158.837786-3-jane.chu@oracle.com>
On Thu, Apr 09, 2026 at 05:41:53PM -0600, Jane Chu wrote:
> hugetlb_linear_page_index() is just linear_page_index() converted from
> base-page units to hugetlb page units.
>
> Open-code that conversion at its remaining call site in
> mfill_atomic_hugetlb() and drop the helper.
>
> No functional change intended.
>
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
--
Oscar Salvador
SUSE Labs
* Re: [PATCH v2] Documentation: sysctl: document net core sysctls
From: Simon Horman @ 2026-04-13 16:47 UTC
To: Shubham Chakraborty
Cc: netdev, davem, edumazet, kuba, pabeni, kuniyu, corbet, skhan,
linux-doc, linux-kernel
In-Reply-To: <20260409174859.11854-1-chakrabortyshubham66@gmail.com>
On Thu, Apr 09, 2026 at 11:18:59PM +0530, Shubham Chakraborty wrote:
> Document missing net.core and net.unix sysctl entries in
> admin-guide/sysctl/net.rst, and correct wording for defaults
> that are derived from PAGE_SIZE, HZ, or CONFIG_MAX_SKB_FRAGS.
>
> Also clarify that the RFS and flow-limit controls are only present
> when CONFIG_RPS or CONFIG_NET_FLOW_LIMIT is enabled, and describe
> rps_sock_flow_entries the way the handler implements it: non-zero
> values are rounded up to the nearest power of two.
>
> Signed-off-by: Shubham Chakraborty <chakrabortyshubham66@gmail.com>
...
> @@ -238,6 +240,37 @@ rps_default_mask
> The default RPS CPU mask used on newly created network devices. An empty
> mask means RPS disabled by default.
>
> +rps_sock_flow_entries
> +---------------------
> +
> +The total number of entries in the RPS flow table. This is used by
Maybe s/This/The table/ to make it clearer that it is the table,
rather than the number of entries, that tracks CPUs.
> +RFS (Receive Flow Steering) to track which CPU is currently processing
> +a flow in userspace. Non-zero values are rounded up to the nearest
> +power of two.
> +Available only when ``CONFIG_RPS`` is enabled.
I think it would be worth noting that a value of 0 disables RPS.
> +
> +Default: 0
...
> netdev_budget_usecs
> ---------------------
>
The lines above the following hunk are:
netdev_budget_usecs
---------------------
Maximum number of microseconds in one NAPI polling cycle. Polling
> @@ -297,12 +332,16 @@ Maximum number of microseconds in one NAPI polling cycle. Polling
> will exit when either netdev_budget_usecs have elapsed during the
> poll cycle or the number of packets processed reaches netdev_budget.
>
> +Default: ``2 * USEC_PER_SEC / HZ`` (2000 when ``HZ`` is 1000)
> +
Well, that is awkward.
Looking at git history, it seems that this sysctl was added by 7acf8a1e8a28
("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq
tuning") in 2017. And at that time the unit was us, and the default was 2000 us.

But that was changed by a fix for that commit, a4837980fd9f ("net: revert
default NAPI poll timeout to 2 jiffies"), in 2020. As a side-effect of
that commit, the default was changed to what you have documented above,
and the unit changed to jiffies.

So while what you have is correct, it seems nonsensical to me for the unit
to be jiffies. Because that's not a meaningful unit for users. And because
the name of the sysctl ends in usecs.

But I'm unsure what to do about it, since changing the unit would
represent (another) KABI break.

* Add another knob that shadows this one (But what to call it?)
* Simply remove this one (KAPI break)
* Change the unit of this knob (KAPI break)

If the code is left as is, then I think it should be documented that the
unit is jiffies.
...
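To make the HZ dependence of the documented default concrete, here is a small standalone sketch of the arithmetic from the quoted hunk. The names are hypothetical and this is not kernel code; it only evaluates `2 * USEC_PER_SEC / HZ` for a given tick rate.

```c
#include <assert.h>

#define SKETCH_USEC_PER_SEC 1000000UL

/* Default quoted in the patch: 2 * USEC_PER_SEC / HZ. */
static unsigned long sketch_default_budget_usecs(unsigned long hz)
{
	return 2 * SKETCH_USEC_PER_SEC / hz;
}
```

Under this formula the default is 2000 at HZ=1000, 8000 at HZ=250 and 20000 at HZ=100, so the number a user reads back differs between kernel configurations.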
* Re: [PATCH 1/6] hugetlb: open-code hugetlb folio lookup index conversion
From: jane.chu @ 2026-04-13 16:39 UTC
To: Mike Rapoport
Cc: akpm, david, muchun.song, osalvador, lorenzo.stoakes,
Liam.Howlett, vbabka, surenb, mhocko, corbet, skhan, hughd,
baolin.wang, peterx, linux-mm, linux-doc, linux-kernel
In-Reply-To: <adpXTeGPIKcdyekX@kernel.org>
On 4/11/2026 7:14 AM, Mike Rapoport wrote:
> Hi,
>
> On Thu, Apr 09, 2026 at 05:41:52PM -0600, Jane Chu wrote:
>> This patch removes `filemap_lock_hugetlb_folio()` and open-codes
>> the index conversion at each call site, making it explicit when
>> hugetlb code is translating a hugepage index into the base-page index
>> expected by `filemap_lock_folio()`. As part of that cleanup,
>> it also uses a base-page index directly in `hugetlbfs_zero_partial_page()`,
>> where the byte offset is already page-granular. Overall, the change
>> makes the indexing model more obvious at the call sites and avoids
>> hiding the huge-index to base-index conversion inside a helper.
>>
>> Suggested-by: David Hildenbrand <david@kernel.org>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
>> fs/hugetlbfs/inode.c | 20 ++++++++++----------
>> include/linux/hugetlb.h | 12 ------------
>> mm/hugetlb.c | 4 ++--
>> 3 files changed, 12 insertions(+), 24 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index cd6b22f6e2b1..cf79fb830377 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -242,9 +242,9 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
>> struct hstate *h = hstate_file(file);
>> struct address_space *mapping = file->f_mapping;
>> struct inode *inode = mapping->host;
>> - unsigned long index = iocb->ki_pos >> huge_page_shift(h);
>> + unsigned long idx = iocb->ki_pos >> huge_page_shift(h);
>
> Is it necessary to rename index to idx?
>
>> unsigned long offset = iocb->ki_pos & ~huge_page_mask(h);
>> - unsigned long end_index;
>> + unsigned long end_idx;
>> loff_t isize;
>> ssize_t retval = 0;
>
> ...
>
>> @@ -652,10 +652,10 @@ static void hugetlbfs_zero_partial_page(struct hstate *h,
>> loff_t start,
>> loff_t end)
>> {
>> - pgoff_t idx = start >> huge_page_shift(h);
>> + pgoff_t index = start >> PAGE_SHIFT;
>
> And idx to index?
>
> Maybe let's pick one and rename the other or just leave them be.
As I just replied to Oscar, I found that having idx and index each
represent both huge page index and base page index is a bit dizzying
for the reader. So throughout the patches, 'idx' is made to carry the
notion of huge page index while 'index' carries the notion of base page
index.
thanks,
-jane
>
>> struct folio *folio;
>>
>
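The naming convention described above can be illustrated with a standalone sketch. The shift values are assumptions chosen for the example (4 KiB base pages, 2 MiB huge pages), and the `sketch_*` functions are hypothetical; they only mirror the `>> huge_page_shift(h)` and `>> PAGE_SHIFT` conversions seen in the quoted hunks.

```c
#include <assert.h>

/* Assumed values for illustration: 4 KiB base pages, 2 MiB huge pages. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_HPAGE_SHIFT	21
#define SKETCH_HPAGE_ORDER	(SKETCH_HPAGE_SHIFT - SKETCH_PAGE_SHIFT)

/* 'idx': file position expressed in huge page units. */
static unsigned long sketch_pos_to_idx(unsigned long long pos)
{
	return pos >> SKETCH_HPAGE_SHIFT;
}

/* 'index': file position expressed in base page units. */
static unsigned long sketch_pos_to_index(unsigned long long pos)
{
	return pos >> SKETCH_PAGE_SHIFT;
}

/* Huge page idx to the base-page index of its first page. */
static unsigned long sketch_idx_to_index(unsigned long idx)
{
	return idx << SKETCH_HPAGE_ORDER;
}
```

For a byte offset one base page into the fourth huge page, `idx` is 3 while `index` is 1537, and converting `idx` back gives 1536, the first base page of that huge page.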
* Re: [PATCH 1/6] hugetlb: open-code hugetlb folio lookup index conversion
From: jane.chu @ 2026-04-13 16:30 UTC
To: Oscar Salvador
Cc: akpm, david, muchun.song, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, corbet, skhan, hughd, baolin.wang, peterx,
linux-mm, linux-doc, linux-kernel
In-Reply-To: <ad0YVH4EVzi62yML@localhost.localdomain>
On 4/13/2026 9:22 AM, Oscar Salvador wrote:
> On Thu, Apr 09, 2026 at 05:41:52PM -0600, Jane Chu wrote:
>> This patch removes `filemap_lock_hugetlb_folio()` and open-codes
>> the index conversion at each call site, making it explicit when
>> hugetlb code is translating a hugepage index into the base-page index
>> expected by `filemap_lock_folio()`. As part of that cleanup,
>> it also uses a base-page index directly in `hugetlbfs_zero_partial_page()`,
>> where the byte offset is already page-granular. Overall, the change
>> makes the indexing model more obvious at the call sites and avoids
>> hiding the huge-index to base-index conversion inside a helper.
>>
>> Suggested-by: David Hildenbrand <david@kernel.org>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>
> It's kind of funny that most of the patch is s/index/idx noise.
> Checking mm/hugetlb* and fs/hugetlbfs/* we do have a mix of index/idx but
> I would say that idx predominates, so I am ok with going with that one.
Indeed, because both idx and index can represent either a huge page
index or a base page index, I had to deliberately memorize which one
represents what in a given local context. I thought that using 'index'
for base page granularity and 'idx' for huge page granularity would
make things easier for readers.
>
> Acked-by: Oscar Salvador <osalvador@suse.de>
thanks,
-jane
>
>
>> ---
>> fs/hugetlbfs/inode.c | 20 ++++++++++----------
>> include/linux/hugetlb.h | 12 ------------
>> mm/hugetlb.c | 4 ++--
>> 3 files changed, 12 insertions(+), 24 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index cd6b22f6e2b1..cf79fb830377 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -242,9 +242,9 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
>> struct hstate *h = hstate_file(file);
>> struct address_space *mapping = file->f_mapping;
>> struct inode *inode = mapping->host;
>> - unsigned long index = iocb->ki_pos >> huge_page_shift(h);
>> + unsigned long idx = iocb->ki_pos >> huge_page_shift(h);
>> unsigned long offset = iocb->ki_pos & ~huge_page_mask(h);
>> - unsigned long end_index;
>> + unsigned long end_idx;
>> loff_t isize;
>> ssize_t retval = 0;
>>
>> @@ -257,10 +257,10 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
>> isize = i_size_read(inode);
>> if (!isize)
>> break;
>> - end_index = (isize - 1) >> huge_page_shift(h);
>> - if (index > end_index)
>> + end_idx = (isize - 1) >> huge_page_shift(h);
>> + if (idx > end_idx)
>> break;
>> - if (index == end_index) {
>> + if (idx == end_idx) {
>> nr = ((isize - 1) & ~huge_page_mask(h)) + 1;
>> if (nr <= offset)
>> break;
>> @@ -268,7 +268,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
>> nr = nr - offset;
>>
>> /* Find the folio */
>> - folio = filemap_lock_hugetlb_folio(h, mapping, index);
>> + folio = filemap_lock_folio(mapping, idx << huge_page_order(h));
>> if (IS_ERR(folio)) {
>> /*
>> * We have a HOLE, zero out the user-buffer for the
>> @@ -307,10 +307,10 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
>> retval = -EFAULT;
>> break;
>> }
>> - index += offset >> huge_page_shift(h);
>> + idx += offset >> huge_page_shift(h);
>> offset &= ~huge_page_mask(h);
>> }
>> - iocb->ki_pos = ((loff_t)index << huge_page_shift(h)) + offset;
>> + iocb->ki_pos = ((loff_t)idx << huge_page_shift(h)) + offset;
>> return retval;
>> }
>>
>> @@ -652,10 +652,10 @@ static void hugetlbfs_zero_partial_page(struct hstate *h,
>> loff_t start,
>> loff_t end)
>> {
>> - pgoff_t idx = start >> huge_page_shift(h);
>> + pgoff_t index = start >> PAGE_SHIFT;
>> struct folio *folio;
>>
>> - folio = filemap_lock_hugetlb_folio(h, mapping, idx);
>> + folio = filemap_lock_folio(mapping, index);
>> if (IS_ERR(folio))
>> return;
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index 9c098a02a09e..c64c6e5e50f5 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -829,12 +829,6 @@ static inline unsigned int blocks_per_huge_page(struct hstate *h)
>> return huge_page_size(h) / 512;
>> }
>>
>> -static inline struct folio *filemap_lock_hugetlb_folio(struct hstate *h,
>> - struct address_space *mapping, pgoff_t idx)
>> -{
>> - return filemap_lock_folio(mapping, idx << huge_page_order(h));
>> -}
>> -
>> #include <asm/hugetlb.h>
>>
>> #ifndef is_hugepage_only_range
>> @@ -1106,12 +1100,6 @@ static inline struct hugepage_subpool *hugetlb_folio_subpool(struct folio *folio
>> return NULL;
>> }
>>
>> -static inline struct folio *filemap_lock_hugetlb_folio(struct hstate *h,
>> - struct address_space *mapping, pgoff_t idx)
>> -{
>> - return NULL;
>> -}
>> -
>> static inline int isolate_or_dissolve_huge_folio(struct folio *folio,
>> struct list_head *list)
>> {
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index a786034ac95c..38b39eaf46cc 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -5724,7 +5724,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
>> * before we get page_table_lock.
>> */
>> new_folio = false;
>> - folio = filemap_lock_hugetlb_folio(h, mapping, vmf->pgoff);
>> + folio = filemap_lock_folio(mapping, vmf->pgoff << huge_page_order(h));
>> if (IS_ERR(folio)) {
>> size = i_size_read(mapping->host) >> huge_page_shift(h);
>> if (vmf->pgoff >= size)
>> @@ -6208,7 +6208,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
>>
>> if (is_continue) {
>> ret = -EFAULT;
>> - folio = filemap_lock_hugetlb_folio(h, mapping, idx);
>> + folio = filemap_lock_folio(mapping, idx << huge_page_order(h));
>> if (IS_ERR(folio))
>> goto out;
>> folio_in_pagecache = true;
>> --
>> 2.43.5
>>
>
^ permalink raw reply
* [PATCH 5.15 157/570] time: add kernel-doc in time.c
From: Greg Kroah-Hartman @ 2026-04-13 15:54 UTC (permalink / raw)
To: stable
Cc: Greg Kroah-Hartman, patches, Randy Dunlap, John Stultz,
Thomas Gleixner, Stephen Boyd, Jonathan Corbet, linux-doc,
Sasha Levin
In-Reply-To: <20260413155830.386096114@linuxfoundation.org>
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Randy Dunlap <rdunlap@infradead.org>
[ Upstream commit 67b3f564cb1e769ef8e45835129a4866152fcfdb ]
Add kernel-doc for all APIs that do not already have it.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: John Stultz <jstultz@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Acked-by: John Stultz <jstultz@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20230704052405.5089-3-rdunlap@infradead.org
Stable-dep-of: 755a648e78f1 ("time/jiffies: Mark jiffies_64_to_clock_t() notrace")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/time/time.c | 169 ++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 158 insertions(+), 11 deletions(-)
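The jiffies_to_msecs() kernel-doc in the diff below mentions avoiding
multiplications/divisions in the two most common HZ cases. A rough
userspace sketch of that idea follows. It is not the kernel
implementation: the function name is hypothetical, HZ is passed as a
parameter rather than being a compile-time constant, and the final
fallback is a plain truncating division chosen only for illustration.

```c
#include <assert.h>

#define MSEC_PER_SEC 1000UL

/* Sketch only: convert a tick count to milliseconds for a given tick rate. */
unsigned long sketch_jiffies_to_msecs(unsigned long j, unsigned long hz)
{
	assert(hz > 0);

	/* Case 1: HZ divides 1000 evenly (e.g. 100, 250, 1000): one multiply. */
	if (hz <= MSEC_PER_SEC && MSEC_PER_SEC % hz == 0)
		return (MSEC_PER_SEC / hz) * j;

	/* Case 2: HZ is a multiple of 1000: one divide, rounding up. */
	if (hz > MSEC_PER_SEC && hz % MSEC_PER_SEC == 0)
		return (j + hz / MSEC_PER_SEC - 1) / (hz / MSEC_PER_SEC);

	/* Simplified fallback for this sketch; truncates toward zero. */
	return (unsigned long)(((unsigned long long)j * MSEC_PER_SEC) / hz);
}
```

With these assumptions, 250 ticks at a rate of 250 per second come out as
1000 ms through the first branch, and 3 ticks at 2000 per second round up
to 2 ms through the second.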
diff --git a/kernel/time/time.c b/kernel/time/time.c
index a7fce68465a38..50390158e9d97 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -365,11 +365,14 @@ SYSCALL_DEFINE1(adjtimex_time32, struct old_timex32 __user *, utp)
}
#endif
-/*
- * Convert jiffies to milliseconds and back.
+/**
+ * jiffies_to_msecs - Convert jiffies to milliseconds
+ * @j: jiffies value
*
* Avoid unnecessary multiplications/divisions in the
- * two most common HZ cases:
+ * two most common HZ cases.
+ *
+ * Return: milliseconds value
*/
unsigned int jiffies_to_msecs(const unsigned long j)
{
@@ -388,6 +391,12 @@ unsigned int jiffies_to_msecs(const unsigned long j)
}
EXPORT_SYMBOL(jiffies_to_msecs);
+/**
+ * jiffies_to_usecs - Convert jiffies to microseconds
+ * @j: jiffies value
+ *
+ * Return: microseconds value
+ */
unsigned int jiffies_to_usecs(const unsigned long j)
{
/*
@@ -408,8 +417,15 @@ unsigned int jiffies_to_usecs(const unsigned long j)
}
EXPORT_SYMBOL(jiffies_to_usecs);
-/*
+/**
* mktime64 - Converts date to seconds.
+ * @year0: year to convert
+ * @mon0: month to convert
+ * @day: day to convert
+ * @hour: hour to convert
+ * @min: minute to convert
+ * @sec: second to convert
+ *
* Converts Gregorian date to seconds since 1970-01-01 00:00:00.
* Assumes input in normal date format, i.e. 1980-12-31 23:59:59
* => year=1980, mon=12, day=31, hour=23, min=59, sec=59.
@@ -427,6 +443,8 @@ EXPORT_SYMBOL(jiffies_to_usecs);
*
* An encoding of midnight at the end of the day as 24:00:00 - ie. midnight
* tomorrow - (allowable under ISO 8601) is supported.
+ *
+ * Return: seconds since the epoch time for the given input date
*/
time64_t mktime64(const unsigned int year0, const unsigned int mon0,
const unsigned int day, const unsigned int hour,
@@ -471,8 +489,7 @@ EXPORT_SYMBOL(ns_to_kernel_old_timeval);
* Set seconds and nanoseconds field of a timespec variable and
* normalize to the timespec storage format
*
- * Note: The tv_nsec part is always in the range of
- * 0 <= tv_nsec < NSEC_PER_SEC
+ * Note: The tv_nsec part is always in the range of 0 <= tv_nsec < NSEC_PER_SEC.
* For negative values only the tv_sec field is negative !
*/
void set_normalized_timespec64(struct timespec64 *ts, time64_t sec, s64 nsec)
@@ -501,7 +518,7 @@ EXPORT_SYMBOL(set_normalized_timespec64);
* ns_to_timespec64 - Convert nanoseconds to timespec64
* @nsec: the nanoseconds value to be converted
*
- * Returns the timespec64 representation of the nsec parameter.
+ * Return: the timespec64 representation of the nsec parameter.
*/
struct timespec64 ns_to_timespec64(const s64 nsec)
{
@@ -548,6 +565,8 @@ EXPORT_SYMBOL(ns_to_timespec64);
* runtime.
* the _msecs_to_jiffies helpers are the HZ dependent conversion
* routines found in include/linux/jiffies.h
+ *
+ * Return: jiffies value
*/
unsigned long __msecs_to_jiffies(const unsigned int m)
{
@@ -560,6 +579,12 @@ unsigned long __msecs_to_jiffies(const unsigned int m)
}
EXPORT_SYMBOL(__msecs_to_jiffies);
+/**
+ * __usecs_to_jiffies - convert microseconds to jiffies
+ * @u: time in microseconds
+ *
+ * Return: jiffies value
+ */
unsigned long __usecs_to_jiffies(const unsigned int u)
{
if (u > jiffies_to_usecs(MAX_JIFFY_OFFSET))
@@ -568,7 +593,10 @@ unsigned long __usecs_to_jiffies(const unsigned int u)
}
EXPORT_SYMBOL(__usecs_to_jiffies);
-/*
+/**
+ * timespec64_to_jiffies - convert a timespec64 value to jiffies
+ * @value: pointer to &struct timespec64
+ *
* The TICK_NSEC - 1 rounds up the value to the next resolution. Note
* that a remainder subtract here would not do the right thing as the
* resolution values don't fall on second boundaries. I.e. the line:
@@ -582,8 +610,9 @@ EXPORT_SYMBOL(__usecs_to_jiffies);
*
* The >> (NSEC_JIFFIE_SC - SEC_JIFFIE_SC) converts the scaled nsec
* value to a scaled second value.
+ *
+ * Return: jiffies value
*/
-
unsigned long
timespec64_to_jiffies(const struct timespec64 *value)
{
@@ -601,6 +630,11 @@ timespec64_to_jiffies(const struct timespec64 *value)
}
EXPORT_SYMBOL(timespec64_to_jiffies);
+/**
+ * jiffies_to_timespec64 - convert jiffies value to &struct timespec64
+ * @jiffies: jiffies value
+ * @value: pointer to &struct timespec64
+ */
void
jiffies_to_timespec64(const unsigned long jiffies, struct timespec64 *value)
{
@@ -618,6 +652,13 @@ EXPORT_SYMBOL(jiffies_to_timespec64);
/*
* Convert jiffies/jiffies_64 to clock_t and back.
*/
+
+/**
+ * jiffies_to_clock_t - Convert jiffies to clock_t
+ * @x: jiffies value
+ *
+ * Return: jiffies converted to clock_t (CLOCKS_PER_SEC)
+ */
clock_t jiffies_to_clock_t(unsigned long x)
{
#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0
@@ -632,6 +673,12 @@ clock_t jiffies_to_clock_t(unsigned long x)
}
EXPORT_SYMBOL(jiffies_to_clock_t);
+/**
+ * clock_t_to_jiffies - Convert clock_t to jiffies
+ * @x: clock_t value
+ *
+ * Return: clock_t value converted to jiffies
+ */
unsigned long clock_t_to_jiffies(unsigned long x)
{
#if (HZ % USER_HZ)==0
@@ -649,6 +696,12 @@ unsigned long clock_t_to_jiffies(unsigned long x)
}
EXPORT_SYMBOL(clock_t_to_jiffies);
+/**
+ * jiffies_64_to_clock_t - Convert jiffies_64 to clock_t
+ * @x: jiffies_64 value
+ *
+ * Return: jiffies_64 value converted to 64-bit "clock_t" (CLOCKS_PER_SEC)
+ */
u64 jiffies_64_to_clock_t(u64 x)
{
#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0
@@ -671,6 +724,12 @@ u64 jiffies_64_to_clock_t(u64 x)
}
EXPORT_SYMBOL(jiffies_64_to_clock_t);
+/**
+ * nsec_to_clock_t - Convert nsec value to clock_t
+ * @x: nsec value
+ *
+ * Return: nsec value converted to 64-bit "clock_t" (CLOCKS_PER_SEC)
+ */
u64 nsec_to_clock_t(u64 x)
{
#if (NSEC_PER_SEC % USER_HZ) == 0
@@ -687,6 +746,12 @@ u64 nsec_to_clock_t(u64 x)
#endif
}
+/**
+ * jiffies64_to_nsecs - Convert jiffies64 to nanoseconds
+ * @j: jiffies64 value
+ *
+ * Return: nanoseconds value
+ */
u64 jiffies64_to_nsecs(u64 j)
{
#if !(NSEC_PER_SEC % HZ)
@@ -697,6 +762,12 @@ u64 jiffies64_to_nsecs(u64 j)
}
EXPORT_SYMBOL(jiffies64_to_nsecs);
+/**
+ * jiffies64_to_msecs - Convert jiffies64 to milliseconds
+ * @j: jiffies64 value
+ *
+ * Return: milliseconds value
+ */
u64 jiffies64_to_msecs(const u64 j)
{
#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
@@ -719,6 +790,8 @@ EXPORT_SYMBOL(jiffies64_to_msecs);
* note:
* NSEC_PER_SEC = 10^9 = (5^9 * 2^9) = (1953125 * 512)
* ULLONG_MAX ns = 18446744073.709551615 secs = about 584 years
+ *
+ * Return: nsecs converted to jiffies64 value
*/
u64 nsecs_to_jiffies64(u64 n)
{
@@ -750,6 +823,8 @@ EXPORT_SYMBOL(nsecs_to_jiffies64);
* note:
* NSEC_PER_SEC = 10^9 = (5^9 * 2^9) = (1953125 * 512)
* ULLONG_MAX ns = 18446744073.709551615 secs = about 584 years
+ *
+ * Return: nsecs converted to jiffies value
*/
unsigned long nsecs_to_jiffies(u64 n)
{
@@ -757,10 +832,16 @@ unsigned long nsecs_to_jiffies(u64 n)
}
EXPORT_SYMBOL_GPL(nsecs_to_jiffies);
-/*
- * Add two timespec64 values and do a safety check for overflow.
+/**
+ * timespec64_add_safe - Add two timespec64 values and do a safety check
+ * for overflow.
+ * @lhs: first (left) timespec64 to add
+ * @rhs: second (right) timespec64 to add
+ *
* It's assumed that both values are valid (>= 0).
* And, each timespec64 is in normalized form.
+ *
+ * Return: sum of @lhs + @rhs
*/
struct timespec64 timespec64_add_safe(const struct timespec64 lhs,
const struct timespec64 rhs)
@@ -778,6 +859,15 @@ struct timespec64 timespec64_add_safe(const struct timespec64 lhs,
return res;
}
+/**
+ * get_timespec64 - get user's time value into kernel space
+ * @ts: destination &struct timespec64
+ * @uts: user's time value as &struct __kernel_timespec
+ *
+ * Handles compat or 32-bit modes.
+ *
+ * Return: %0 on success or negative errno on error
+ */
int get_timespec64(struct timespec64 *ts,
const struct __kernel_timespec __user *uts)
{
@@ -801,6 +891,14 @@ int get_timespec64(struct timespec64 *ts,
}
EXPORT_SYMBOL_GPL(get_timespec64);
+/**
+ * put_timespec64 - convert timespec64 value to __kernel_timespec format and
+ * copy the latter to userspace
+ * @ts: input &struct timespec64
+ * @uts: user's &struct __kernel_timespec
+ *
+ * Return: %0 on success or negative errno on error
+ */
int put_timespec64(const struct timespec64 *ts,
struct __kernel_timespec __user *uts)
{
@@ -839,6 +937,15 @@ static int __put_old_timespec32(const struct timespec64 *ts64,
return copy_to_user(cts, &ts, sizeof(ts)) ? -EFAULT : 0;
}
+/**
+ * get_old_timespec32 - get user's old-format time value into kernel space
+ * @ts: destination &struct timespec64
+ * @uts: user's old-format time value (&struct old_timespec32)
+ *
+ * Handles X86_X32_ABI compatibility conversion.
+ *
+ * Return: %0 on success or negative errno on error
+ */
int get_old_timespec32(struct timespec64 *ts, const void __user *uts)
{
if (COMPAT_USE_64BIT_TIME)
@@ -848,6 +955,16 @@ int get_old_timespec32(struct timespec64 *ts, const void __user *uts)
}
EXPORT_SYMBOL_GPL(get_old_timespec32);
+/**
+ * put_old_timespec32 - convert timespec64 value to &struct old_timespec32 and
+ * copy the latter to userspace
+ * @ts: input &struct timespec64
+ * @uts: user's &struct old_timespec32
+ *
+ * Handles X86_X32_ABI compatibility conversion.
+ *
+ * Return: %0 on success or negative errno on error
+ */
int put_old_timespec32(const struct timespec64 *ts, void __user *uts)
{
if (COMPAT_USE_64BIT_TIME)
@@ -857,6 +974,13 @@ int put_old_timespec32(const struct timespec64 *ts, void __user *uts)
}
EXPORT_SYMBOL_GPL(put_old_timespec32);
+/**
+ * get_itimerspec64 - get user's &struct __kernel_itimerspec into kernel space
+ * @it: destination &struct itimerspec64
+ * @uit: user's &struct __kernel_itimerspec
+ *
+ * Return: %0 on success or negative errno on error
+ */
int get_itimerspec64(struct itimerspec64 *it,
const struct __kernel_itimerspec __user *uit)
{
@@ -872,6 +996,14 @@ int get_itimerspec64(struct itimerspec64 *it,
}
EXPORT_SYMBOL_GPL(get_itimerspec64);
+/**
+ * put_itimerspec64 - convert &struct itimerspec64 to __kernel_itimerspec format
+ * and copy the latter to userspace
+ * @it: input &struct itimerspec64
+ * @uit: user's &struct __kernel_itimerspec
+ *
+ * Return: %0 on success or negative errno on error
+ */
int put_itimerspec64(const struct itimerspec64 *it,
struct __kernel_itimerspec __user *uit)
{
@@ -887,6 +1019,13 @@ int put_itimerspec64(const struct itimerspec64 *it,
}
EXPORT_SYMBOL_GPL(put_itimerspec64);
+/**
+ * get_old_itimerspec32 - get user's &struct old_itimerspec32 into kernel space
+ * @its: destination &struct itimerspec64
+ * @uits: user's &struct old_itimerspec32
+ *
+ * Return: %0 on success or negative errno on error
+ */
int get_old_itimerspec32(struct itimerspec64 *its,
const struct old_itimerspec32 __user *uits)
{
@@ -898,6 +1037,14 @@ int get_old_itimerspec32(struct itimerspec64 *its,
}
EXPORT_SYMBOL_GPL(get_old_itimerspec32);
+/**
+ * put_old_itimerspec32 - convert &struct itimerspec64 to &struct
+ * old_itimerspec32 and copy the latter to userspace
+ * @its: input &struct itimerspec64
+ * @uits: user's &struct old_itimerspec32
+ *
+ * Return: %0 on success or negative errno on error
+ */
int put_old_itimerspec32(const struct itimerspec64 *its,
struct old_itimerspec32 __user *uits)
{
--
2.51.0
^ permalink raw reply related
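The mktime64() kernel-doc in the patch above describes converting a
Gregorian date to seconds since 1970-01-01 00:00:00, including the
24:00:00 end-of-day encoding. A userspace sketch of that kind of
conversion is shown here. The function names are hypothetical, int64_t
stands in for time64_t, and nothing is range-checked; it is meant to
illustrate the arithmetic, not to reproduce the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: Gregorian date/time -> seconds since 1970-01-01 00:00:00. */
int64_t sketch_mktime64(unsigned int year0, unsigned int mon0,
			unsigned int day, unsigned int hour,
			unsigned int min, unsigned int sec)
{
	unsigned int mon = mon0, year = year0;

	/* Start the year in March so the leap day falls last: 1..12 -> 11,12,1..10 */
	if (mon <= 2) {
		mon += 10;
		year -= 1;
	} else {
		mon -= 2;
	}

	/* Whole days since the epoch, leap days included. */
	int64_t days = (int64_t)(year / 4 - year / 100 + year / 400 +
				 367 * mon / 12 + day) +
		       (int64_t)year * 365 - 719499;

	/* hour == 24 simply rolls into the next day here. */
	return ((days * 24 + hour) * 60 + min) * 60 + sec;
}

/* Small self-check using the example date from the kernel-doc text. */
void sketch_mktime64_selfcheck(void)
{
	assert(sketch_mktime64(1970, 1, 1, 0, 0, 0) == 0);
	assert(sketch_mktime64(1980, 12, 31, 23, 59, 59) == 347155199);
	assert(sketch_mktime64(1980, 12, 31, 24, 0, 0) == 347155200);
}
```

The example date from the comment, 1980-12-31 23:59:59, is 4017 days plus
86399 seconds after the epoch, and the 24:00:00 form of the same day
yields exactly one second more.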
* Re: [PATCH 1/6] hugetlb: open-code hugetlb folio lookup index conversion
From: Oscar Salvador @ 2026-04-13 16:22 UTC (permalink / raw)
To: Jane Chu
Cc: akpm, david, muchun.song, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, corbet, skhan, hughd, baolin.wang, peterx,
linux-mm, linux-doc, linux-kernel
In-Reply-To: <20260409234158.837786-2-jane.chu@oracle.com>
On Thu, Apr 09, 2026 at 05:41:52PM -0600, Jane Chu wrote:
> This patch removes `filemap_lock_hugetlb_folio()` and open-codes
> the index conversion at each call site, making it explicit when
> hugetlb code is translating a hugepage index into the base-page index
> expected by `filemap_lock_folio()`. As part of that cleanup,
> it also uses a base-page index directly in `hugetlbfs_zero_partial_page()`,
> where the byte offset is already page-granular. Overall, the change
> makes the indexing model more obvious at the call sites and avoids
> hiding the huge-index to base-index conversion inside a helper.
>
> Suggested-by: David Hildenbrand <david@kernel.org>
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
It's kind of funny that most of the patch is s/index/idx noise.
Checking mm/hugetlb* and fs/hugetlbfs/* we do have a mix of index/idx but
I would say that idx predominates, so I am ok with going with that one.
Acked-by: Oscar Salvador <osalvador@suse.de>
> ---
> fs/hugetlbfs/inode.c | 20 ++++++++++----------
> include/linux/hugetlb.h | 12 ------------
> mm/hugetlb.c | 4 ++--
> 3 files changed, 12 insertions(+), 24 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index cd6b22f6e2b1..cf79fb830377 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -242,9 +242,9 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
> struct hstate *h = hstate_file(file);
> struct address_space *mapping = file->f_mapping;
> struct inode *inode = mapping->host;
> - unsigned long index = iocb->ki_pos >> huge_page_shift(h);
> + unsigned long idx = iocb->ki_pos >> huge_page_shift(h);
> unsigned long offset = iocb->ki_pos & ~huge_page_mask(h);
> - unsigned long end_index;
> + unsigned long end_idx;
> loff_t isize;
> ssize_t retval = 0;
>
> @@ -257,10 +257,10 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
> isize = i_size_read(inode);
> if (!isize)
> break;
> - end_index = (isize - 1) >> huge_page_shift(h);
> - if (index > end_index)
> + end_idx = (isize - 1) >> huge_page_shift(h);
> + if (idx > end_idx)
> break;
> - if (index == end_index) {
> + if (idx == end_idx) {
> nr = ((isize - 1) & ~huge_page_mask(h)) + 1;
> if (nr <= offset)
> break;
> @@ -268,7 +268,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
> nr = nr - offset;
>
> /* Find the folio */
> - folio = filemap_lock_hugetlb_folio(h, mapping, index);
> + folio = filemap_lock_folio(mapping, idx << huge_page_order(h));
> if (IS_ERR(folio)) {
> /*
> * We have a HOLE, zero out the user-buffer for the
> @@ -307,10 +307,10 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
> retval = -EFAULT;
> break;
> }
> - index += offset >> huge_page_shift(h);
> + idx += offset >> huge_page_shift(h);
> offset &= ~huge_page_mask(h);
> }
> - iocb->ki_pos = ((loff_t)index << huge_page_shift(h)) + offset;
> + iocb->ki_pos = ((loff_t)idx << huge_page_shift(h)) + offset;
> return retval;
> }
>
> @@ -652,10 +652,10 @@ static void hugetlbfs_zero_partial_page(struct hstate *h,
> loff_t start,
> loff_t end)
> {
> - pgoff_t idx = start >> huge_page_shift(h);
> + pgoff_t index = start >> PAGE_SHIFT;
> struct folio *folio;
>
> - folio = filemap_lock_hugetlb_folio(h, mapping, idx);
> + folio = filemap_lock_folio(mapping, index);
> if (IS_ERR(folio))
> return;
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 9c098a02a09e..c64c6e5e50f5 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -829,12 +829,6 @@ static inline unsigned int blocks_per_huge_page(struct hstate *h)
> return huge_page_size(h) / 512;
> }
>
> -static inline struct folio *filemap_lock_hugetlb_folio(struct hstate *h,
> - struct address_space *mapping, pgoff_t idx)
> -{
> - return filemap_lock_folio(mapping, idx << huge_page_order(h));
> -}
> -
> #include <asm/hugetlb.h>
>
> #ifndef is_hugepage_only_range
> @@ -1106,12 +1100,6 @@ static inline struct hugepage_subpool *hugetlb_folio_subpool(struct folio *folio
> return NULL;
> }
>
> -static inline struct folio *filemap_lock_hugetlb_folio(struct hstate *h,
> - struct address_space *mapping, pgoff_t idx)
> -{
> - return NULL;
> -}
> -
> static inline int isolate_or_dissolve_huge_folio(struct folio *folio,
> struct list_head *list)
> {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a786034ac95c..38b39eaf46cc 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5724,7 +5724,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
> * before we get page_table_lock.
> */
> new_folio = false;
> - folio = filemap_lock_hugetlb_folio(h, mapping, vmf->pgoff);
> + folio = filemap_lock_folio(mapping, vmf->pgoff << huge_page_order(h));
> if (IS_ERR(folio)) {
> size = i_size_read(mapping->host) >> huge_page_shift(h);
> if (vmf->pgoff >= size)
> @@ -6208,7 +6208,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
>
> if (is_continue) {
> ret = -EFAULT;
> - folio = filemap_lock_hugetlb_folio(h, mapping, idx);
> + folio = filemap_lock_folio(mapping, idx << huge_page_order(h));
> if (IS_ERR(folio))
> goto out;
> folio_in_pagecache = true;
> --
> 2.43.5
>
--
Oscar Salvador
SUSE Labs
^ permalink raw reply
* Re: [RFC PATCH] Documentation: Add managed interrupts
From: Sebastian Andrzej Siewior @ 2026-04-13 15:59 UTC (permalink / raw)
To: John Ogness
Cc: Valentin Schneider, linux-doc, linux-kernel, Aaron Tomlin,
Christoph Hellwig, Frederic Weisbecker, Jens Axboe,
Jonathan Corbet, Ming Lei, Thomas Gleixner, Waiman Long,
Peter Zijlstra
In-Reply-To: <87zf37f3xl.fsf@jogness.linutronix.de>
On 2026-04-13 14:25:34 [+0206], John Ogness wrote:
> If that is the case, then the deprecation notice should explicitly
> target the "domain" flag. Also note that "domain" is the default if no
I am going to submit a patch removing that deprecation note. Frederic
was pro and I did not collect a single argument why removing it would be
a good thing.
> John Ogness
Sebastian
^ permalink raw reply
* Re: [RFC PATCH] Documentation: Add managed interrupts
From: Sebastian Andrzej Siewior @ 2026-04-13 15:57 UTC (permalink / raw)
To: Valentin Schneider
Cc: linux-doc, linux-kernel, Aaron Tomlin, Christoph Hellwig,
Frederic Weisbecker, Jens Axboe, Jonathan Corbet, Ming Lei,
Thomas Gleixner, Waiman Long, Peter Zijlstra, John Ogness
In-Reply-To: <xhsmhlderi1f6.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
On 2026-04-13 11:45:33 [+0100], Valentin Schneider wrote:
> On 01/04/26 13:02, Sebastian Andrzej Siewior wrote:
> > One more point: Given that isolcpus= is marked deprecated as of commit
> > b0d40d2b22fe4 ("sched/isolation: Document isolcpus= boot parameter flags, mark it deprecated")
> >
> > and the 'managed_irq' is evaluated at device's probe time it would
> > require additional callbacks to re-evaluate the situation. Probably for
> > 'io_queue', too. Does it make sense or should we simply drop the
> > "deprecation" notice and allow using it long term?
>
> AIUI the deprecation notice is more for isolcpus=domain, i.e. the scheduler
> part, but it's still relevant for e.g. managed_irq. FWIW Openshift uses:
>
> isolcpus=managed_irq,<cpulist>
> nohz_full=<cpulist>
>
> and cpusets for dynamically isolating CPUs from the scheduler.
For the managed_irq you could argue that this could also use some
runtime configuration at which point isolcpus= would have a runtime
counterpart and could be removed.
After going through all this I concluded that it hardly makes sense,
since you would require callbacks in every driver using it, or other
magic "to reconfigure", but it already makes little sense using it.
Either way, I don't see anything wrong with using isolcpus=domain if you
have a static setup and need/want to reconfigure at runtime.
Sebastian
^ permalink raw reply
* Re: [RFC PATCH] Documentation: Add managed interrupts
From: Sebastian Andrzej Siewior @ 2026-04-13 15:53 UTC (permalink / raw)
To: Ming Lei
Cc: linux-doc, linux-kernel, Aaron Tomlin, Christoph Hellwig,
Frederic Weisbecker, Jens Axboe, Jonathan Corbet, Thomas Gleixner,
Valentin Schneider, Waiman Long, Peter Zijlstra, John Ogness
In-Reply-To: <CAFj5m9La5S0B8o677FmHoXkD-N+kMVdJL7Gn1YG5noy_4Q_jxg@mail.gmail.com>
On 2026-04-11 20:18:17 [+0800], Ming Lei wrote:
> > +CPUs listed in the avoided mask remain part of the interrupt’s affinity mask.
> > +This means that if all non‑isolated CPUs go offline while isolated CPUs remain
> > +online, the interrupt will be assigned to one of the isolated CPUs.
>
> Maybe you can add:
>
> In reality it is fine because IO isn't supposed to submit from isolated CPUs.
You can argue both ways. And I have some vague memory that block will
schedule a kworker and there was some work to use an unbound worker
instead of _this_ CPU. I just don't know what happens with the interrupt
and this is probably the one thing you can't configure.
Sebastian
^ permalink raw reply
* [PATCH v2] docs/zh_CN: add module-signing Chinese translation
From: Yan Zhu @ 2026-04-13 15:33 UTC (permalink / raw)
To: seakeel, alexs, si.yanteng, corbet
Cc: dzm91, skhan, linux-doc, linux-kernel, zhuyan2015
Translate .../admin-guide/module-signing.rst into Chinese.
Update the translation through commit 0ad9a71933e7
("modsign: Enable ML-DSA module signing")
Signed-off-by: Yan Zhu <zhuyan2015@qq.com>
---
.../zh_CN/admin-guide/module-signing.rst | 249 ++++++++++++++++++
1 file changed, 249 insertions(+)
create mode 100644 Documentation/translations/zh_CN/admin-guide/module-signing.rst
diff --git a/Documentation/translations/zh_CN/admin-guide/module-signing.rst b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
new file mode 100644
index 000000000000..981ebfc2c7cc
--- /dev/null
+++ b/Documentation/translations/zh_CN/admin-guide/module-signing.rst
@@ -0,0 +1,249 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/admin-guide/module-signing.rst
+:翻译:
+ 朱岩 Yan Zhu <zhuyan2015@qq.com>
+
+
+==========================
+内核模块签名机制
+==========================
+
+.. 目录
+..
+.. - 概述
+.. - 配置模块签名
+.. - 生成签名密钥
+.. - 内核中的公钥
+.. - 模块手动签名
+.. - 已签名模块和剥离
+.. - 加载已签名模块
+.. - 无效签名和未签名模块
+.. - 管理/保护私钥
+
+
+概述
+====
+
+内核模块签名机制在安装过程中对模块进行加密签名,然后在加载模块时检查签名。这
+通过禁止加载未签名的模块或使用无效密钥签名的模块来提高内核安全性。模块签名通
+过使恶意模块更难加载到内核中来增加安全性。模块签名检查在内核中完成,因此不需
+要受信任的用户空间位。
+
+此机制使用 X.509 ITU-T 标准证书对涉及的公钥进行编码。签名本身不以任何工业标准
+类型编码。内置机制目前仅支持 RSA、NIST P-384 ECDSA 和 NIST FIPS-204 ML-DSA
+公钥签名标准(尽管它是可插拔的并允许使用其他标准)。对于 RSA 和 ECDSA,可以使
+用的可能的哈希算法是大小为 256、384 和 512 的 SHA-2 和 SHA-3(算法由签名中的
+数据选择);ML-DSA会自行进行哈希运算,但允许与SHA512哈希算法结合用于签名属性。
+
+配置模块签名
+============
+
+通过进入内核配置的 :menuselection:`Enable Loadable Module Support` 菜单并打
+开以下选项来启用模块签名机制::
+
+ CONFIG_MODULE_SIG "Module signature verification"
+
+这有多个可用选项:
+
+ (1) :menuselection:`Require modules to be validly signed`
+ (``CONFIG_MODULE_SIG_FORCE``)
+
+ 这指定了内核应如何处理其密钥未知或未签名的模块。
+
+ 如果关闭(即"宽松模式"),则允许使用不可用密钥和未签名的模块,但内核将被
+ 标记为受污染,并且相关模块将被标记为受污染,显示字符'E'。
+
+ 如果打开(即"限制模式"),只有具有有效签名且可由内核拥有的公钥验证的模块
+ 才会被加载。所有其他模块将生成错误。
+
+ 无论此处的设置如何,如果模块的签名块无法解析,它将被直接拒绝。
+
+
+ (2) :menuselection:`Automatically sign all modules`
+ (``CONFIG_MODULE_SIG_ALL``)
+
+ 如果打开此选项,则在构建的 modules_install 阶段期间将自动签名模块。
+ 如果关闭,则必须使用以下命令手动签名模块::
+
+ scripts/sign-file
+
+
+ (3) :menuselection:`Which hash algorithm should modules be signed with?`
+
+ 这提供了安装阶段将用于签名模块的哈希算法选择:
+
+ =============================== ==========================================
+ ``CONFIG_MODULE_SIG_SHA256`` :menuselection:`Sign modules with SHA-256`
+ ``CONFIG_MODULE_SIG_SHA384`` :menuselection:`Sign modules with SHA-384`
+ ``CONFIG_MODULE_SIG_SHA512`` :menuselection:`Sign modules with SHA-512`
+ ``CONFIG_MODULE_SIG_SHA3_256`` :menuselection:`Sign modules with SHA3-256`
+ ``CONFIG_MODULE_SIG_SHA3_384`` :menuselection:`Sign modules with SHA3-384`
+ ``CONFIG_MODULE_SIG_SHA3_512`` :menuselection:`Sign modules with SHA3-512`
+ =============================== ==========================================
+
+ 此处选择的算法也将被构建到内核中(而不是作为模块),以便使用该算法签名的
+ 模块可以在不导致循环依赖的情况下检查其签名。
+
+
+ (4) :menuselection:`File name or PKCS#11 URI of module signing key`
+ (``CONFIG_MODULE_SIG_KEY``)
+
+ 将此选项设置为除默认值 ``certs/signing_key.pem`` 之外的其他值将禁用签名
+ 密钥的自动生成,并允许使用您选择的密钥对内核模块进行签名。提供的字符串应
+ 标识包含私钥及其对应的 PEM 格式 X.509 证书的文件,或者在 OpenSSL
+ ENGINE_pkcs11 功能正常的系统上,使用 RFC7512 定义的 PKCS#11 URI。在后一
+ 种情况下,PKCS#11 URI 应引用证书和私钥。
+
+ 如果包含私钥的 PEM 文件已加密,或者 PKCS#11 令牌需要 PIN,可以通过
+ ``KBUILD_SIGN_PIN`` 变量在构建时提供。
+
+
+ (5) :menuselection:`Additional X.509 keys for default system keyring`
+ (``CONFIG_SYSTEM_TRUSTED_KEYS``)
+
+ 此选项可设置为包含附加证书的 PEM 编码文件的文件名,这些证书将默认包含在
+ 系统密钥环中。
+
+请注意,启用模块签名会为内核构建过程添加对执行签名工具的OpenSSL开发包的依赖。
+
+
+生成签名密钥
+============
+
+生成和检查签名需要加密密钥对。私钥用于生成签名,相应的公钥用于检查签名。私钥
+仅在构建期间需要,之后可以删除或安全存储。公钥被构建到内核中,以便在加载模块
+时可以使用它来检查签名。
+
+在正常情况下,当 ``CONFIG_MODULE_SIG_KEY`` 保持默认值时,如果文件中不存在密
+钥对,内核构建将使用 openssl 自动生成新的密钥对::
+
+ certs/signing_key.pem
+
+在构建 vmlinux 期间(公钥需要构建到 vmlinux 中)使用参数::
+
+ certs/x509.genkey
+
+文件(如果尚不存在也会生成)。
+
+可以在 RSA(``MODULE_SIG_KEY_TYPE_RSA``)、
+ECDSA(``MODULE_SIG_KEY_TYPE_ECDSA``)和
+ML-DSA(``MODULE_SIG_KEY_TYPE_MLDSA_*``)之间选择生成 RSA 4k、NIST P-384
+密钥对或 ML-DSA 44、65 或 87 密钥对。
+
+强烈建议您提供自己的 x509.genkey 文件。
+
+最值得注意的是,在 x509.genkey 文件中,req_distinguished_name 部分应从默认值
+更改::
+
+ [ req_distinguished_name ]
+ #O = Unspecified company
+ CN = Build time autogenerated kernel key
+ #emailAddress = unspecified.user@unspecified.company
+
+生成的 RSA 密钥大小也可以通过以下方式设置::
+
+ [ req ]
+ default_bits = 4096
+
+也可以使用位于 Linux 内核源代码树根节点中的 x509.genkey 密钥生成配置文件和
+openssl 命令手动生成公钥/私钥文件。以下是生成公钥/私钥文件的示例::
+
+ openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \
+ -config x509.genkey -outform PEM -out kernel_key.pem \
+ -keyout kernel_key.pem
+
+然后可以将生成的 kernel_key.pem 文件的完整路径名指定在
+``CONFIG_MODULE_SIG_KEY``选项中,并且将使用其中的证书和密钥而不是自动生成的
+密钥对。
+
+
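+Public keys in the kernel
+=========================
+
+The kernel contains a ring of public keys that can be viewed by root. They
+are in a keyring called ".builtin_trusted_keys" that can be seen by::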
+
+ [root@deneb ~]# cat /proc/keys
+ ...
+ 223c7853 I------ 1 perm 1f030000 0 0 keyring .builtin_trusted_keys: 1
+ 302d2d52 I------ 1 perm 1f010000 0 0 asymmetri Fedora kernel signing key: d69a84e6bce3d216b979e9505b3e3ef9a7118079: X509.RSA a7118079 []
+
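+Beyond the public key generated specifically for module signing, additional
+trusted certificates can be provided in a PEM-encoded file referenced by the
+``CONFIG_SYSTEM_TRUSTED_KEYS`` configuration option.
+
+Further, the architecture code may take public keys from a hardware store and
+add those in as well (e.g. from the UEFI key database).
+
+Finally, it is possible to add additional public keys by doing::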
+
+ keyctl padd asymmetric "" [.builtin_trusted_keys-ID] <[key-file]
+
+e.g.::
+
+ keyctl padd asymmetric "" 0x223c7853 <my_public_key.x509
+
+Note, however, that the kernel will only permit a key to be added to
+``.builtin_trusted_keys`` if it is validly signed by a key that is already
+resident in ``.builtin_trusted_keys``.
+
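+Manually signing modules
+========================
+
+To manually sign a module, use the scripts/sign-file tool available in the
+Linux kernel source tree. The script requires 4 arguments:
+
+ 1. The hash algorithm (e.g., sha256)
+ 2. The private key filename or PKCS#11 URI
+ 3. The public key filename
+ 4. The kernel module to be signed
+
+The following is an example to sign a kernel module::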
+
+ scripts/sign-file sha512 kernel-signkey.priv \
+ kernel-signkey.x509 module.ko
+
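+The hash algorithm used does not have to match the one configured, but if it
+doesn't, you should make sure that the hash algorithm is either built into
+the kernel or can be loaded without requiring itself.
+
+If the private key requires a passphrase or PIN, it can be provided in the
+$KBUILD_SIGN_PIN environment variable.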
+
+
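+Signed modules and stripping
+============================
+
+A signed module has a digital signature simply appended at the end. The
+string ``~Module signature appended~.`` at the end of the module's file
+confirms that a signature is present, but it does not confirm that the
+signature is valid!
+
+Signed modules are brittle because the signature is outside of the defined
+ELF container. Thus they must not be stripped once the signature is computed
+and attached. Note that the entire module is the signed payload, including
+any and all debug information present at the time of signing.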
+
+
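+Loading signed modules
+======================
+
+Modules are loaded with insmod, modprobe, ``init_module()`` or
+``finit_module()``, exactly as for unsigned modules, as no processing is
+done in userspace. The signature checking is all done within the kernel.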
+
+
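+Non-valid signatures and unsigned modules
+=========================================
+
+If ``CONFIG_MODULE_SIG_FORCE`` is enabled or module.sig_enforce=1 is supplied
+on the kernel command line, the kernel will only load validly signed modules
+for which it has a public key. Otherwise, it will also load modules that are
+unsigned. Any module with a mismatched signature will not be permitted to
+load.
+
+Any module that has an unparsable signature will be rejected.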
+
+
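+Administering/protecting the private key
+========================================
+
+Since the private key is used to sign modules, viruses and malware could use
+the private key to sign modules and compromise the operating system. The
+private key must be either destroyed or moved to a secure location and not
+kept in the root node of the kernel source tree.
+
+If you use the same private key to sign modules for multiple kernel
+configurations, you must ensure that the module version information is
+sufficient to prevent loading a module into a different kernel. Either set
+``CONFIG_MODVERSIONS=y`` or ensure that each configuration has a different
+kernel release string by changing ``EXTRAVERSION`` or
+``CONFIG_LOCALVERSION``.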
--
2.43.0
* Re: [PATCH v6 00/40] arm_mpam: Add KVM/arm64 and resctrl glue code
From: Ben Horgan @ 2026-04-13 14:41 UTC (permalink / raw)
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm, zengheng4,
linux-doc
In-Reply-To: <20260313144617.3420416-1-ben.horgan@arm.com>
On 3/13/26 14:45, Ben Horgan wrote:
> This version of the mpam missing pieces series sees a couple of things
> dropped or hidden. Memory bandwidth utilization with free-running counters
> is dropped in preference of just always using 'mbm_event' mode (ABMC
> emulation) which simplifies the code and allows for, in the future,
> filtering by read/write traffic. So, for the interim, there is no memory
> bandwidth utilization support. CDP is hidden behind config expert as
> remount of resctrl fs could potentially lead to out of range PARTIDs being
> used and the fix requires a change in fs/resctrl. The setting of MPAM2_EL2
> (for pkvm/nvhe) is dropped as too expensive a write for not much value.
>
> There are a couple of 'fixes' at the start of the series which address
> problems in the base driver but are only user visible due to this series.
>
> Changelogs in patches
>
> Thanks for all the reviewing and testing so far. Just a bit more to get this
> over the line.
>
> There is a small build conflict with the MPAM abmc precursors series [1], which
> alters some of the resctrl arch hooks. I will shortly be posting a respin
> of that too.
>
> [1] https://lore.kernel.org/lkml/20260225201905.3568624-1-ben.horgan@arm.com/
>
> From James' cover letter:
>
> This is the missing piece to make MPAM usable via resctrl in user-space. This has
> shed its debugfs code and the read/write 'event configuration' for the monitors
> to make the series smaller.
>
Thanks for all the help testing and reviewing. James has now sent a pull request to Catalin and he has picked it up.
Pull request:
https://lore.kernel.org/linux-arm-kernel/01f76011-f3c2-4dcb-b3bc-37c7d4de342e@arm.com/
branch in arm64:
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/mpam
Ben
* Re: [PATCH v6 00/40] arm_mpam: Add KVM/arm64 and resctrl glue code
From: Ben Horgan @ 2026-04-13 14:32 UTC (permalink / raw)
To: Rose, Charles
Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
baolin.wang@linux.alibaba.com, carl@os.amperecomputing.com,
dave.martin@arm.com, david@kernel.org, dfustini@baylibre.com,
fenghuay@nvidia.com, gshan@redhat.com, james.morse@arm.com,
jonathan.cameron@huawei.com, kobak@nvidia.com,
lcherian@marvell.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, peternewman@google.com,
punit.agrawal@oss.qualcomm.com, quic_jiles@quicinc.com,
reinette.chatre@intel.com, rohit.mathew@arm.com,
scott@os.amperecomputing.com, sdonthineni@nvidia.com,
tan.shaopeng@fujitsu.com, xhao@linux.alibaba.com,
catalin.marinas@arm.com, will@kernel.org, corbet@lwn.net,
maz@kernel.org, oupton@kernel.org, joey.gouly@arm.com,
suzuki.poulose@arm.com, kvmarm@lists.linux.dev,
zengheng4@huawei.com, linux-doc@vger.kernel.org
In-Reply-To: <DS7PR19MB6351DBDFED61A8C9A89DB391F351A@DS7PR19MB6351.namprd19.prod.outlook.com>
Hi Charles,
On 4/3/26 00:38, Rose, Charles wrote:
> Hi Ben,
>
>> This version of the mpam missing pieces series sees a couple of things
>> dropped or hidden. Memory bandwidth utilization with free-running counters
>> is dropped in preference of just always using 'mbm_event' mode (ABMC
>> emulation) which simplifies the code and allows for, in the future,
>> filtering by read/write traffic. So, for the interim, there is no memory
>> bandwidth utilization support. CDP is hidden behind config expert as
>> remount of resctrl fs could potentially lead to out of range PARTIDs being
>> used and the fix requires a change in fs/resctrl. The setting of MPAM2_EL2
>> (for pkvm/nvhe) is dropped as too expensive a write for not much value.
>>
>> There are a couple of 'fixes' at the start of the series which address
>> problems in the base driver but are only user visible due to this series.
>>
>
> I tested cache occupancy and memory bandwidth allocation on a Dell PowerEdge XE8712 with NVIDIA Grace A02P. Both seem to work as expected.
>
> For the series:
>
> Tested-by: Charles Rose <charles.rose@dell.com>
Thanks for testing!
* Re: [PATCH v6 00/40] arm_mpam: Add KVM/arm64 and resctrl glue code
From: Ben Horgan @ 2026-04-13 14:31 UTC (permalink / raw)
To: Fenghua Yu
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, gshan, james.morse, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, zengheng4, linux-doc
In-Reply-To: <8c4f8019-f6eb-4a3b-a6cf-96e533bfa15f@nvidia.com>
On 4/2/26 00:56, Fenghua Yu wrote:
>
>
> On 3/13/26 07:45, Ben Horgan wrote:
>> This version of the mpam missing pieces series sees a couple of things
>> dropped or hidden. Memory bandwidth utilization with free-running counters
>> is dropped in preference of just always using 'mbm_event' mode (ABMC
>> emulation) which simplifies the code and allows for, in the future,
>> filtering by read/write traffic. So, for the interim, there is no memory
>> bandwidth utilization support. CDP is hidden behind config expert as
>> remount of resctrl fs could potentially lead to out of range PARTIDs being
>> used and the fix requires a change in fs/resctrl. The setting of MPAM2_EL2
>> (for pkvm/nvhe) is dropped as too expensive a write for not much value.
>>
>> There are a couple of 'fixes' at the start of the series which address
>> problems in the base driver but are only user visible due to this series.
>
> Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Thanks!
Ben
* [PATCH v3 3/3] Documentation: document panic_on_unrecoverable_memory_failure sysctl
From: Breno Leitao @ 2026-04-13 13:26 UTC (permalink / raw)
To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
Cc: linux-mm, linux-kernel, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260413-ecc_panic-v3-0-1dcbb2f12bc4@debian.org>
Document the vm.panic_on_unrecoverable_memory_failure sysctl in the
admin guide, including the CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel
configuration option that allows enabling this behavior at build time.

This follows the same format as panic_on_unrecovered_nmi and other
panic-on-error documentation, providing clear examples of:

- Enabling panic at build time via CONFIG option
- Disabling at runtime via sysctl
- Enabling at runtime via sysctl

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Documentation/admin-guide/sysctl/vm.rst | 46 +++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 97e12359775c9..af545869bc1b4 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:
- page-cluster
- page_lock_unfairness
- panic_on_oom
+- panic_on_unrecoverable_memory_failure
- percpu_pagelist_high_fraction
- stat_interval
- stat_refresh
@@ -925,6 +926,51 @@ panic_on_oom=2+kdump gives you very strong tool to investigate
why oom happens. You can get snapshot.
+panic_on_unrecoverable_memory_failure
+======================================
+
+When a hardware memory error (e.g. multi-bit ECC) hits an in-use kernel
+page that cannot be recovered by the memory failure handler, the default
+behaviour is to ignore the error and continue operation. This is
+dangerous because the corrupted data remains accessible to the kernel,
+risking silent data corruption or a delayed crash when the poisoned
+memory is next accessed.
+
+Pages that reach this path include slab objects (dentry cache, inode
+cache, etc.), page tables, kernel stacks, and other kernel allocations
+that lack the reverse mapping needed to isolate all references.
+
+For many environments it is preferable to panic immediately with a clean
+crash dump that captures the original error context, rather than to
+continue and face a random crash later whose cause is difficult to
+diagnose.
+
+= =====================================================================
+0 Try to continue operation (default).
+1 Panic immediately. If the ``panic`` sysctl is also non-zero then the
+ machine will be rebooted.
+= =====================================================================
+
+This sysctl can be set to 1 at boot time by enabling the
+``CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC`` kernel configuration option.
+This provides systems with the ability to enforce panic-on-error behavior
+from the kernel build, without requiring runtime sysctl configuration.
+
+Examples:
+
+1. Enable panic on unrecoverable memory failure at kernel build time::
+
+ CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC=y
+
+2. Disable at runtime even when compiled in::
+
+ echo 0 > /proc/sys/vm/panic_on_unrecoverable_memory_failure
+
+3. Enable at runtime when not enabled at build time::
+
+ echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure
+
+
percpu_pagelist_high_fraction
=============================
--
2.52.0
* [PATCH v3 2/3] mm/memory-failure: add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC option
From: Breno Leitao @ 2026-04-13 13:26 UTC (permalink / raw)
To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
Cc: linux-mm, linux-kernel, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260413-ecc_panic-v3-0-1dcbb2f12bc4@debian.org>
Add a kernel configuration option to enable panic on unrecoverable
memory failures at boot time, similar to CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC
and CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.

This allows systems that prefer failing fast over continuing with
corrupted kernel memory to automatically panic when encountering
unrecoverable kernel memory failures. The behavior can still be
controlled at runtime via the panic_on_unrecoverable_memory_failure
sysctl.

When enabled, the kernel will panic if:

* A memory failure affects kernel pages that cannot be recovered
* A memory failure affects high-order kernel pages
* A memory failure affects unknown page types that cannot be recovered

Examples of BOOTPARAM configuration usage:

1. Building with the panic option enabled by default:

   CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC=y

2. Disabling at runtime even when compiled in:

   echo 0 > /proc/sys/vm/panic_on_unrecoverable_memory_failure

3. Enabling at runtime when not compiled in by default:

   echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure

Similar to other BOOTPARAM options, this provides a balance between:

- Safe defaults (disabled by default without CONFIG option)
- Production flexibility (can be enabled at build time)
- Runtime control (can be toggled via sysctl)

This is consistent with the kernel's approach to other panic-on-error
options that allow systems to choose between attempting recovery or
failing fast when critical errors are detected.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
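The interaction between the build-time default and the runtime override can be modelled outside the kernel. The following is only a userspace sketch under stated assumptions: the patch itself uses IS_ENABLED(), which is replaced here by a plain #ifdef so the code compiles standalone, and the accessor names (get_panic_on_unrecoverable_mf, set_panic_on_unrecoverable_mf) and the -1 error return are hypothetical, not part of the series.

```c
/*
 * Userspace model only. The kernel patch initialises the variable with
 * IS_ENABLED(CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC); a plain #ifdef is
 * used here so the sketch builds outside the kernel tree.
 */
#ifdef CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC
#define MF_PANIC_DEFAULT 1
#else
#define MF_PANIC_DEFAULT 0
#endif

/* Value at boot; a later sysctl write may overwrite it. */
static int panic_on_unrecoverable_mf_value = MF_PANIC_DEFAULT;

/*
 * Hypothetical setter that mimics the 0..1 range enforced by the
 * SYSCTL_ZERO/SYSCTL_ONE bounds in the ctl_table entry. Returns 0 on
 * success and -1 (illustrative error code) for out-of-range values.
 */
int set_panic_on_unrecoverable_mf(int value)
{
	if (value < 0 || value > 1)
		return -1;
	panic_on_unrecoverable_mf_value = value;
	return 0;
}

/* Hypothetical getter for the current setting. */
int get_panic_on_unrecoverable_mf(void)
{
	return panic_on_unrecoverable_mf_value;
}
```

Building the sketch with -DCONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC flips the initial value to 1, while the setter can still bring it back to 0, which is the same balance between build-time default and runtime control described in the commit message.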
mm/Kconfig | 9 +++++++++
mm/memory-failure.c | 3 ++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/mm/Kconfig b/mm/Kconfig
index ebd8ea353687e..596f24a872ff6 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -733,6 +733,15 @@ config MEMORY_FAILURE
even when some of its memory has uncorrected errors. This requires
special hardware support and typically ECC memory.
+config BOOTPARAM_MEMORY_FAILURE_PANIC
+ bool "Panic on unrecoverable memory failure"
+ depends on MEMORY_FAILURE
+ help
+ Say Y here to panic when an unrecoverable memory failure is
+ detected. This covers kernel pages, high-order kernel pages,
+ and unknown page types that cannot be recovered. Can be disabled
+ at runtime via the panic_on_unrecoverable_memory_failure sysctl.
+
config HWPOISON_INJECT
tristate "HWPoison pages injector"
depends on MEMORY_FAILURE && DEBUG_KERNEL && PROC_FS
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 852c595aff108..cf06960b4d069 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,7 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;
static int sysctl_enable_soft_offline __read_mostly = 1;
-static int sysctl_panic_on_unrecoverable_mf __read_mostly;
+static int sysctl_panic_on_unrecoverable_mf __read_mostly =
+ IS_ENABLED(CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC);
atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
--
2.52.0
* [PATCH v3 0/3] mm/memory-failure: add panic option for unrecoverable pages
From: Breno Leitao @ 2026-04-13 13:26 UTC (permalink / raw)
To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
Cc: linux-mm, linux-kernel, linux-doc, Breno Leitao, kernel-team
When the memory failure handler encounters an in-use kernel page that it
cannot recover (slab, page tables, kernel stacks, vmalloc, etc.), it
currently logs the error as "Ignored" and continues operation.

This leaves corrupted data accessible to the kernel, which will inevitably
cause either silent data corruption or a delayed crash when the poisoned memory
is next accessed.

This is a common problem on large fleets. We frequently observe multi-bit ECC
errors hitting kernel slab pages, where memory_failure() fails to recover them
and the system crashes later at an unrelated code path, making root cause
analysis unnecessarily difficult.

Here is one specific example from production on an arm64 server: a multi-bit
ECC error hit a dentry cache slab page, memory_failure() failed to recover it
(slab pages are not supported by the hwpoison recovery mechanism), and 67
seconds later d_lookup() accessed the poisoned cache line causing a synchronous
external abort:

  [88690.479680] [Hardware Error]: error_type: 3, multi-bit ECC
  [88690.498473] Memory failure: 0x40272d: unhandlable page.
  [88690.498619] Memory failure: 0x40272d: recovery action for
                 get hwpoison page: Ignored
  ...
  [88757.847126] Internal error: synchronous external abort:
                 0000000096000410 [#1] SMP
  [88758.061075] pc : d_lookup+0x5c/0x220

This series adds a new sysctl vm.panic_on_unrecoverable_memory_failure
(default 0) that, when enabled, panics immediately on unrecoverable
memory failures. This provides a clean crash dump at the time of the
error, which is far more useful for diagnosis than a random crash later
at an unrelated code path.

This also categorizes reserved pages as MF_MSG_KERNEL, and panics on
unknown page types (MF_MSG_UNKNOWN), so all unrecoverable failure cases
are covered.

A CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option is
also provided, similar to CONFIG_BOOTPARAM_HARDLOCKUP_PANIC, allowing
the sysctl to be enabled at build time for systems that always want to
panic on unrecoverable memory failures without requiring runtime
configuration.
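
The panic decision described above can be modelled in plain userspace C. This is a minimal sketch, not the kernel code: the enums are a reduced subset that only mirrors names used in the series, and should_panic() is a hypothetical stand-in for the series' panic_on_unrecoverable_mf() helper that takes the sysctl value as a parameter instead of reading a global.

```c
#include <stdbool.h>

/* Reduced, illustrative subset of the memory-failure result codes. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Reduced, illustrative subset of the page-type codes named in the series. */
enum mf_action_page_type {
	MF_MSG_KERNEL,
	MF_MSG_KERNEL_HIGH_ORDER,
	MF_MSG_GET_HWPOISON,
	MF_MSG_UNKNOWN,
};

/*
 * Hypothetical helper: panic only when the sysctl is enabled, the page was
 * not recovered (MF_IGNORED), and the page type is one of the three
 * unrecoverable kernel categories.
 */
bool should_panic(int sysctl_enabled, enum mf_action_page_type type,
		  enum mf_result result)
{
	return sysctl_enabled &&
	       result == MF_IGNORED &&
	       (type == MF_MSG_KERNEL ||
		type == MF_MSG_KERNEL_HIGH_ORDER ||
		type == MF_MSG_UNKNOWN);
}
```

With the sysctl at its default of 0 the helper never returns true, so existing behaviour is unchanged unless an administrator opts in.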
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v3:
- Rename is_unrecoverable_memory_failure() to panic_on_unrecoverable_mf()
as suggested by maintainer.
- Add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option,
similar to CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.
- Add documentation for the sysctl and CONFIG option.
- Add code comments documenting the panic condition design rationale and
how the retry mechanism mitigates false positives from buddy allocator
races.
- Link to v2: https://patch.msgid.link/20260331-ecc_panic-v2-0-9e40d0f64f7a@debian.org
Changes in v2:
- Panic on MF_MSG_KERNEL, MF_MSG_KERNEL_HIGH_ORDER and MF_MSG_UNKNOWN
instead of MF_MSG_GET_HWPOISON.
- Report MF_MSG_KERNEL for reserved pages when get_hwpoison_page() fails
instead of MF_MSG_GET_HWPOISON.
- Link to v1: https://patch.msgid.link/20260323-ecc_panic-v1-0-72a1921726c5@debian.org
---
Breno Leitao (3):
mm/memory-failure: report MF_MSG_KERNEL for reserved pages
mm/memory-failure: add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC option
Documentation: document panic_on_unrecoverable_memory_failure sysctl
Documentation/admin-guide/sysctl/vm.rst | 46 ++++++++++++++++++++++++++++++
mm/Kconfig | 9 ++++++
mm/memory-failure.c | 50 ++++++++++++++++++++++++++++++++-
3 files changed, 104 insertions(+), 1 deletion(-)
---
base-commit: 028ef9c96e96197026887c0f092424679298aae8
change-id: 20260323-ecc_panic-4e473b83087c
Best regards,
--
Breno Leitao <leitao@debian.org>
* [PATCH v3 1/3] mm/memory-failure: report MF_MSG_KERNEL for reserved pages
From: Breno Leitao @ 2026-04-13 13:26 UTC (permalink / raw)
To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
Cc: linux-mm, linux-kernel, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260413-ecc_panic-v3-0-1dcbb2f12bc4@debian.org>
When get_hwpoison_page() returns a negative value, distinguish
reserved pages from other failure cases by reporting MF_MSG_KERNEL
instead of MF_MSG_GET_HWPOISON. Reserved pages belong to the kernel
and should be classified accordingly for proper handling by the
panic_on_unrecoverable_memory_failure mechanism.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
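The reclassification this patch performs when get_hwpoison_page() fails can be summarised as a small pure function. This is a sketch under assumptions: classify_get_failure() is a hypothetical name, the boolean argument stands in for PageReserved(p), and the enum lists only the two message types involved.

```c
#include <stdbool.h>

/* Only the two message types relevant to this code path. */
enum mf_action_page_type { MF_MSG_KERNEL, MF_MSG_GET_HWPOISON };

/*
 * Hypothetical helper: page_reserved stands in for PageReserved(p).
 * Reserved pages belong to the kernel and are reported as MF_MSG_KERNEL;
 * every other failure keeps the previous MF_MSG_GET_HWPOISON report.
 */
enum mf_action_page_type classify_get_failure(bool page_reserved)
{
	return page_reserved ? MF_MSG_KERNEL : MF_MSG_GET_HWPOISON;
}
```

Only the reserved-page case becomes eligible for the panic path; other get_hwpoison_page() failures are still logged and ignored as before.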
mm/memory-failure.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 48 insertions(+), 1 deletion(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d43613097..852c595aff108 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;
static int sysctl_enable_soft_offline __read_mostly = 1;
+static int sysctl_panic_on_unrecoverable_mf __read_mostly;
+
atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
static bool hw_memory_failure __read_mostly = false;
@@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
+ },
+ {
+ .procname = "panic_on_unrecoverable_memory_failure",
+ .data = &sysctl_panic_on_unrecoverable_mf,
+ .maxlen = sizeof(sysctl_panic_on_unrecoverable_mf),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
}
};
@@ -1281,6 +1292,35 @@ static void update_per_node_mf_stats(unsigned long pfn,
++mf_stats->total;
}
+/*
+ * Determine whether to panic on an unrecoverable memory failure.
+ *
+ * Design rationale: panic immediately on unrecoverable kernel memory
+ * failures to get a clean crash rather than a random one on MF_IGNORED pages.
+ *
+ * This panics on three categories of failures, and only when the page is
+ * not being recovered (result == MF_IGNORED):
+ * - MF_MSG_KERNEL: Reserved pages that cannot be recovered
+ * - MF_MSG_KERNEL_HIGH_ORDER: High-order kernel pages that cannot be recovered
+ * - MF_MSG_UNKNOWN: Pages whose state cannot be classified as recoverable
+ *
+ * Note: Transient races are mitigated by memory_failure()'s retry mechanism.
+ * When a buddy allocator race is detected (take_page_off_buddy() fails), the
+ * code clears PageHWPoison and retries the entire memory_failure() flow,
+ * allowing pages to be properly reclassified with updated flags. This keeps
+ * transient states from being misclassified as unrecoverable.
+ *
+ */
+static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
+ enum mf_result result)
+{
+ return sysctl_panic_on_unrecoverable_mf &&
+ result == MF_IGNORED &&
+ (type == MF_MSG_KERNEL ||
+ type == MF_MSG_KERNEL_HIGH_ORDER ||
+ type == MF_MSG_UNKNOWN);
+}
+
/*
* "Dirty/Clean" indication is not 100% accurate due to the possibility of
* setting PG_dirty outside page lock. See also comment above set_page_dirty().
@@ -1298,6 +1338,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
pr_err("%#lx: recovery action for %s: %s\n",
pfn, action_page_types[type], action_name[result]);
+ if (panic_on_unrecoverable_mf(type, result))
+ panic("Memory failure: %#lx: unrecoverable page", pfn);
+
return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
}
@@ -2432,7 +2475,11 @@ int memory_failure(unsigned long pfn, int flags)
}
goto unlock_mutex;
} else if (res < 0) {
- res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
+ if (PageReserved(p))
+ res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
+ else
+ res = action_result(pfn, MF_MSG_GET_HWPOISON,
+ MF_IGNORED);
goto unlock_mutex;
}
--
2.52.0
* Re: [RFC PATCH] Documentation: Add managed interrupts
From: John Ogness @ 2026-04-13 12:19 UTC (permalink / raw)
To: Valentin Schneider, Sebastian Andrzej Siewior, linux-doc,
linux-kernel
Cc: Aaron Tomlin, Christoph Hellwig, Frederic Weisbecker, Jens Axboe,
Jonathan Corbet, Ming Lei, Thomas Gleixner, Waiman Long,
Peter Zijlstra
In-Reply-To: <xhsmhlderi1f6.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
On 2026-04-13, Valentin Schneider <vschneid@redhat.com> wrote:
> On 01/04/26 13:02, Sebastian Andrzej Siewior wrote:
>> One more point: Given that isolcpus= is marked deprecated as of commit
>> b0d40d2b22fe4 ("sched/isolation: Document isolcpus= boot parameter flags, mark it deprecated")
>>
>> and the 'managed_irq' is evaluated at device's probe time it would
>> require additional callbacks to re-evaluate the situation. Probably for
>> 'io_queue', too. Does it make sense or should we simply drop the
>> "deprecation" notice and allow using it long term?
>
> AIUI the deprecation notice is more for isolcpus=domain, i.e. the scheduler
> part, but it's still relevant for e.g. managed_irq.
If that is the case, then the deprecation notice should explicitly
target the "domain" flag. Also note that "domain" is the default if no
flag is specified, so it is a bit messy. It is odd to deprecate a
(default) component of a feature and not have a plan how that component
will ever be removed.
The documentation of "domain" already strongly advises to use cpusets
instead. Is that not enough? If so, the deprecation should be dropped.
If there is a strong wish to remove the "domain" boot functionality,
then there should be a new boot arg that can be used for "managed_irq"
and "nohz" features, i.e. get users off the deprecated isolcpus so that
isolcpus can log deprecation notices and be removed someday.
John Ogness
* Re: [PATCH v4 9/9] Documentation: ABI: Add sysfs ABI documentation for DDR training data
From: Manivannan Sadhasivam @ 2026-04-13 11:59 UTC (permalink / raw)
To: Kishore Batta
Cc: Jonathan Corbet, Shuah Khan, Jeff Hugo, Carl Vanderlip,
Oded Gabbay, andersson, linux-doc, linux-kernel, linux-arm-msm,
dri-devel, mhi
In-Reply-To: <20260319-sahara_protocol_new_v2-v4-9-47ad79308762@oss.qualcomm.com>
On Thu, Mar 19, 2026 at 12:01:49PM +0530, Kishore Batta wrote:
> Add ABI documentation for the DDR training data sysfs attribute exposed by
> the sahara MHI driver.
>
> The documented sysfs node provides read-only access to the DDR training
> data captured during sahara command mode and exposed via the MHI
> controller device. This allows userspace to read the training data and
> manage it as needed outside the kernel.
>
> Signed-off-by: Kishore Batta <kishore.batta@oss.qualcomm.com>
Ah, this should be squashed with previous patch.
- Mani
> ---
> .../ABI/testing/sysfs-bus-mhi-ddr_training_data | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-mhi-ddr_training_data b/Documentation/ABI/testing/sysfs-bus-mhi-ddr_training_data
> new file mode 100644
> index 0000000000000000000000000000000000000000..810b487b5a5fdba133d81255f9879844e3938a10
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-bus-mhi-ddr_training_data
> @@ -0,0 +1,19 @@
> +What: /sys/bus/mhi/devices/<mhi-cntrl>/ddr_training_data
> +
> +Date: March 2026
> +
> +Contact: Kishore Batta <kishore.batta@oss.qualcomm.com>
> +
> +Description: Contains the DDR training data for the Qualcomm device
> + connected. MHI driver populates different controller
> + nodes for each device. The DDR training data is exposed
> + to userspace to read and save the training data file to
> + the filesystem. In the subsequent boot up of the device,
> + the training data is restored from host to device
> + optimizing the boot up time of the device.
> +
> +Usage: Example for reading DDR training data:
> + cat /sys/bus/mhi/devices/mhi0/ddr_training_data
> +
> +Permissions: The file permissions are set to 0444 allowing read
> + access.
>
> --
> 2.34.1
>
--
மணிவண்ணன் சதாசிவம்