LinuxPPC-Dev Archive on lore.kernel.org
* Re: [RFC 00/12] PCI: Add support for Scalable I/O Virtualization
From: Jason Gunthorpe @ 2026-06-04 18:20 UTC (permalink / raw)
  To: Dimitri Daskalakis
  Cc: Bjorn Helgaas, linux-pci, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Mahesh J Salgaonkar,
	Oliver O'Halloran, Niklas Schnelle, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Alex Williamson, Kevin Tian,
	Ankit Agrawal, Leon Romanovsky, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Keith Busch, Alexander Duyck,
	Jakub Kicinski, Dimitri Daskalakis, linuxppc-dev, linux-s390, kvm,
	xen-devel
In-Reply-To: <20260604150153.3619662-1-dimitri.daskalakis1@gmail.com>

On Thu, Jun 04, 2026 at 08:01:41AM -0700, Dimitri Daskalakis wrote:
> With this patchset, the core enumerates the SIOV capability and can identify
> SIOV PFs. But there is no central mechanism to allocate/manage SIOV VFs.
> To support device pass through, devices will need to add a vfio-mdev
> driver with IOMMUFD support (or something similar).

There is an enormous amount of missing work to do something useful
with the SIOVr2 stuff. IIRC there are even supposed to be BIOS
components in this plan, and there are some missing PCI SIG topics
too.

So, I'm not sure how much value there is in merging just the cap
discovery without a roadmap for the missing parts..

Also, I'm quite surprised to see this out of the blue, there is an OCP
workstream that was building out a standard that outlines how all the
different components have to act to successfully implement it.  What
is in PCI SIG was just some minor foundational adjustments without any
context on how to form them into a solution.

I think it is extremely premature to merge anything related to SIOV to
the kernel. Join the OCP work stream if you are interested. I think
the general feeling was there is not sufficient interest in the
industry to do this and it has gone quiet.

Jason



* Re: [PATCH] smp: prevent soft lockup in smp_call_function_many_cond
From: Paul E. McKenney @ 2026-06-04 17:52 UTC (permalink / raw)
  To: Chris Packham
  Cc: Mark Tomlinson, Madhavan Srinivasan, Thomas Gleixner,
	yury.norov@gmail.com, romank@linux.microsoft.com,
	rafael.j.wysocki@intel.com, riel@surriel.com,
	joelagnelf@nvidia.com, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <7a0693c5-8bda-403e-9889-f65f7f768356@alliedtelesis.co.nz>

On Tue, Jun 02, 2026 at 09:18:39PM +0000, Chris Packham wrote:
> (adding others suggested by get_maintainer.pl)
> 
> On 27/05/2026 15:16, Mark Tomlinson wrote:
> > Using the PowerPC P2040 (e500mc) CPU, soft lockups can occasionally be
> > seen in smp_call_function_many_cond(). The conclusion is that this CPU
> > does not process the doorbell interrupt while in a data-storage (MMU)
> > exception. If more than one CPU in a multi core environment is calling
> > this function at the same time, it is possible for a deadlock to occur.
> >
> > The fix for this is to call flush_smp_call_function_queue() before
> > waiting for responses from other CPUs. If there is something in the
> > queue, this is a good time to process it before busy-waiting on other
> > CPUs. On other architectures this call will quickly do nothing, as the
> > queue will be empty.

OK, I will bite...

How do we know that another entry will not get added by some other CPU
just before this new call to flush_smp_call_function_queue()?

							Thanx, Paul
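
To make the timing window concrete, here is a minimal single-threaded
userspace model. It is not kernel code: struct cpu_queue,
enqueue_remote() and flush_queue() are hypothetical stand-ins for one
CPU's call-function queue and for flush_smp_call_function_queue(). It
assumes the concern is a request that lands after the flush has drained
the queue but before the wait loop starts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one CPU's call-function queue. */
struct cpu_queue {
	size_t pending;   /* requests queued by other CPUs, not yet run */
	size_t processed; /* requests this CPU has run so far */
};

/* Another CPU queues a request for us (normally followed by a doorbell). */
static void enqueue_remote(struct cpu_queue *q)
{
	q->pending++;
}

/* Models the proposed flush: run everything queued so far. */
static void flush_queue(struct cpu_queue *q)
{
	q->processed += q->pending;
	q->pending = 0;
}

/*
 * Returns true if a request is left unprocessed when the busy-wait
 * begins. late_arrival selects whether a remote CPU queues one more
 * request in the window between the flush and the wait loop.
 */
static bool stuck_request_at_wait(bool late_arrival)
{
	struct cpu_queue q = { 0 };

	enqueue_remote(&q);	/* queued before the flush */
	flush_queue(&q);	/* the flush added by the patch */
	if (late_arrival)
		enqueue_remote(&q);	/* queued just after the flush */

	/* the busy-wait on other CPUs would start here */
	return q.pending != 0;
}
```

In this model the flush only helps with requests that were already
queued; anything queued afterwards still depends on the doorbell being
taken.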

> > Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
> > ---
> >   kernel/smp.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index a0bb56bd8dda..3c4467654ab0 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -884,6 +884,8 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> >   		local_irq_restore(flags);
> >   	}
> >   
> > +	flush_smp_call_function_queue();
> > +
> >   	if (run_remote && wait) {
> >   		for_each_cpu(cpu, cfd->cpumask) {
> >   			call_single_data_t *csd;



* Re: [PATCH] kvm powerpc/book3s-apiv2: Add suite initialization to skip GSB tests without APIv2 support
From: Amit Machhiwal @ 2026-06-04 17:09 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: linuxppc-dev, kvm, kvm-ppc, Madhavan Srinivasan, Michael Ellerman,
	Eric Biggers
In-Reply-To: <20260604092931.344101-1-vaibhav@linux.ibm.com>

Hi Vaibhav,

Thanks for the patch. Please find my comments inline.

On 2026/06/04 02:59 PM, Vaibhav Jain wrote:
> The guest state buffer (GSB) test suite currently fails on systems that
> do not support the PAPR APIv2 nested virtualization. This happens because
> the tests attempt to use APIv2-specific functionality without first
> checking if the host supports it. This was recently reported [1] when
> test-guest-state-buffer kunit tests were being run on QEMU without enabling
> the QEMU capability 'cap-nested-papr', which enables APIv2 nested virtualization
> for the PPC64 pseries QEMU machine.
> 
> Add a suite_init callback that checks for APIv2 support by calling
> plpar_guest_get_capabilities(). If the host does not support APIv2
> (indicated by H_SUCCESS not being returned), mark all test cases in the
> suite as KUNIT_SKIPPED. This prevents test failures on systems without
> APIv2 support while still allowing the tests to run on capable systems.
> 
> [1] https://lore.kernel.org/all/20260603064225.GC18149@sol/
> 
> Reported-by: Eric Biggers <ebiggers@kernel.org>
> Closes: https://lore.kernel.org/all/20260603064225.GC18149@sol
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> Assisted-by: Bob:Claude-3.7-Sonnet Bob-Shell
> ---
>  arch/powerpc/kvm/test-guest-state-buffer.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/test-guest-state-buffer.c b/arch/powerpc/kvm/test-guest-state-buffer.c
> index 5ccca306997a..f84b40fa55db 100644
> --- a/arch/powerpc/kvm/test-guest-state-buffer.c
> +++ b/arch/powerpc/kvm/test-guest-state-buffer.c
> @@ -521,6 +521,24 @@ static void test_gs_hostwide_counters(struct kunit *test)
>  	kvmppc_gsb_free(gsb);
>  }
>  
> +static int init_gs_test_suite(struct kunit_suite *suite)
> +{
> +	long rc;
> +	unsigned long host_capabilities;
> +	struct kunit_case *test_case;
> +
> +	/* Enable test suite only if APIv2 is supported */
> +	rc = plpar_guest_get_capabilities(0, &host_capabilities);

I believe we don't really need the overhead of an hcall to check the
availability of APIv2. We could simply check:

diff --git a/arch/powerpc/kvm/test-guest-state-buffer.c b/arch/powerpc/kvm/test-guest-state-buffer.c
index 5ccca306997a..a263e7f31e15 100644
--- a/arch/powerpc/kvm/test-guest-state-buffer.c
+++ b/arch/powerpc/kvm/test-guest-state-buffer.c
@@ -521,6 +521,18 @@ static void test_gs_hostwide_counters(struct kunit *test)
        kvmppc_gsb_free(gsb);
 }
 
+static int init_gs_test_suite(struct kunit_suite *suite)
+{
+       struct kunit_case *test_case;
+
+       if (!kvmhv_is_nestedv2()) {
+               kunit_suite_for_each_test_case(suite, test_case)
+                       WRITE_ONCE(test_case->status, KUNIT_SKIPPED);
+       }
+
+       return 0;
+}
+

Also, I understand that this suite exercises GSB functionality specific
to APIv2, but I see that only 'test_gs_hostwide_counters' relies on an
APIv2-specific hcall ('H_GUEST_GET_STATE'); the rest of the tests just
operate on an in-memory GSB. So, do we really want to skip all the tests
when APIv2 is not available?

If not, we could simply skip this one test as:

diff --git a/arch/powerpc/kvm/test-guest-state-buffer.c b/arch/powerpc/kvm/test-guest-state-buffer.c
index 5ccca306997a..89999b80fdfc 100644
--- a/arch/powerpc/kvm/test-guest-state-buffer.c
+++ b/arch/powerpc/kvm/test-guest-state-buffer.c
@@ -462,7 +462,10 @@ static void test_gs_hostwide_counters(struct kunit *test)
        int rc;
 
        if (!kvmhv_on_pseries())
-               kunit_skip(test, "This test need a kmv-hv guest");
+               kunit_skip(test, "This test needs a kvm-hv guest");
+
+       if (!kvmhv_is_nestedv2())
+               kunit_skip(test, "This test needs spapr nested APIv2 support");
 
        gsm = kvmppc_gsm_new(&gs_msg_test_hostwide_ops, &test_data, GSM_SEND,
                             GFP_KERNEL);

Please let me know your views.

Thanks,
Amit

> +
> +	if (rc != H_SUCCESS) {
> +		/* Skip all testcases if no APIv2 support */
> +		kunit_suite_for_each_test_case(suite, test_case)
> +			WRITE_ONCE(test_case->status, KUNIT_SKIPPED);
> +	}
> +
> +	return 0;
> +}
> +
>  static struct kunit_case guest_state_buffer_testcases[] = {
>  	KUNIT_CASE(test_creating_buffer),
>  	KUNIT_CASE(test_adding_element),
> @@ -535,6 +553,7 @@ static struct kunit_case guest_state_buffer_testcases[] = {
>  static struct kunit_suite guest_state_buffer_test_suite = {
>  	.name = "guest_state_buffer_test",
>  	.test_cases = guest_state_buffer_testcases,
> +	.suite_init = init_gs_test_suite,
>  };
>  
>  kunit_test_suites(&guest_state_buffer_test_suite);
> -- 
> 2.54.0
> 



* [PATCH 2/2] kunit: Add example of test suite that can be skipped at runtime
From: Vaibhav Jain @ 2026-06-04 16:28 UTC (permalink / raw)
  To: linuxppc-dev, kvm, kvm-ppc, linux-kselftest, kunit-dev,
	linux-kernel
  Cc: Vaibhav Jain, Madhavan Srinivasan, Michael Ellerman,
	Brendan Higgins, David Gow, Rae Moar
In-Reply-To: <20260604162805.556135-1-vaibhav@linux.ibm.com>

Add an example test suite named 'example_test_skip_suite' to
'kunit-example-test.c' that shows how to skip an entire test suite based on
runtime conditions.

The example suite 'example_skip_suite' provides a 'suite_init' callback
named example_skip_suite_init() which marks the entire suite as skipped
using kunit_mark_skipped().

This demonstrates a way to conditionally skip test suites when any
prerequisites for kunit_suite execution are not met. The 'suite_init'
callback can perform any necessary checks and mark the suite as skipped,
preventing all test cases from executing while also indicating why the
suite was skipped.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
 lib/kunit/kunit-example-test.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/lib/kunit/kunit-example-test.c b/lib/kunit/kunit-example-test.c
index 0bae7b7ca0b0..b8ded54fa46d 100644
--- a/lib/kunit/kunit-example-test.c
+++ b/lib/kunit/kunit-example-test.c
@@ -591,5 +591,34 @@ static struct kunit_suite example_init_test_suite = {
  */
 kunit_test_init_section_suites(&example_init_test_suite);
 
+/*
+ * This test should always be skipped.
+ */
+static void example_skip_suite_test(struct kunit *test)
+{
+	/* This line should never be seen */
+	KUNIT_FAIL(test, "You should not see this.");
+}
+
+static struct kunit_case example_skip_suite_test_cases[] = {
+	KUNIT_CASE(example_skip_suite_test),
+	{}
+};
+
+static int example_skip_suite_init(struct kunit_suite *suite)
+{
+	kunit_mark_skipped(suite, "Test suite expected to be skipped");
+	return 0;
+}
+
+static struct kunit_suite example_test_skip_suite = {
+	.name = "example_skip_suite",
+	.suite_init = example_skip_suite_init,
+	.test_cases = example_skip_suite_test_cases,
+};
+
+/* This registers a test suite that will be skipped */
+kunit_test_suite(example_test_skip_suite);
+
 MODULE_DESCRIPTION("Example KUnit test suite");
 MODULE_LICENSE("GPL v2");
-- 
2.54.0




* [PATCH 1/2] kunit: Add ability to skip entire test suites
From: Vaibhav Jain @ 2026-06-04 16:28 UTC (permalink / raw)
  To: linuxppc-dev, kvm, kvm-ppc, linux-kselftest, kunit-dev,
	linux-kernel
  Cc: Vaibhav Jain, Madhavan Srinivasan, Michael Ellerman,
	Brendan Higgins, David Gow, Rae Moar
In-Reply-To: <20260604162805.556135-1-vaibhav@linux.ibm.com>

Currently, KUnit provides mechanisms to skip individual test cases, but
there is no way to skip an entire test suite based on runtime conditions
checked during suite initialization. This limitation forces test suites
to either fail or skip tests individually when certain prerequisites are
not available.

To address this limitation, the patch adds a 'status' field to struct
kunit_suite that allows suite_init callbacks to mark the entire suite as
KUNIT_SKIPPED. When a suite is marked as skipped, all test cases within
that suite are bypassed without execution.

The patch changes kunit_suite_has_succeeded() to check the suite
status before evaluating individual test case results. In addition,
kunit_run_tests() is updated to skip suite execution if 'kunit_suite.status'
is set to KUNIT_SKIPPED, either before suite_init runs or by the
suite_init callback itself.

This enables test suites to perform runtime capability checks in their
'suite_init' callback and gracefully skip all tests when prerequisites are
not met, rather than reporting failures or requiring each test case to
perform redundant checks.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
 include/kunit/test.h |  1 +
 lib/kunit/test.c     | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/kunit/test.h b/include/kunit/test.h
index ce0573e196ce..395221d623f7 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -285,6 +285,7 @@ struct kunit_suite {
 	struct string_stream *log;
 	int suite_init_err;
 	bool is_init;
+	enum kunit_status status;
 };
 
 /* Stores an array of suites, end points one past the end */
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 99773e000e1b..989acc770265 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -214,6 +214,9 @@ enum kunit_status kunit_suite_has_succeeded(struct kunit_suite *suite)
 	const struct kunit_case *test_case;
 	enum kunit_status status = KUNIT_SKIPPED;
 
+	if (suite->status == KUNIT_SKIPPED)
+		return KUNIT_SKIPPED;
+
 	if (suite->suite_init_err)
 		return KUNIT_FAILURE;
 
@@ -795,12 +798,20 @@ int kunit_run_tests(struct kunit_suite *suite)
 	/* Taint the kernel so we know we've run tests. */
 	add_taint(TAINT_TEST, LOCKDEP_STILL_OK);
 
+	if (suite->status == KUNIT_SKIPPED)
+		goto suite_end;
+
 	if (suite->suite_init) {
 		suite->suite_init_err = suite->suite_init(suite);
 		if (suite->suite_init_err) {
+			suite->status = KUNIT_FAILURE;
 			kunit_err(suite, KUNIT_SUBTEST_INDENT
 				  "# failed to initialize (%d)", suite->suite_init_err);
 			goto suite_end;
+
+		} else if (suite->status == KUNIT_SKIPPED) {
+			/* Skip this kunit suite */
+			goto suite_end;
 		}
 	}
 
-- 
2.54.0




* [PATCH 0/2] kunit: Add support for skipping entire test suites
From: Vaibhav Jain @ 2026-06-04 16:27 UTC (permalink / raw)
  To: linuxppc-dev, kvm, kvm-ppc, linux-kselftest, kunit-dev,
	linux-kernel
  Cc: Vaibhav Jain, Madhavan Srinivasan, Michael Ellerman,
	Brendan Higgins, David Gow, Rae Moar

This patch series introduces the ability to skip an entire 'kunit_suite'
based on runtime conditions, addressing a limitation where test suites
could only skip individual test cases or fail when prerequisites were not
met.

The motivation for this feature comes from test suites that depend on
specific hardware features, kernel capabilities, or runtime conditions.
Currently, such suites must either:
* Fail when prerequisites are missing
* Skip each test case individually with redundant checks
* Implement workarounds to avoid running tests

An example of such a requirement came from [1], where the patch author
wanted to skip the entire 'kunit_suite' but had to resort to marking every
'struct kunit_case' as skipped by writing to the private 'kunit_case.status'
struct member. The use case addressed in patch [1] can be implemented
more cleanly with the changes proposed in this patch series.

Structure of the patch series
=============================
Patch 1:
* Add a 'status' field to struct kunit_suite that allows 'suite_init'
  callbacks to mark the entire suite as KUNIT_SKIPPED.
* Modify the KUnit core to check this newly introduced 'status' field
  and bypass all test cases when a suite is marked as skipped.

Patch 2:
* Provide an example in kunit-example-test.c demonstrating the usage
  pattern.

The implementation is minimal and non-intrusive, adding only a status field
to kunit_suite and checks in two key functions. Test suites that don't use
this proposed feature should be unaffected.
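
The control flow described above can be sketched in plain userspace C.
This is only a model under stated assumptions: 'enum status',
'struct suite', run_suite() and the init_*() helpers are hypothetical
stand-ins, not the KUnit definitions; only the ordering of the checks
follows what Patch 1 proposes for kunit_run_tests().

```c
#include <stddef.h>

/* Userspace stand-ins for the KUnit types; not the kernel definitions. */
enum status { STATUS_SUCCESS, STATUS_FAILURE, STATUS_SKIPPED };

struct suite {
	enum status status;
	int (*suite_init)(struct suite *suite);
	void (*const *cases)(void);	/* NULL-terminated list of test cases */
	size_t cases_run;
};

/* Mirrors the ordering of the checks proposed for kunit_run_tests(). */
static void run_suite(struct suite *s)
{
	/* Suite was marked skipped before suite_init: run nothing. */
	if (s->status == STATUS_SKIPPED)
		return;

	if (s->suite_init) {
		if (s->suite_init(s)) {
			s->status = STATUS_FAILURE;
			return;
		}
		/* suite_init succeeded but asked for the suite to be skipped. */
		if (s->status == STATUS_SKIPPED)
			return;
	}

	for (size_t i = 0; s->cases[i]; i++) {
		s->cases[i]();
		s->cases_run++;
	}
}

/* Example suite_init callbacks. */
static int init_skip(struct suite *s) { s->status = STATUS_SKIPPED; return 0; }
static int init_ok(struct suite *s) { (void)s; return 0; }
static int init_fail(struct suite *s) { (void)s; return -1; }

static void noop_case(void) { }
static void (*const demo_cases[])(void) = { noop_case, noop_case, NULL };

/* Runs a two-case suite with the given init callback; returns cases executed. */
static size_t cases_run_with(int (*init)(struct suite *))
{
	struct suite s = { .status = STATUS_SUCCESS, .suite_init = init,
			   .cases = demo_cases };

	run_suite(&s);
	return s.cases_run;
}
```

With init_skip() no test case runs and the suite keeps its skipped
status; with init_ok(), or with no init callback at all, both cases run
as before.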

References
==========
[1] https://lore.kernel.org/all/20260604092931.344101-1-vaibhav@linux.ibm.com

Vaibhav Jain (2):
  kunit: Add ability to skip entire test suites
  kunit: Add example of test suite that can be skipped at runtime

 include/kunit/test.h           |  1 +
 lib/kunit/kunit-example-test.c | 29 +++++++++++++++++++++++++++++
 lib/kunit/test.c               | 11 +++++++++++
 3 files changed, 41 insertions(+)

-- 
2.54.0




* RE: [PATCH v5 05/20] dma-pool: track decrypted atomic pools and select them via attrs
From: Michael Kelley @ 2026-06-04 16:18 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Michael Kelley, Jason Gunthorpe, Michael Kelley
  Cc: iommu@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
	Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
	Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
	Mostafa Saleh, Petr Tesarik, Alexey Kardashevskiy, Dan Williams,
	Xu Yilun, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86@kernel.org, Jiri Pirko
In-Reply-To: <yq5apl26qrof.fsf@kernel.org>

From: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Sent: Thursday, June 4, 2026 7:58 AM
> 
> Michael Kelley <mhklinux@outlook.com> writes:
> 
> > From: Jason Gunthorpe <jgg@ziepe.ca> Sent: Tuesday, June 2, 2026 5:55 PM
> >>
> >> On Tue, Jun 02, 2026 at 02:24:40PM +0000, Michael Kelley wrote:
> >>
> >> > Except that in a normal VM, the "unencrypted" pool attribute does *not*
> >> > describe the state of the memory itself.  In a normal VM, the memory is
> >> > unencrypted, but the "unencrypted" pool attribute is false. That
> >> > contradiction is the essence of my concern.
> >>
> >> I would argue no..
> >>
> >> When CC is enabled the default state of memory in a Linux environment
> >> is "encrypted". You have to take a special action to "decrypt" it.
> >>
> >> Thus the default state of memory in a non-CC environment is also
> >> paradoxically "encrypted" too.
> >
> > The need to have such an unnatural premise is usually an indication
> > of a conceptual problem with the overall model, or perhaps just a
> > terminology problem.
> >
> > Here's a proposal. The new DMA attribute is DMA_ATTR_CC_SHARED.
> > Name the pool attribute "cc_shared" instead of "unencrypted". Having
> > "cc_shared" set to false in a normal VM doesn't lead to the non-sensical
> > situation of claiming that a normal VM is encrypted. The boolean
> > "unencrypted" parameter that has been added to various calls also
> > becomes "cc_shared".  If "CC_SHARED" is a suitable name for the DMA
> > attribute, it ought to be suitable as the pool attribute. And everything
> > matches as well.
> >
> 
> That is better. It would also simplify:
> 
> 	if (mem->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
> 		return NULL;
> 
> to
> 	if (mem->cc_shared != !!(attrs & DMA_ATTR_CC_SHARED))
> 		return NULL;
> 
> 
> I already sent a v6 in the hope of getting this merged for the next
> merge window. Should I send a v7, or would you prefer that I do the
> rename on top of v6?
> 

I would advocate for a v7 with the rename, vs. a separate follow-on
patch to do the rename, just to reduce churn. But I don't know what
the tradeoffs are in trying to hit the next merge window. If a follow-on
patch is more practical from a timing standpoint, I won't object.
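
For anyone skimming the thread, here is a minimal userspace sketch of
the matching rule with the proposed name. The bit value used for
DMA_ATTR_CC_SHARED and the trimmed-down 'struct pool' are placeholders
for illustration only; the real definitions come from the series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bit; the real DMA_ATTR_CC_SHARED value is defined by the series. */
#define DMA_ATTR_CC_SHARED (1UL << 0)

/* Hypothetical, trimmed-down pool descriptor. */
struct pool {
	bool cc_shared;	/* pool memory is shared with the host (CC guests only) */
};

/* Returns the pool only if its sharing state matches what the caller asked for. */
static struct pool *pool_if_matching(struct pool *mem, unsigned long attrs)
{
	if (mem->cc_shared != !!(attrs & DMA_ATTR_CC_SHARED))
		return NULL;
	return mem;
}
```

In a normal VM the pool has cc_shared == false and callers do not pass
DMA_ATTR_CC_SHARED, so the pool matches without anyone having to claim
that the memory is "encrypted".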

Michael




* Re: [PATCH v3 2/6] scsi: core: Move scsi_device_from_queue() to scsi_priv.h
From: Bart Van Assche @ 2026-06-04 15:51 UTC (permalink / raw)
  To: Catalin Iacob, Thomas Bogendoerfer, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
	Rich Felker, John Paul Adrian Glaubitz, David S. Miller,
	Andreas Larsson, James E.J. Bottomley, Martin K. Petersen,
	Jens Axboe, Yoshinori Sato
  Cc: linux-mips, linux-kernel, linuxppc-dev, linux-sh, sparclinux,
	linux-scsi
In-Reply-To: <20260604-remove-pktcdvd-references-v3-2-e2f06fb4eef4@gmail.com>

On 6/4/26 6:20 AM, Catalin Iacob wrote:
> scsi_device_from_queue() is only referenced in drivers/scsi so move its
> prototype to drivers/scsi/scsi_priv.h.

The subject of this patch suggests that the implementation of
scsi_device_from_queue() is moved, while only the declaration is moved.

Thanks,

Bart.



* Re: [PATCH v3 1/6] scsi: core: Remove remaining reference to the pktcdvd driver
From: Bart Van Assche @ 2026-06-04 15:51 UTC (permalink / raw)
  To: Catalin Iacob, Thomas Bogendoerfer, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
	Rich Felker, John Paul Adrian Glaubitz, David S. Miller,
	Andreas Larsson, James E.J. Bottomley, Martin K. Petersen,
	Jens Axboe, Yoshinori Sato
  Cc: linux-mips, linux-kernel, linuxppc-dev, linux-sh, sparclinux,
	linux-scsi
In-Reply-To: <20260604-remove-pktcdvd-references-v3-1-e2f06fb4eef4@gmail.com>

On 6/4/26 6:20 AM, Catalin Iacob wrote:
> Commit 1cea5180f2f8 ("block: remove pktcdvd driver") left behind an
> export that is now dead code. Remove it.

The subject should say something like "Unexport
scsi_device_from_queue()".

Thanks,

Bart.



* Re: [PATCH v3 0/6] Remove remaining references to the pktcdvd driver
From: Bart Van Assche @ 2026-06-04 15:50 UTC (permalink / raw)
  To: Catalin Iacob, Thomas Bogendoerfer, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
	Rich Felker, John Paul Adrian Glaubitz, David S. Miller,
	Andreas Larsson, James E.J. Bottomley, Martin K. Petersen,
	Jens Axboe, Yoshinori Sato
  Cc: linux-mips, linux-kernel, linuxppc-dev, linux-sh, sparclinux,
	linux-scsi
In-Reply-To: <20260604-remove-pktcdvd-references-v3-0-e2f06fb4eef4@gmail.com>

On 6/4/26 6:20 AM, Catalin Iacob wrote:
> Found this incidentally while looking at kernel sources to understand
> what pktcdvd is

If this series is reposted, please combine patches 1/6 and 2/6. Anyway,
this series looks good to me.

Thanks,

Bart.



* Re: [PATCH] tools/perf/sched: Update process names of processes in zombie state for both -s and -S options
From: Arnaldo Carvalho de Melo @ 2026-06-04 15:26 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: Anubhav Shelat, Namhyung Kim, Ian Rogers, jolsa, adrian.hunter,
	mpetlan, tmricht, maddy, linux-perf-users, linuxppc-dev, hbathini,
	Tejas.Manhas1, Tanushree.Shah, Shivani.Nittor
In-Reply-To: <5ECBB4A2-57DE-48A0-BCFE-1B99DC4AABEE@linux.ibm.com>

On Thu, Jun 04, 2026 at 08:38:46PM +0530, Athira Rajeev wrote:
> > On 4 Jun 2026, at 7:47 PM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > 
> > On Thu, May 21, 2026 at 11:17:58AM -0300, Arnaldo Carvalho de Melo wrote:
> >> On Thu, May 21, 2026 at 02:02:53PM +0530, Athira Rajeev wrote:
> >>>> On 27 Apr 2026, at 11:26 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> >>>> On Sun, Apr 26, 2026 at 03:09:30PM +0530, Athira Rajeev wrote:
> >>>>> In redhat perftool testsuite, observed fail for this test:
> >>>>>   -- [ FAIL ] -- perf_sched :: test_timehist :: --with-summary (output regexp parsing)
> >>>>> 
> >>>>> This led to analysis of "perf sched timehist" summary options.
> >>>>> 
> >>>>> # perf sched record -a -o ./perf.data -- sleep 0.1
> >>>>>  This will record using perf sched record
> >>>>> 
> >>>>> perf sched timehist has two options "-s" and "-S"
> >>>>> # perf sched -i ./perf.data timehist -S
> >>>>> -S : Captures summary also at the end
> >>>>> 
> >>>>> # perf sched -i ./perf.data timehist -s
> >>>>> -s : Captures only summary
> >>>>> 
> >>>>> The test saves -s result which has only summary and compares with
> >>>>> summary which comes at the end from -S . Since there is a difference
> >>>>> in these two, test fails.
> >>>>> 
> >>>>> Checking the behaviour change in -S and -s results, difference is:
> >>>>> 
> >>>>>                 rcu_sched[16]       2          4        0.013      0.001       0.003       0.006   33.23       0
> >>>>>              migration/11[73]       2          1        0.006      0.006       0.006       0.006    0.00       0
> >>>>>               migration/3[33]       2          1        0.006      0.006       0.006       0.006    0.00       0
> >>>>> -               :216753[216753]      -1          1        0.041      0.041       0.041       0.041    0.00       0
> >>>>> +                 sleep[216753]      -1          1        0.041      0.041       0.041       0.041    0.00       0
> >>>>>               migration/8[58]       2          1        0.005      0.005       0.005       0.005    0.00       0
> >>>>>           NetworkManager[811]       1          2        0.089      0.028       0.044       0.060   36.06       0
> >>>>>              migration/13[83]       2          1        0.005      0.005       0.005       0.005    0.00       0
> >>>>> 
> >>>>> Here 216753 is pid for sleep which is a zombie process. This is
> >>>>> happening in latest kernel due to an update in "-S" result.
> >>>>> In -S, the process name appears in the results "sleep[216753]",
> >>>>> whereas in the -s, only pid is present in the summary result
> >>>>> ":216753[216753]".
> >>>>> 
> >>>>> After commit 39f473f6d0b2 ("perf sched timehist: decode process names
> >>>>> of processes in zombie state")
> >>>>> for the -S option, if the process name is just the pid, a different way is
> >>>>> used to set it, so that we get the process name and not just the pid.
> >>>>> 
> >>>>> This change went in only for timehist_print_sample() function.
> >>>>> Add this improvement in generic place so that even -s option (which
> >>>>> captures summary) also will have meaningful information.
> >>>>> 
> >>>>> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
> >>>> 
> >>>> Acked-by: Namhyung Kim <namhyung@kernel.org>
> >>>> 
> >>>> Thanks,
> >>>> Namhyung
> >>> Hi,
> >>> 
> >>> Can we please have this pulled in, if the patch looks fine ?
> >> 
> >> Can you please check applying it on top of current perf-tools-next?
> > 
> > So, this seems to be also addressed by:
> > 
> > commit 39f473f6d0b24cf375893f2110b1cc9d8a079a42
> > Author: Anubhav Shelat <ashelat@redhat.com>
> > Date:   Wed Jul 16 16:39:15 2025 -0400
> > 
> >    perf sched timehist: decode process names of processes in zombie state
> > 
> >    Previously when running perf trace timehist --state, when recording
> >    processes in the zombie state the process name would not be decoded
> >    properly and appears with just the PID:
> > 
> >    1140057.412177 [0006]  Mutter Input Th[3139/3104]          0.956      0.019      0.041      S
> >    1140057.412222 [0012]  :1248612[1248612]                   0.000      0.000      0.332      Z
> >    1140057.412275 [0004]  <idle>                              0.052      0.052      0.953      I
> >    1140057.412284 [0008]  <idle>                              0.070      0.070      0.932      I
> >    1140057.412333 [0004]  KMS thread[3126/3104]               0.953      0.112      0.058      S
> > 
> >    Now some extra processing has been added to decode the process name:
> > 
> >    1140057.412177 [0006]  Mutter Input Th[3139/3104]          0.956      0.019      0.041      S
> >    1140057.412222 [0012]  sleep[1248612]                      0.000      0.000      0.332      Z
> >    1140057.412275 [0004]  <idle>                              0.052      0.052      0.953      I
> >    1140057.412284 [0008]  <idle>                              0.070      0.070      0.932      I
> >    1140057.412333 [0004]  KMS thread[3126/3104]               0.953      0.112      0.058      S
> > 
> >    Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
> >    Link: https://lore.kernel.org/r/20250716203914.45772-2-ashelat@redhat.com
> >    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > 
> > 
> > No? It is not applying to perf-tools-next, a quick look found the patch
> > above.
> 
> Hi Arnaldo
> 
> commit 39f473f6d0b2 ("perf sched timehist: decode process names
> of processes in zombie state”)
> added change for -S option. The patch I submitted is to add change in process name for “-s” option as well
> 
> I will check applying this on top of current perf-tools-next

Thanks for looking into this!

- Arnaldo



* Re: [PATCH V4 1/2] tools/perf: Fix the check for parameterized field in event term
From: Athira Rajeev @ 2026-06-04 15:12 UTC (permalink / raw)
  To: acme, jolsa, adrian.hunter, irogers, namhyung
  Cc: linux-perf-users, linuxppc-dev, hbathini, Tejas.Manhas1,
	Tanushree.Shah, shivani, Thomas Richter, Madhavan Srinivasan,
	mpetlan
In-Reply-To: <4D09EA54-8065-48C2-B1DF-95D2AA15C6B6@linux.ibm.com>



> On 21 May 2026, at 12:07 PM, Athira Rajeev <atrajeev@linux.ibm.com> wrote:
> 
> 
> 
>> On 4 May 2026, at 9:12 PM, Athira Rajeev <atrajeev@linux.ibm.com> wrote:
>> 
>> The format_alias() function in util/pmu.c has a check to
>> detect whether the event has parameterized field ( =? ).
>> The string alias->terms contains the event and if the event
>> has user configurable parameter, there will be presence of
>> sub string "=?" in the alias->terms.
>> 
>> Snippet of code:
>> 
>> /* Paramemterized events have the parameters shown. */
>>      if (strstr(alias->terms, "=?")) {
>>              /* No parameters. */
>>              snprintf(buf, len, "%.*s/%s/", (int)pmu_name_len, pmu->name, alias->name);
>> 
>> If strstr() finds the substring, it returns a non-NULL pointer, so the
>> branch meant for events without parameters is taken, which is the
>> opposite of the intended check. As a result, "perf list" does not show
>> the parameterized fields.
>> 
>> Fix this check to use:
>> 
>> if (!strstr(alias->terms, "=?")) {
>> 
>> With this change, perf list shows the events correctly with
>> the strings showing parameters.
>> 
>> Before the fix:
>> 
>> # ./perf list|grep -w PM_PAU_CYC
>> hv_24x7/PM_PAU_CYC/                                [Kernel PMU event]
>> 
>> With this fix:
>> 
>> # ./perf list|grep -w PM_PAU_CYC
>> hv_24x7/PM_PAU_CYC,chip=?/                         [Kernel PMU event]
>> 
>> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
> 
> Hi,
> 
> Can we please have this pulled in, if the changes look fine?
> 
> Thanks
> Athira

Hi,

Looking for any further review comments on this patchset. Please let me know if any changes need to be addressed.
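
For context, the strstr() semantics behind the fix quoted above can be
shown with a small userspace sketch. format_event() is a hypothetical,
simplified stand-in for the branch in format_alias(); it just appends
the raw terms string and does not model how the real function builds
the parameter list. The "cpu"/"cycles" names in the usage note are
purely illustrative.

```c
#include <stdio.h>
#include <string.h>

/*
 * Simplified stand-in for the branch in format_alias(): events whose
 * terms contain "=?" take a user-supplied parameter and should be
 * listed with it; all other events are listed by name only.
 */
static char *format_event(char *buf, size_t len, const char *pmu,
			  const char *name, const char *terms)
{
	if (!strstr(terms, "=?")) {
		/* strstr() returned NULL: no parameters. */
		snprintf(buf, len, "%s/%s/", pmu, name);
		return buf;
	}
	/* strstr() returned a pointer: parameterized event. */
	snprintf(buf, len, "%s/%s,%s/", pmu, name, terms);
	return buf;
}
```

With terms "chip=?" this yields "hv_24x7/PM_PAU_CYC,chip=?/", matching
the "With this fix" output; an event with terms such as "event=0x3c"
(no "=?") is printed as "cpu/cycles/". Dropping the "!" swaps the two
cases, which is the behaviour seen before the fix.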

Thanks
Athira

>> ---
>> Changelog:
>> v3 -> v4:
>> Updated commit message to show real example
>> addressing review comment from Namhyung.
>> 
>> v2 -> v3:
>> Split the strstr correction in a single patch
>> 
>> tools/perf/util/pmu.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
>> index 23337d2fa281..0b8d58543f17 100644
>> --- a/tools/perf/util/pmu.c
>> +++ b/tools/perf/util/pmu.c
>> @@ -2117,7 +2117,7 @@ static char *format_alias(char *buf, int len, const struct perf_pmu *pmu,
>>  skip_duplicate_pmus);
>> 
>> /* Paramemterized events have the parameters shown. */
>> - if (strstr(alias->terms, "=?")) {
>> + if (!strstr(alias->terms, "=?")) {
>> /* No parameters. */
>> snprintf(buf, len, "%.*s/%s/", (int)pmu_name_len, pmu->name, alias->name);
>> return buf;
>> -- 
>> 2.47.3





* Re: [PATCH] powerpc/vtime: Initialize starttime at boot for native accounting
From: Christophe Leroy (CS GROUP) @ 2026-06-04 15:12 UTC (permalink / raw)
  To: Shrikanth Hegde, maddy, linuxppc-dev; +Cc: frederic
In-Reply-To: <e69ad352-6835-414b-845b-03f45fdd9a45@kernel.org>



Le 04/06/2026 à 16:51, Christophe Leroy (CS GROUP) a écrit :
> 
> 
> Le 04/06/2026 à 15:24, Shrikanth Hegde a écrit :
>> It was observed that /proc/stat had a very large value for one or more
>> CPUs. It was more visible after recent code simplifications around
>> cpustats.
>>
>> System has 240 CPUs.
>>
>> cat /proc/uptime;
>> 194.18 46500.55
>> cat /proc/stat
>> cpu  5966 39 837032887 4650070 164 185 100 0 0 0
>> cpu0 108 0 837030890 19109 24 4 23 0 0 0
>>
>> Since uptime is 194s, system time of each CPU can't be more than 19400.
>> Sum of system time of all CPUs can't be more than 19400*240 = 4656000.
>> In fact the huge value is close to mftb(). Note mftb doesn't reset on
>> PowerVM when the LPAR restarts. It only resets when the whole system
>> resets. The same issue exists for kexec too.
>>
>> This happens since starttime is not set up at init time. Once it is set
>> then subsequent vtime_delta will return the right delta.
>>
>> Fix it by initializing the starttime during CPU initialization. This
>> fixes the large times seen.
>>
>> cat /proc/uptime; cat /proc/stat
>> 15.78 3694.63
>> cpu  6035 35 1347 369479 23 144 49 0 0 0
>> cpu0 19 0 38 1508 0 1 14 0 0 0
>>
>> Now, system time is reported as expected.
>>
>> Suggested-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> 
> Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
> 
>> ---
>>
>> Christophe, I have taken the patch as is from the discussion we had.
>> Let me know if I should send it with your Signed-off-by tag. I have just
>> written the changelog. I sent it like this since the tag was not there.
> 
> Suggested-by is fine for me.
> 
>>
>> discussion thread:
>> https://lore.kernel.org/all/cd10be19-e0bc-4e0c-8dac-4f1c05d0de8f@kernel.org/
>>
>> Also, does this warrant a Fixes tag? I found these two likely candidates.
>> Likely this issue has existed since the beginning.
>> c223c90386bc powerpc32: provide VIRT_CPU_ACCOUNTING
> 
> You say the system has 240 CPUs, so I suppose this is not ppc32. That commit
> was not supposed to change anything for ppc64. Did you identify anything
> special in that commit related to ppc64?
> 
>> b38a181c11d0 powerpc/time: isolate scaled cputime accounting in 
>> dedicated functions.
> 
> This one is also pure code re-organisation, unless you've been able to 
> spot a particular issue ?

Maybe commit cf9efce0ce31 ("powerpc: Account time using timebase rather 
than PURR")

It removed snapshot_timebases() and I can't see anything to replace it.


> 
>>
>>   arch/powerpc/kernel/time.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
>> index 3460d1a5a97c..11145c40183d 100644
>> --- a/arch/powerpc/kernel/time.c
>> +++ b/arch/powerpc/kernel/time.c
>> @@ -377,7 +377,6 @@ void vtime_task_switch(struct task_struct *prev)
>>       }
>>   }
>> -#ifdef CONFIG_NO_HZ_COMMON
>>   /**
>>    * vtime_reset - Fast forward vtime entry clocks
>>    *
>> @@ -394,6 +393,7 @@ void vtime_reset(void)
>>   #endif
>>   }
>> +#ifdef CONFIG_NO_HZ_COMMON
>>   /**
>>    * vtime_dyntick_start - Inform vtime about entry to idle-dynticks
>>    *
>> @@ -933,6 +933,7 @@ static void __init set_decrementer_max(void)
>>   static void __init init_decrementer_clockevent(void)
>>   {
>>       register_decrementer_clockevent(smp_processor_id());
>> +    vtime_reset();
>>   }
>>   void secondary_cpu_time_init(void)
>> @@ -948,6 +949,7 @@ void secondary_cpu_time_init(void)
>>       /* FIME: Should make unrelated change to move snapshot_timebase
>>        * call here ! */
>>       register_decrementer_clockevent(smp_processor_id());
>> +    vtime_reset();
>>   }
>>   /*
> 
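
The bound used in the quoted changelog can be checked with a short sketch. It assumes /proc/stat reports CPU time in units of 1/100 second (USER_HZ is 100 on common configurations); the helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Assumption: /proc/stat counts in USER_HZ ticks, with USER_HZ == 100. */
#define USER_HZ_ASSUMED 100UL

/* Upper bound on any single per-CPU time field after uptime_sec seconds. */
static unsigned long max_ticks_per_cpu(unsigned long uptime_sec)
{
	return uptime_sec * USER_HZ_ASSUMED;
}

/* Upper bound on the summed "cpu" line across ncpus CPUs. */
static unsigned long max_ticks_all_cpus(unsigned long uptime_sec,
					unsigned long ncpus)
{
	return max_ticks_per_cpu(uptime_sec) * ncpus;
}

/* A reported per-CPU value is plausible only if it fits under the bound. */
static int per_cpu_time_is_plausible(unsigned long reported,
				     unsigned long uptime_sec)
{
	return reported <= max_ticks_per_cpu(uptime_sec);
}
```

For the numbers in the report, 194 seconds of uptime gives 19400 per CPU and 4656000 across 240 CPUs, so the cpu0 system time of 837030890 cannot be real accumulated time.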




* Re: [RFC 07/12] PCI: Convert xen-pciback and pci-driver to pci_is_sriov_* helpers
From: Juergen Gross @ 2026-06-04 15:11 UTC (permalink / raw)
  To: Dimitri Daskalakis, Bjorn Helgaas
  Cc: linux-pci, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Mahesh J Salgaonkar, Oliver O'Halloran,
	Niklas Schnelle, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Alex Williamson, Jason Gunthorpe, Kevin Tian, Ankit Agrawal,
	Leon Romanovsky, Stefano Stabellini, Oleksandr Tyshchenko,
	Keith Busch, Alexander Duyck, Jakub Kicinski, Dimitri Daskalakis,
	linuxppc-dev, linux-s390, kvm, xen-devel
In-Reply-To: <20260604150153.3619662-8-dimitri.daskalakis1@gmail.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 283 bytes --]

On 04.06.26 17:01, Dimitri Daskalakis wrote:
> From: Dimitri Daskalakis <daskald@meta.com>
> 
> No functional changes.
> 
> Assisted-by: Claude:claude-opus-4.7
> Signed-off-by: Dimitri Daskalakis <daskald@meta.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen



* Re: [PATCH] tools/perf/sched: Update process names of processes in zombie state for both -s and -S options
From: Athira Rajeev @ 2026-06-04 15:08 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Anubhav Shelat, Namhyung Kim, Ian Rogers, jolsa, adrian.hunter,
	mpetlan, tmricht, maddy, linux-perf-users, linuxppc-dev, hbathini,
	Tejas.Manhas1, Tanushree.Shah, Shivani.Nittor
In-Reply-To: <aiGI--j2nJ_kas60@x1>



> On 4 Jun 2026, at 7:47 PM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> On Thu, May 21, 2026 at 11:17:58AM -0300, Arnaldo Carvalho de Melo wrote:
>> On Thu, May 21, 2026 at 02:02:53PM +0530, Athira Rajeev wrote:
>>>> On 27 Apr 2026, at 11:26 AM, Namhyung Kim <namhyung@kernel.org> wrote:
>>>> On Sun, Apr 26, 2026 at 03:09:30PM +0530, Athira Rajeev wrote:
>>>>> In redhat perftool testsuite, observed fail for this test:
>>>>>   -- [ FAIL ] -- perf_sched :: test_timehist :: --with-summary (output regexp parsing)
>>>>> 
>>>>> This led to analysis of "perf sched timehist" summary options.
>>>>> 
>>>>> # perf sched record -a -o ./perf.data -- sleep 0.1
>>>>>  This will record using perf sched record
>>>>> 
>>>>> perf sched timehist has two options "-s" and "-S"
>>>>> # perf sched -i ./perf.data timehist -S
>>>>> -S : Captures summary also at the end
>>>>> 
>>>>> # perf sched -i ./perf.data timehist -s
>>>>> -s : Captures only summary
>>>>> 
>>>>> The test saves -s result which has only summary and compares with
>>>>> summary which comes at the end from -S . Since there is a difference
>>>>> in these two, test fails.
>>>>> 
>>>>> Checking the behaviour change in -S and -s results, difference is:
>>>>> 
>>>>>                 rcu_sched[16]       2          4        0.013      0.001       0.003       0.006   33.23       0
>>>>>              migration/11[73]       2          1        0.006      0.006       0.006       0.006    0.00       0
>>>>>               migration/3[33]       2          1        0.006      0.006       0.006       0.006    0.00       0
>>>>> -               :216753[216753]      -1          1        0.041      0.041       0.041       0.041    0.00       0
>>>>> +                 sleep[216753]      -1          1        0.041      0.041       0.041       0.041    0.00       0
>>>>>               migration/8[58]       2          1        0.005      0.005       0.005       0.005    0.00       0
>>>>>           NetworkManager[811]       1          2        0.089      0.028       0.044       0.060   36.06       0
>>>>>              migration/13[83]       2          1        0.005      0.005       0.005       0.005    0.00       0
>>>>> 
>>>>> Here 216753 is pid for sleep which is a zombie process. This is
>>>>> happening in latest kernel due to an update in "-S" result.
>>>>> In -S, the process name appears in the results "sleep[216753]",
>>>>> whereas in -s, only the pid is present in the summary result
>>>>> ":216753[216753]".
>>>>> 
>>>>> After commit 39f473f6d0b2 ("perf sched timehist: decode process names
>>>>> of processes in zombie state")
>>>>> for -S option, if process name is using pid, it uses different way to
>>>>> set it. So that we get the process name and not just Pid.
>>>>> 
>>>>> This change went in only for timehist_print_sample() function.
>>>>> Add this improvement in generic place so that even -s option (which
>>>>> captures summary) also will have meaningful information.
>>>>> 
>>>>> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
>>>> 
>>>> Acked-by: Namhyung Kim <namhyung@kernel.org>
>>>> 
>>>> Thanks,
>>>> Namhyung
>>> Hi,
>>> 
>>> Can we please have this pulled in, if the patch looks fine ?
>> 
>> Can you please check applying it on top of current perf-tools-next?
> 
> So, this seems to be also addressed by:
> 
> commit 39f473f6d0b24cf375893f2110b1cc9d8a079a42
> Author: Anubhav Shelat <ashelat@redhat.com>
> Date:   Wed Jul 16 16:39:15 2025 -0400
> 
>    perf sched timehist: decode process names of processes in zombie state
> 
>    Previously when running perf trace timehist --state, when recording
>    processes in the zombie state the process name would not be decoded
>    properly and appears with just the PID:
> 
>    1140057.412177 [0006]  Mutter Input Th[3139/3104]          0.956      0.019      0.041      S
>    1140057.412222 [0012]  :1248612[1248612]                   0.000      0.000      0.332      Z
>    1140057.412275 [0004]  <idle>                              0.052      0.052      0.953      I
>    1140057.412284 [0008]  <idle>                              0.070      0.070      0.932      I
>    1140057.412333 [0004]  KMS thread[3126/3104]               0.953      0.112      0.058      S
> 
>    Now some extra processing has been added to decode the process name:
> 
>    1140057.412177 [0006]  Mutter Input Th[3139/3104]          0.956      0.019      0.041      S
>    1140057.412222 [0012]  sleep[1248612]                      0.000      0.000      0.332      Z
>    1140057.412275 [0004]  <idle>                              0.052      0.052      0.953      I
>    1140057.412284 [0008]  <idle>                              0.070      0.070      0.932      I
>    1140057.412333 [0004]  KMS thread[3126/3104]               0.953      0.112      0.058      S
> 
>    Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
>    Link: https://lore.kernel.org/r/20250716203914.45772-2-ashelat@redhat.com
>    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> 
> 
> No? It is not applying to perf-tools-next, a quick look found the patch
> above.

Hi Arnaldo

Commit 39f473f6d0b2 ("perf sched timehist: decode process names
of processes in zombie state")
added the change for the -S option only. The patch I submitted adds the
same process name change for the "-s" option as well.

I will check applying this on top of current perf-tools-next.

Thanks
Athira


> 
> - Arnaldo





* RE: [PATCH v5 05/20] dma-pool: track decrypted atomic pools and select them via attrs
From: Aneesh Kumar K.V @ 2026-06-04 14:57 UTC (permalink / raw)
  To: Michael Kelley, Jason Gunthorpe, Michael Kelley
  Cc: iommu@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
	Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
	Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
	Mostafa Saleh, Petr Tesarik, Alexey Kardashevskiy, Dan Williams,
	Xu Yilun, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86@kernel.org, Jiri Pirko
In-Reply-To: <SN6PR02MB4157F94C902B78E55E99372DD4102@SN6PR02MB4157.namprd02.prod.outlook.com>

Michael Kelley <mhklinux@outlook.com> writes:

> From: Jason Gunthorpe <jgg@ziepe.ca> Sent: Tuesday, June 2, 2026 5:55 PM
>> 
>> On Tue, Jun 02, 2026 at 02:24:40PM +0000, Michael Kelley wrote:
>> 
>> > Except that in a normal VM, the "unencrypted" pool attribute does *not*
>> > describe the state of the memory itself.  In a normal VM, the memory is
>> > unencrypted, but the "unencrypted" pool attribute is false. That
>> > contradiction is the essence of my concern.
>> 
>> I would argue no..
>> 
>> When CC is enabled the default state of memory in a Linux environment
>> is "encrypted". You have to take a special action to "decrypt" it.
>> 
>> Thus the default state of memory in a non-CC environment is also
>> paradoxically "encrypted" too. 
>
> The need to have such an unnatural premise is usually an indication
> of a conceptual problem with the overall model, or perhaps just a
> terminology problem. 
>
> Here's a proposal. The new DMA attribute is DMA_ATTR_CC_SHARED.
> Name the pool attribute "cc_shared" instead of "unencrypted". Having
> "cc_shared" set to false in a normal VM doesn't lead to the non-sensical
> situation of claiming that a normal VM is encrypted. The boolean
> "unencrypted" parameter that has been added to various calls also
> becomes "cc_shared".  If "CC_SHARED" is a suitable name for the DMA
> attribute, it ought to be suitable as the pool attribute. And everything
> matches as well.
>

That is better. It would also simplify:

	if (mem->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
		return NULL;

to
	if (mem->cc_shared != !!(attrs & DMA_ATTR_CC_SHARED))
		return NULL;


I already sent a v6 in the hope of getting this merged for the next
merge window. Should I send a v7, or would you prefer that I do the
rename on top of v6?

-aneesh
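
A minimal sketch of the renamed check discussed above. The flag value, the struct, and the function name are hypothetical placeholders for illustration, not the kernel's definitions. The `!!` turns the masked attribute into 0 or 1 so it can be compared directly with the pool's boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit position, chosen only for this sketch. */
#define DMA_ATTR_CC_SHARED_SKETCH (1UL << 5)

/* Hypothetical stand-in for the pool descriptor. */
struct pool_sketch {
	bool cc_shared;
};

/* Return the pool only when its cc_shared state matches the request. */
static const struct pool_sketch *
pool_select_sketch(const struct pool_sketch *mem, unsigned long attrs)
{
	if (mem->cc_shared != !!(attrs & DMA_ATTR_CC_SHARED_SKETCH))
		return NULL;
	return mem;
}
```

In a normal VM no pool would ever have cc_shared set, so a request without the attribute always matches, which is the naming property the proposal is after.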



* Re: [PATCH] powerpc/vtime: Initialize starttime at boot for native accounting
From: Christophe Leroy (CS GROUP) @ 2026-06-04 14:51 UTC (permalink / raw)
  To: Shrikanth Hegde, maddy, linuxppc-dev, christophe.leroy; +Cc: frederic
In-Reply-To: <20260604132429.297665-1-sshegde@linux.ibm.com>



Le 04/06/2026 à 15:24, Shrikanth Hegde a écrit :
> It was observed that /proc/stat had a very large value for one or more
> CPUs. It was more visible after recent code simplifications around
> cpustats.
> 
> System has 240 CPUs.
> 
> cat /proc/uptime;
> 194.18 46500.55
> cat /proc/stat
> cpu  5966 39 837032887 4650070 164 185 100 0 0 0
> cpu0 108 0 837030890 19109 24 4 23 0 0 0
> 
> Since uptime is 194s, system time of each CPU can't be more than 19400.
> Sum of system time of all CPUs can't be more than 19400*240 = 4656000.
> In fact the huge value is close to mftb(). Note mftb doesn't reset on
> PowerVM when the LPAR restarts. It only resets when the whole system
> resets. The same issue exists for kexec too.
> 
> This happens since starttime is not set up at init time. Once it is set
> then subsequent vtime_delta will return the right delta.
> 
> Fix it by initializing the starttime during CPU initialization. This
> fixes the large times seen.
> 
> cat /proc/uptime; cat /proc/stat
> 15.78 3694.63
> cpu  6035 35 1347 369479 23 144 49 0 0 0
> cpu0 19 0 38 1508 0 1 14 0 0 0
> 
> Now, system time is reported as expected.
> 
> Suggested-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>

Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>

> ---
> 
> Christophe, I have taken the patch as is from the discussion we had.
> Let me know if I should send it with your Signed-off-by tag. I have just
> written the changelog. I sent it like this since the tag was not there.

Suggested-by is fine for me.

> 
> discussion thread:
> https://lore.kernel.org/all/cd10be19-e0bc-4e0c-8dac-4f1c05d0de8f@kernel.org/
> 
> Also, does this warrant a Fixes tag? I found these two likely candidates.
> Likely this issue has existed since the beginning.
> c223c90386bc powerpc32: provide VIRT_CPU_ACCOUNTING

You say the system has 240 CPUs, so I suppose this is not ppc32. That commit
was not supposed to change anything for ppc64. Did you identify anything
special in that commit related to ppc64?

> b38a181c11d0 powerpc/time: isolate scaled cputime accounting in dedicated functions.

This one is also pure code re-organisation, unless you've been able to 
spot a particular issue ?

> 
>   arch/powerpc/kernel/time.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 3460d1a5a97c..11145c40183d 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -377,7 +377,6 @@ void vtime_task_switch(struct task_struct *prev)
>   	}
>   }
>   
> -#ifdef CONFIG_NO_HZ_COMMON
>   /**
>    * vtime_reset - Fast forward vtime entry clocks
>    *
> @@ -394,6 +393,7 @@ void vtime_reset(void)
>   #endif
>   }
>   
> +#ifdef CONFIG_NO_HZ_COMMON
>   /**
>    * vtime_dyntick_start - Inform vtime about entry to idle-dynticks
>    *
> @@ -933,6 +933,7 @@ static void __init set_decrementer_max(void)
>   static void __init init_decrementer_clockevent(void)
>   {
>   	register_decrementer_clockevent(smp_processor_id());
> +	vtime_reset();
>   }
>   
>   void secondary_cpu_time_init(void)
> @@ -948,6 +949,7 @@ void secondary_cpu_time_init(void)
>   	/* FIME: Should make unrelated change to move snapshot_timebase
>   	 * call here ! */
>   	register_decrementer_clockevent(smp_processor_id());
> +	vtime_reset();
>   }
>   
>   /*
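
The failure mode described in the quoted changelog can be sketched as follows. This is a simplified model with hypothetical names, not the code in arch/powerpc/kernel/time.c: it assumes the accounting delta is the current timebase minus the stored starttime, and that starttime is then advanced to the current timebase.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced per-CPU accounting state. */
struct acct_sketch {
	uint64_t starttime;
};

/* Model of a delta computation: elapsed timebase since the last snapshot. */
static uint64_t vtime_delta_sketch(struct acct_sketch *acct, uint64_t now)
{
	uint64_t delta = now - acct->starttime;

	acct->starttime = now;
	return delta;
}

/* Model of the reset done at CPU init: snapshot the timebase, account nothing. */
static void vtime_reset_sketch(struct acct_sketch *acct, uint64_t now)
{
	acct->starttime = now;
}
```

If starttime is still zero on the first call, the first delta equals the whole timebase value, which fits the observation that the bogus system time is close to mftb(). Calling the reset first leaves only the time that actually elapsed.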




* Re: [PATCH v5 05/20] dma-pool: track decrypted atomic pools and select them via attrs
From: Jason Gunthorpe @ 2026-06-04 14:30 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Aneesh Kumar K.V, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
	Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
	Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
	Mostafa Saleh, Petr Tesarik, Alexey Kardashevskiy, Dan Williams,
	Xu Yilun, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86@kernel.org, Jiri Pirko
In-Reply-To: <SN6PR02MB4157F94C902B78E55E99372DD4102@SN6PR02MB4157.namprd02.prod.outlook.com>

On Thu, Jun 04, 2026 at 02:05:35PM +0000, Michael Kelley wrote:
> From: Jason Gunthorpe <jgg@ziepe.ca> Sent: Tuesday, June 2, 2026 5:55 PM
> > 
> > On Tue, Jun 02, 2026 at 02:24:40PM +0000, Michael Kelley wrote:
> > 
> > > Except that in a normal VM, the "unencrypted" pool attribute does *not*
> > > describe the state of the memory itself.  In a normal VM, the memory is
> > > unencrypted, but the "unencrypted" pool attribute is false. That
> > > contradiction is the essence of my concern.
> > 
> > I would argue no..
> > 
> > When CC is enabled the default state of memory in a Linux environment
> > is "encrypted". You have to take a special action to "decrypt" it.
> > 
> > Thus the default state of memory in a non-CC environment is also
> > paradoxically "encrypted" too. 
> 
> The need to have such an unnatural premise is usually an indication
> of a conceptual problem with the overall model, or perhaps just a
> terminology problem. 

Oh yes I do think the AMD derived terminology is awful :(

> Here's a proposal. The new DMA attribute is DMA_ATTR_CC_SHARED.
> Name the pool attribute "cc_shared" instead of "unencrypted". 

Yeah maybe. I sometimes imagine replacing the encrypted/decrypted
names with cc_shared too just to make it sane.

> "cc_shared" set to false in a normal VM doesn't lead to the non-sensical
> situation of claiming that a normal VM is encrypted.

It seems like a good idea to me

Jason



* Re: [PATCH] KVM: PPC: Book3S HV: Validate arch_compat against host compatibility mode
From: Gautam Menghani @ 2026-06-04 14:20 UTC (permalink / raw)
  To: Amit Machhiwal
  Cc: linuxppc-dev, Madhavan Srinivasan, Vaibhav Jain,
	Harsh Prateek Bora, Ritesh Harjani, Anushree Mathur,
	Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
	kvm, stable, linux-kernel
In-Reply-To: <20260603141539.47620-1-amachhiw@linux.ibm.com>

On Wed, Jun 03, 2026 at 07:45:39PM +0530, Amit Machhiwal wrote:
> On IBM POWER systems, newer processor generations can operate in
> compatibility modes corresponding to earlier generations. This becomes
> relevant for nested virtualization, where nested KVM guests may need to
> run with a specific processor compatibility level.
> 
> Currently, when running a nested KVM guest (L2) inside a Power11 pSeries
> logical partition (L1) booted in Power10 compatibility mode, the guest
> fails to boot while setting 'arch_compat'. This happens because the CPU
> class is derived from the hardware PVR (via mfspr()), which reflects the
> physical processor generation (Power11), rather than the effective
> compatibility mode (Power10).
> 
> As a result, userspace may request a Power11 arch_compat for the L2
> guest. However, the L1 partition, running in Power10 compatibility, has
> only negotiated support up to Power10 with the Power Hypervisor (L0).
> When H_GUEST_SET_STATE is invoked with a Power11 Logical PVR, the
> hypervisor rejects the request, leading to a late guest boot failure:
> 
>   KVM-NESTEDv2: couldn't set guest wide elements
>   [..KVM reg dump..]
> 
> This situation should be detected earlier. Rejecting unsupported
> 'arch_compat' values in 'kvmppc_set_arch_compat()' avoids issuing an
> invalid H_GUEST_SET_STATE hcall and provides a clearer failure mode.
> 
> Add a check to reject Power11 'arch_compat' requests when the host is
> running in Power10 compatibility mode, returning -EINVAL early instead
> of deferring the failure to the hypervisor.
> 
> Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> Tested-by: Anushree Mathur <anushree.mathur@linux.ibm.com>
> Cc: <stable@vger.kernel.org> # v6.13+
> Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
> ---
> Changelog:
> 
> * Moved this patch out of the v3 series [1] as discussed here [2]
> * Addressed below review comments from Ritesh:
>   - Based the PVR validation on cpu features
>   - Fixed hcall name typo
>   - Stable backport
> 
> [1] https://lore.kernel.org/all/20260522152744.55251-1-amachhiw@linux.ibm.com/
> [2] https://lore.kernel.org/all/20260522152744.55251-2-amachhiw@linux.ibm.com/
> ---
>  arch/powerpc/kvm/book3s_hv.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 61dbeea317f3..e16dbb199366 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -446,7 +446,17 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
>  			guest_pcr_bit = PCR_ARCH_300;
>  			break;
>  		case PVR_ARCH_31:
> +			guest_pcr_bit = PCR_ARCH_31;
> +			break;
>  		case PVR_ARCH_31_P11:
> +			/*
> +			 * Need to check this for ISA 3.1, as Power10 and
> +			 * Power11 share the same PCR. For any subsequent ISA
> +			 * versions, this will be taken care of by the guest vs
> +			 * host PCR comparison below.
> +			 */
> +			if (!cpu_has_feature(CPU_FTR_P11_PVR))
> +				return -EINVAL;
>  			guest_pcr_bit = PCR_ARCH_31;
>  			break;
>  		default:
> 
> base-commit: ba3e43a9e601636f5edb54e259a74f96ca3b8fd8
> -- 
> 2.50.1 (Apple Git-155)
> 

I booted a KVM guest on LPAR with this patch in the following scenarios:
1. P10 guest on P10 host: No error observed
2. P11 guest on P11 host: No error observed
3. P11 guest on P11 host booted in P10 compat mode: No error observed

Tested-by: Gautam Menghani <gautam@linux.ibm.com>
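
The early rejection added by the quoted patch can be modelled with a small sketch. The enum values and the function name are hypothetical placeholders, and the boolean parameter stands in for the `cpu_has_feature(CPU_FTR_P11_PVR)` test in the diff; the real function also derives and compares PCR bits, which is left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder identifiers for the two ISA 3.1 logical PVRs. */
enum {
	PVR_31_SKETCH = 1,	/* stands in for PVR_ARCH_31 (Power10) */
	PVR_31_P11_SKETCH = 2,	/* stands in for PVR_ARCH_31_P11 (Power11) */
};

/* Reject a Power11 request early when the host lacks the P11 PVR feature. */
static int check_arch_compat_sketch(uint32_t arch_compat, bool host_has_p11_pvr)
{
	switch (arch_compat) {
	case PVR_31_SKETCH:
		return 0;
	case PVR_31_P11_SKETCH:
		/* Power10 and Power11 share a PCR bit, so check the feature. */
		return host_has_p11_pvr ? 0 : -EINVAL;
	default:
		return -EINVAL;
	}
}
```

Under this model a Power10 request succeeds on either host, while a Power11 request succeeds only when the host itself exposes the Power11 PVR.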



* Re: [PATCH] tools/perf/sched: Update process names of processes in zombie state for both -s and -S options
From: Arnaldo Carvalho de Melo @ 2026-06-04 14:17 UTC (permalink / raw)
  To: Athira Rajeev, Anubhav Shelat
  Cc: Namhyung Kim, Ian Rogers, jolsa, adrian.hunter, mpetlan, tmricht,
	maddy, linux-perf-users, linuxppc-dev, hbathini, Tejas.Manhas1,
	Tanushree.Shah, Shivani.Nittor
In-Reply-To: <ag8UFqPmRu8v4g_9@x1>

On Thu, May 21, 2026 at 11:17:58AM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, May 21, 2026 at 02:02:53PM +0530, Athira Rajeev wrote:
> > > On 27 Apr 2026, at 11:26 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> > > On Sun, Apr 26, 2026 at 03:09:30PM +0530, Athira Rajeev wrote:
> > >> In redhat perftool testsuite, observed fail for this test:
> > >>    -- [ FAIL ] -- perf_sched :: test_timehist :: --with-summary (output regexp parsing)
> > >> 
> > >> This led to analysis of "perf sched timehist" summary options.
> > >> 
> > >>  # perf sched record -a -o ./perf.data -- sleep 0.1
> > >>   This will record using perf sched record
> > >> 
> > >> perf sched timehist has two options "-s" and "-S"
> > >>  # perf sched -i ./perf.data timehist -S
> > >> -S : Captures summary also at the end
> > >> 
> > >>  # perf sched -i ./perf.data timehist -s
> > >> -s : Captures only summary
> > >> 
> > >> The test saves -s result which has only summary and compares with
> > >> summary which comes at the end from -S . Since there is a difference
> > >> in these two, test fails.
> > >> 
> > >> Checking the behaviour change in -S and -s results, difference is:
> > >> 
> > >>                  rcu_sched[16]       2          4        0.013      0.001       0.003       0.006   33.23       0
> > >>               migration/11[73]       2          1        0.006      0.006       0.006       0.006    0.00       0
> > >>                migration/3[33]       2          1        0.006      0.006       0.006       0.006    0.00       0
> > >> -               :216753[216753]      -1          1        0.041      0.041       0.041       0.041    0.00       0
> > >> +                 sleep[216753]      -1          1        0.041      0.041       0.041       0.041    0.00       0
> > >>                migration/8[58]       2          1        0.005      0.005       0.005       0.005    0.00       0
> > >>            NetworkManager[811]       1          2        0.089      0.028       0.044       0.060   36.06       0
> > >>               migration/13[83]       2          1        0.005      0.005       0.005       0.005    0.00       0
> > >> 
> > >> Here 216753 is pid for sleep which is a zombie process. This is
> > >> happening in latest kernel due to an update in "-S" result.
> > >> In -S, the process name appears in the results "sleep[216753]",
> > >> whereas in -s, only the pid is present in the summary result
> > >> ":216753[216753]".
> > >> 
> > >> After commit 39f473f6d0b2 ("perf sched timehist: decode process names
> > >> of processes in zombie state")
> > >> for -S option, if process name is using pid, it uses different way to
> > >> set it. So that we get the process name and not just Pid.
> > >> 
> > >> This change went in only for timehist_print_sample() function.
> > >> Add this improvement in generic place so that even -s option (which
> > >> captures summary) also will have meaningful information.
> > >> 
> > >> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
> > > 
> > > Acked-by: Namhyung Kim <namhyung@kernel.org>
> > > 
> > > Thanks,
> > > Namhyung
> > Hi,
> > 
> > Can we please have this pulled in, if the patch looks fine ?
> 
> Can you please check applying it on top of current perf-tools-next?

So, this seems to be also addressed by:

commit 39f473f6d0b24cf375893f2110b1cc9d8a079a42
Author: Anubhav Shelat <ashelat@redhat.com>
Date:   Wed Jul 16 16:39:15 2025 -0400

    perf sched timehist: decode process names of processes in zombie state

    Previously when running perf trace timehist --state, when recording
    processes in the zombie state the process name would not be decoded
    properly and appears with just the PID:

    1140057.412177 [0006]  Mutter Input Th[3139/3104]          0.956      0.019      0.041      S
    1140057.412222 [0012]  :1248612[1248612]                   0.000      0.000      0.332      Z
    1140057.412275 [0004]  <idle>                              0.052      0.052      0.953      I
    1140057.412284 [0008]  <idle>                              0.070      0.070      0.932      I
    1140057.412333 [0004]  KMS thread[3126/3104]               0.953      0.112      0.058      S

    Now some extra processing has been added to decode the process name:

    1140057.412177 [0006]  Mutter Input Th[3139/3104]          0.956      0.019      0.041      S
    1140057.412222 [0012]  sleep[1248612]                      0.000      0.000      0.332      Z
    1140057.412275 [0004]  <idle>                              0.052      0.052      0.953      I
    1140057.412284 [0008]  <idle>                              0.070      0.070      0.932      I
    1140057.412333 [0004]  KMS thread[3126/3104]               0.953      0.112      0.058      S

    Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
    Link: https://lore.kernel.org/r/20250716203914.45772-2-ashelat@redhat.com
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>


No? It is not applying to perf-tools-next, a quick look found the patch
above.

- Arnaldo



* RE: [PATCH v5 05/20] dma-pool: track decrypted atomic pools and select them via attrs
From: Michael Kelley @ 2026-06-04 14:05 UTC (permalink / raw)
  To: Jason Gunthorpe, Michael Kelley
  Cc: Aneesh Kumar K.V, iommu@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
	Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
	Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
	Mostafa Saleh, Petr Tesarik, Alexey Kardashevskiy, Dan Williams,
	Xu Yilun, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Sven Schnelle, x86@kernel.org, Jiri Pirko
In-Reply-To: <20260603005454.GM2487554@ziepe.ca>

From: Jason Gunthorpe <jgg@ziepe.ca> Sent: Tuesday, June 2, 2026 5:55 PM
> 
> On Tue, Jun 02, 2026 at 02:24:40PM +0000, Michael Kelley wrote:
> 
> > Except that in a normal VM, the "unencrypted" pool attribute does *not*
> > describe the state of the memory itself.  In a normal VM, the memory is
> > unencrypted, but the "unencrypted" pool attribute is false. That
> > contradiction is the essence of my concern.
> 
> I would argue no..
> 
> When CC is enabled the default state of memory in a Linux environment
> is "encrypted". You have to take a special action to "decrypt" it.
> 
> Thus the default state of memory in a non-CC environment is also
> paradoxically "encrypted" too. 

The need to have such an unnatural premise is usually an indication
of a conceptual problem with the overall model, or perhaps just a
terminology problem. 

Here's a proposal. The new DMA attribute is DMA_ATTR_CC_SHARED.
Name the pool attribute "cc_shared" instead of "unencrypted". Having
"cc_shared" set to false in a normal VM doesn't lead to the non-sensical
situation of claiming that a normal VM is encrypted. The boolean
"unencrypted" parameter that has been added to various calls also
becomes "cc_shared".  If "CC_SHARED" is a suitable name for the DMA
attribute, it ought to be suitable as the pool attribute. And everything
matches as well.

Michael  


> "decryption" is impossible.
> 
> Therefore the "unencrypted" state is a special state that only memory
> inside a CC VM can have. A normal VM can never have "unencrypted"
> memory at all, so having it be false in the pool is accurate as far as
> the APIs go.
> 
> un-encrypted = true means "the memory in this pool was transformed with
> set_memory_decrypted()" - which is impossible on a normal VM.
> 
> Jason



^ permalink raw reply

* Re: [PATCH] powerpc: Export set_memory_encrypted and set_memory_decrypted
From: Jason Gunthorpe @ 2026-06-04 13:57 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: Maxime Ripard, Jiri Pirko, Christoph Hellwig, T.J. Mercier, maddy,
	mpe, npiggin, chleroy, linuxppc-dev, lkp, linux-kernel, iommu,
	linux-mm, agordeev, gerald.schaefer, linux-s390, Dan Williams,
	Tom Lendacky, x86
In-Reply-To: <CAO_48GEJsg4X7++zg-ztQgVibY_FjjManaA5_W3usjicGUQPdg@mail.gmail.com>

On Thu, Jun 04, 2026 at 12:51:49PM +0530, Sumit Semwal wrote:

> Given that Christoph's objection is not really about the modules part,
> but that the set_memory_{encrypted,decrypted} should not be used here,
> one option is to revert 78b30c50a7ac until that issue is sorted out?

Please no, we have stuff already using this so it would be a
functional regression. Revert making heaps into a module since that
doesn't have a functional regression.

Jason


^ permalink raw reply

* [PATCH] powerpc/vtime: Initialize starttime at boot for native accounting
From: Shrikanth Hegde @ 2026-06-04 13:24 UTC (permalink / raw)
  To: maddy, linuxppc-dev, christophe.leroy
  Cc: sshegde, frederic, Christophe Leroy (CS GROUP)

It was observed that /proc/stat had a very large value for one or more
CPUs. It became more visible after recent code simplifications around
cpustats.

System has 240 CPUs.

cat /proc/uptime;
194.18 46500.55
cat /proc/stat
cpu  5966 39 837032887 4650070 164 185 100 0 0 0
cpu0 108 0 837030890 19109 24 4 23 0 0 0

Since uptime is 194s, the system time of each CPU can't be more than 19400.
The sum of system time across all CPUs can't be more than
19400*240 = 4656000. In fact, the huge value is close to mftb(). Note that
mftb doesn't reset on PowerVM when the LPAR restarts; it only resets when
the whole system resets. The same issue exists for kexec too.
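
The bound argued above can be written down as a small sanity check. This
is only a sketch: the demo_* helpers are hypothetical names, and it
assumes /proc/stat counts in ticks of 100 per second, which matches the
194s -> 19400 figure used above.

```c
#include <stdint.h>

/* Assumed tick rate of /proc/stat values (ticks per second). */
#define DEMO_USER_HZ 100ULL

/* Upper bound on any single CPU's time, in ticks, after uptime_sec seconds. */
uint64_t demo_max_ticks_per_cpu(uint64_t uptime_sec)
{
	return uptime_sec * DEMO_USER_HZ;
}

/* Upper bound on the time summed over all CPUs, in ticks. */
uint64_t demo_max_ticks_total(uint64_t uptime_sec, unsigned int nr_cpus)
{
	return demo_max_ticks_per_cpu(uptime_sec) * nr_cpus;
}

/* Returns 1 if a reported aggregate value fits within the bound, else 0. */
int demo_is_plausible(uint64_t reported, uint64_t uptime_sec,
		      unsigned int nr_cpus)
{
	return reported <= demo_max_ticks_total(uptime_sec, nr_cpus);
}
```

For the numbers above, demo_max_ticks_total(194, 240) gives 4656000, so
the reported system time of 837032887 fails the check.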

This happens because starttime is not set up at init time. Once it is
set, subsequent vtime_delta calls return the right delta.

Fix it by initializing the starttime during CPU initialization. This
fixes the large times seen.
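
The effect can be modelled in a few lines of userspace C. This is a
simplified sketch, not the kernel code: struct demo_acct and the demo_*
functions are hypothetical stand-ins, and the timebase is passed in as a
plain argument instead of being read with mftb().

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU accounting state. */
struct demo_acct {
	uint64_t starttime;	/* timebase value at the last snapshot */
};

/* Return the ticks elapsed since the last snapshot and advance it. */
uint64_t demo_vtime_delta(struct demo_acct *acct, uint64_t now)
{
	uint64_t delta = now - acct->starttime;

	acct->starttime = now;
	return delta;
}

/* Fast-forward the snapshot to "now", as the fix does at CPU init. */
void demo_vtime_reset(struct demo_acct *acct, uint64_t now)
{
	acct->starttime = now;
}
```

If starttime is still 0 when the first delta is taken, that delta equals
the whole timebase value, which is the huge number seen in /proc/stat.
Calling demo_vtime_reset() first makes the first delta cover only the
time since CPU init.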

cat /proc/uptime; cat /proc/stat
15.78 3694.63
cpu  6035 35 1347 369479 23 144 49 0 0 0
cpu0 19 0 38 1508 0 1 14 0 0 0

Now, system time is reported as expected.

Suggested-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---

Christophe, I have taken the patch as is from the discussion we had.
Let me know if I should send it with your Signed-off-by tag. I have only
written the changelog. I sent it like this since the tag was not there.

discussion thread:
https://lore.kernel.org/all/cd10be19-e0bc-4e0c-8dac-4f1c05d0de8f@kernel.org/

Also, does this warrant a Fixes tag? I found these two likely candidates.
The issue has likely existed since the beginning.
c223c90386bc powerpc32: provide VIRT_CPU_ACCOUNTING
b38a181c11d0 powerpc/time: isolate scaled cputime accounting in dedicated functions.

 arch/powerpc/kernel/time.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 3460d1a5a97c..11145c40183d 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -377,7 +377,6 @@ void vtime_task_switch(struct task_struct *prev)
 	}
 }
 
-#ifdef CONFIG_NO_HZ_COMMON
 /**
  * vtime_reset - Fast forward vtime entry clocks
  *
@@ -394,6 +393,7 @@ void vtime_reset(void)
 #endif
 }
 
+#ifdef CONFIG_NO_HZ_COMMON
 /**
  * vtime_dyntick_start - Inform vtime about entry to idle-dynticks
  *
@@ -933,6 +933,7 @@ static void __init set_decrementer_max(void)
 static void __init init_decrementer_clockevent(void)
 {
 	register_decrementer_clockevent(smp_processor_id());
+	vtime_reset();
 }
 
 void secondary_cpu_time_init(void)
@@ -948,6 +949,7 @@ void secondary_cpu_time_init(void)
 	/* FIME: Should make unrelated change to move snapshot_timebase
 	 * call here ! */
 	register_decrementer_clockevent(smp_processor_id());
+	vtime_reset();
 }
 
 /*
-- 
2.47.3



^ permalink raw reply related

* [PATCH v3 6/6] sparc: Remove remaining defconfig references to the pktcdvd driver
From: Catalin Iacob @ 2026-06-04 13:20 UTC (permalink / raw)
  To: Thomas Bogendoerfer, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Rich Felker,
	John Paul Adrian Glaubitz, David S. Miller, Andreas Larsson,
	James E.J. Bottomley, Martin K. Petersen, Jens Axboe,
	Yoshinori Sato
  Cc: linux-mips, linux-kernel, linuxppc-dev, linux-sh, sparclinux,
	linux-scsi, Catalin Iacob
In-Reply-To: <20260604-remove-pktcdvd-references-v3-0-e2f06fb4eef4@gmail.com>

Commit 1cea5180f2f8 ("block: remove pktcdvd driver") left behind some
CONFIG_CDROM_PKTCDVD* references in defconfigs. Remove them.

Signed-off-by: Catalin Iacob <iacobcatalin@gmail.com>
---
 arch/sparc/configs/sparc64_defconfig | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/sparc/configs/sparc64_defconfig b/arch/sparc/configs/sparc64_defconfig
index 632081a262ba..4abea39281cd 100644
--- a/arch/sparc/configs/sparc64_defconfig
+++ b/arch/sparc/configs/sparc64_defconfig
@@ -60,8 +60,6 @@ CONFIG_CONNECTOR=m
 CONFIG_BLK_DEV_LOOP=m
 CONFIG_BLK_DEV_CRYPTOLOOP=m
 CONFIG_BLK_DEV_NBD=m
-CONFIG_CDROM_PKTCDVD=m
-CONFIG_CDROM_PKTCDVD_WCACHE=y
 CONFIG_ATA_OVER_ETH=m
 CONFIG_SUNVDC=m
 CONFIG_ATA=y

-- 
2.54.0



^ permalink raw reply related

* [PATCH v3 5/6] sh: Remove remaining defconfig reference to the pktcdvd driver
From: Catalin Iacob @ 2026-06-04 13:20 UTC (permalink / raw)
  To: Thomas Bogendoerfer, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy (CS GROUP), Rich Felker,
	John Paul Adrian Glaubitz, David S. Miller, Andreas Larsson,
	James E.J. Bottomley, Martin K. Petersen, Jens Axboe,
	Yoshinori Sato
  Cc: linux-mips, linux-kernel, linuxppc-dev, linux-sh, sparclinux,
	linux-scsi, Catalin Iacob
In-Reply-To: <20260604-remove-pktcdvd-references-v3-0-e2f06fb4eef4@gmail.com>

Commit 1cea5180f2f8 ("block: remove pktcdvd driver") left behind a
CONFIG_CDROM_PKTCDVD reference in defconfigs. Remove it.

Signed-off-by: Catalin Iacob <iacobcatalin@gmail.com>
---
 arch/sh/configs/sh2007_defconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/sh/configs/sh2007_defconfig b/arch/sh/configs/sh2007_defconfig
index 5d9080499485..f287a41cd38c 100644
--- a/arch/sh/configs/sh2007_defconfig
+++ b/arch/sh/configs/sh2007_defconfig
@@ -45,7 +45,6 @@ CONFIG_NETWORK_SECMARK=y
 CONFIG_NET_PKTGEN=y
 CONFIG_BLK_DEV_LOOP=y
 CONFIG_BLK_DEV_RAM=y
-CONFIG_CDROM_PKTCDVD=y
 CONFIG_RAID_ATTRS=y
 CONFIG_SCSI=y
 CONFIG_BLK_DEV_SD=y

-- 
2.54.0



^ permalink raw reply related

