* [PATCH] kernel-shark: Multi-thread the computaion of CPU graph
@ 2024-09-23 9:09 Libo Chen
2024-09-28 14:32 ` Yordan Karadzhov
0 siblings, 1 reply; 3+ messages in thread
From: Libo Chen @ 2024-09-23 9:09 UTC (permalink / raw)
To: y.karadz; +Cc: libo.chen, linux-trace-devel
Parallelize _newCPUGraph() calls to dramatically speed up
graph rendering particularly for traces from very large systems.
OpenMP technically is a new dependency here, but it's so embedded
with GCC toolchains, as long as your GCC is not older than v4.9,
the libgomp library that comes with it will work.
Signed-off-by: Libo Chen <libo.chen@oracle.com>
---
CMakeLists.txt | 5 +++++
src/KsGLWidget.cpp | 15 ++++++++++++++-
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index c36d757..8dd9ff9 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -84,6 +84,11 @@ set(EXECUTABLE_OUTPUT_PATH "${KS_DIR}/bin")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall -Wextra -pthread -fPIC -fno-common")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -pthread -fPIC -fno-common")
+find_package(OpenMP 3.2.5)
+if (OPENMP_FOUND)
+ set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
+ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
+
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
diff --git a/src/KsGLWidget.cpp b/src/KsGLWidget.cpp
index 9311d98..00a1951 100644
--- a/src/KsGLWidget.cpp
+++ b/src/KsGLWidget.cpp
@@ -13,6 +13,9 @@
#include <GL/glut.h>
#include <GL/gl.h>
+// OpenMP
+#include <omp.h>
+
// KernelShark
#include "libkshark-plugin.h"
#include "KsGLWidget.hpp"
@@ -690,10 +693,20 @@ void KsGLWidget::_makeGraphs()
for (auto it = _streamPlots.begin(); it != _streamPlots.end(); ++it) {
sd = it.key();
+ QVector<KsPlot::Graph *> cpuGraphs(it.value()._cpuList.count());
+ QVector<KsPlot::Graph *> taskGraphs(it.value()._taskList.count());
+
/* Create CPU graphs according to the cpuList. */
it.value()._cpuGraphs = {};
+ omp_set_num_threads(omp_get_num_procs());
+ #pragma omp parallel for
for (auto const &cpu: it.value()._cpuList) {
- g = lamAddGraph(sd, _newCPUGraph(sd, cpu), _vSpacing);
+ int idx = it.value()._cpuList.indexOf(cpu);
+ cpuGraphs[idx] = _newCPUGraph(sd, cpu);
+ }
+ QVectorIterator<KsPlot::Graph *> itCpuGraphs(cpuGraphs);
+ while (itCpuGraphs.hasNext()) {
+ g = lamAddGraph(sd, itCpuGraphs.next(), _vSpacing);
it.value()._cpuGraphs.append(g);
}
--
2.43.5
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] kernel-shark: Multi-thread the computaion of CPU graph
2024-09-23 9:09 [PATCH] kernel-shark: Multi-thread the computaion of CPU graph Libo Chen
@ 2024-09-28 14:32 ` Yordan Karadzhov
2024-09-28 23:16 ` Libo Chen
0 siblings, 1 reply; 3+ messages in thread
From: Yordan Karadzhov @ 2024-09-28 14:32 UTC (permalink / raw)
To: Libo Chen; +Cc: linux-trace-devel
Hi Libo,
I like the idea of the patch, however a bit of extra work is needed
before merging it. See my comments in the code below.
On 9/23/24 12:09, Libo Chen wrote:
> Parallelize _newCPUGraph() calls to dramatically speed up
> graph rendering particularly for traces from very large systems.
>
> OpenMP technically is a new dependency here, but it's so embedded
> with GCC toolchains, as long as your GCC is not older than v4.9,
> the libgomp library that comes with it will work.
>
> Signed-off-by: Libo Chen <libo.chen@oracle.com>
> ---
> CMakeLists.txt | 5 +++++
> src/KsGLWidget.cpp | 15 ++++++++++++++-
> 2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/CMakeLists.txt b/CMakeLists.txt
> index c36d757..8dd9ff9 100644
> --- a/CMakeLists.txt
> +++ b/CMakeLists.txt
> @@ -84,6 +84,11 @@ set(EXECUTABLE_OUTPUT_PATH "${KS_DIR}/bin")
> set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall -Wextra -pthread -fPIC -fno-common")
> set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -pthread -fPIC -fno-common")
>
> +find_package(OpenMP 3.2.5)
> +if (OPENMP_FOUND)
> + set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
> + set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
The "if" must be closed.
> +
> set(CMAKE_CXX_STANDARD 17)
> set(CMAKE_CXX_STANDARD_REQUIRED ON)
> set(CMAKE_CXX_EXTENSIONS OFF)
> diff --git a/src/KsGLWidget.cpp b/src/KsGLWidget.cpp
> index 9311d98..00a1951 100644
> --- a/src/KsGLWidget.cpp
> +++ b/src/KsGLWidget.cpp
> @@ -13,6 +13,9 @@
> #include <GL/glut.h>
> #include <GL/gl.h>
>
> +// OpenMP
> +#include <omp.h>
> +
> // KernelShark
> #include "libkshark-plugin.h"
> #include "KsGLWidget.hpp"
> @@ -690,10 +693,20 @@ void KsGLWidget::_makeGraphs()
>
> for (auto it = _streamPlots.begin(); it != _streamPlots.end(); ++it) {
> sd = it.key();
> + QVector<KsPlot::Graph *> cpuGraphs(it.value()._cpuList.count());
> + QVector<KsPlot::Graph *> taskGraphs(it.value()._taskList.count());
> +
> /* Create CPU graphs according to the cpuList. */
> it.value()._cpuGraphs = {};
> + omp_set_num_threads(omp_get_num_procs());
Not sure how setting the thread number works in opm, but I would guess
that you have to do it just once. If this is the case, please move the
the line above to the constructor. Note that _makeGraphs() gets executed
almost every time you do something in the GUI (zoom, shift, select
event, ...)
> + #pragma omp parallel for
> for (auto const &cpu: it.value()._cpuList) {
> - g = lamAddGraph(sd, _newCPUGraph(sd, cpu), _vSpacing);
> + int idx = it.value()._cpuList.indexOf(cpu);
> + cpuGraphs[idx] = _newCPUGraph(sd, cpu);
> + }
> + QVectorIterator<KsPlot::Graph *> itCpuGraphs(cpuGraphs);
> + while (itCpuGraphs.hasNext()) {
> + g = lamAddGraph(sd, itCpuGraphs.next(), _vSpacing);
> it.value()._cpuGraphs.append(g);
> }
>
I wonder why you add this optimization only for the CPU graphs? Please
do the same for the Task and Combo graphs as well.
Thanks!
Yordan
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] kernel-shark: Multi-thread the computaion of CPU graph
2024-09-28 14:32 ` Yordan Karadzhov
@ 2024-09-28 23:16 ` Libo Chen
0 siblings, 0 replies; 3+ messages in thread
From: Libo Chen @ 2024-09-28 23:16 UTC (permalink / raw)
To: Yordan Karadzhov; +Cc: linux-trace-devel@vger.kernel.org
(Resend in plain text)
Hi Yordan,
> On Sep 28, 2024, at 07:32, Yordan Karadzhov <y.karadz@gmail.com> wrote:
>
> Hi Libo,
>
> I like the idea of the patch, however a bit of extra work is needed before merging it. See my comments in the code below.
>
> On 9/23/24 12:09, Libo Chen wrote:
>> Parallelize _newCPUGraph() calls to dramatically speed up
>> graph rendering particularly for traces from very large systems.
>> OpenMP technically is a new dependency here, but it's so embedded
>> with GCC toolchains, as long as your GCC is not older than v4.9,
>> the libgomp library that comes with it will work.
>> Signed-off-by: Libo Chen <libo.chen@oracle.com>
>> ---
>> CMakeLists.txt | 5 +++++
>> src/KsGLWidget.cpp | 15 ++++++++++++++-
>> 2 files changed, 19 insertions(+), 1 deletion(-)
>> diff --git a/CMakeLists.txt b/CMakeLists.txt
>> index c36d757..8dd9ff9 100644
>> --- a/CMakeLists.txt
>> +++ b/CMakeLists.txt
>> @@ -84,6 +84,11 @@ set(EXECUTABLE_OUTPUT_PATH "${KS_DIR}/bin")
>> set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall -Wextra -pthread -fPIC -fno-common")
>> set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -pthread -fPIC -fno-common")
>> +find_package(OpenMP 3.2.5)
>> +if (OPENMP_FOUND)
>> + set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
>> + set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
>
> The "if" must be closed.
Whoops, will fix that
>> +
>> set(CMAKE_CXX_STANDARD 17)
>> set(CMAKE_CXX_STANDARD_REQUIRED ON)
>> set(CMAKE_CXX_EXTENSIONS OFF)
>> diff --git a/src/KsGLWidget.cpp b/src/KsGLWidget.cpp
>> index 9311d98..00a1951 100644
>> --- a/src/KsGLWidget.cpp
>> +++ b/src/KsGLWidget.cpp
>> @@ -13,6 +13,9 @@
>> #include <GL/glut.h>
>> #include <GL/gl.h>
>> +// OpenMP
>> +#include <omp.h>
>> +
>> // KernelShark
>> #include "libkshark-plugin.h"
>> #include "KsGLWidget.hpp"
>> @@ -690,10 +693,20 @@ void KsGLWidget::_makeGraphs()
>> for (auto it = _streamPlots.begin(); it != _streamPlots.end(); ++it) {
>> sd = it.key();
>> + QVector<KsPlot::Graph *> cpuGraphs(it.value()._cpuList.count());
>> + QVector<KsPlot::Graph *> taskGraphs(it.value()._taskList.count());
>> +
>> /* Create CPU graphs according to the cpuList. */
>> it.value()._cpuGraphs = {};
>> + omp_set_num_threads(omp_get_num_procs());
>
> Not sure how setting the thread number works in opm, but I would guess that you have to do it just once. If this is the case, please move the the line above to the constructor. Note that _makeGraphs() gets executed almost every time you do something in the GUI (zoom, shift, select event, ...)
Yeah, I did think about setting it once but the only reason I was considering not doing that is one can plug/unplug or online/offline CPUs while kernelshark is running. This is somewhat an edge case on baremetals but could be more common in VMs. I will take your advice here unless somebody in the future starts to complain about it and TBH one should probably not change the number of online CPUs while having CPU intensive workloads running.
>> + #pragma omp parallel for
>> for (auto const &cpu: it.value()._cpuList) {
>> - g = lamAddGraph(sd, _newCPUGraph(sd, cpu), _vSpacing);
>> + int idx = it.value()._cpuList.indexOf(cpu);
>> + cpuGraphs[idx] = _newCPUGraph(sd, cpu);
>> + }
>> + QVectorIterator<KsPlot::Graph *> itCpuGraphs(cpuGraphs);
>> + while (itCpuGraphs.hasNext()) {
>> + g = lamAddGraph(sd, itCpuGraphs.next(), _vSpacing);
>> it.value()._cpuGraphs.append(g);
>> }
>>
>
> I wonder why you add this optimization only for the CPU graphs? Please do the same for the Task and Combo graphs as well.
Haha, it was intentional. I was just trying to gauge interest on this optimization. I will send out a new one to include other graphs.
Best,
Libo
> Thanks!
> Yordan
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-09-28 23:16 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-23 9:09 [PATCH] kernel-shark: Multi-thread the computaion of CPU graph Libo Chen
2024-09-28 14:32 ` Yordan Karadzhov
2024-09-28 23:16 ` Libo Chen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).