* [PATCH v2 1/1] perf test: Ensure lock contention using pipe mode
From: Jan Polensky @ 2025-07-25 17:08 UTC
To: adrian.hunter, irogers, namhyung, Thomas Richter; +Cc: linux-perf-users
The 'kernel lock contention analysis test' requires reliable triggering
of lock contention. On some systems, previous benchmark calls failed to
generate sufficient contention due to low system activity or resource
limits.
This patch adds the -p (pipe) option to all calls of perf bench sched
messaging, ensuring consistent lock contention without relying on
socket-based communication.
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Jan Polensky <japo@linux.ibm.com>
---
tools/perf/tests/shell/lock_contention.sh | 26 +++++++++++------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/tools/perf/tests/shell/lock_contention.sh b/tools/perf/tests/shell/lock_contention.sh
index 30d195d4c62f..2c2887b22407 100755
--- a/tools/perf/tests/shell/lock_contention.sh
+++ b/tools/perf/tests/shell/lock_contention.sh
@@ -44,7 +44,7 @@ check() {
test_record()
{
echo "Testing perf lock record and perf lock contention"
- perf lock record -o ${perfdata} -- perf bench sched messaging > /dev/null 2>&1
+ perf lock record -o ${perfdata} -- perf bench sched messaging -p > /dev/null 2>&1
# the output goes to the stderr and we expect only 1 output (-E 1)
perf lock contention -i ${perfdata} -E 1 -q 2> ${result}
if [ "$(cat "${result}" | wc -l)" != "1" ]; then
@@ -64,7 +64,7 @@ test_bpf()
fi
# the perf lock contention output goes to the stderr
- perf lock con -a -b -E 1 -q -- perf bench sched messaging > /dev/null 2> ${result}
+ perf lock con -a -b -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result}
if [ "$(cat "${result}" | wc -l)" != "1" ]; then
echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)"
err=1
@@ -75,7 +75,7 @@ test_bpf()
test_record_concurrent()
{
echo "Testing perf lock record and perf lock contention at the same time"
- perf lock record -o- -- perf bench sched messaging 2> /dev/null | \
+ perf lock record -o- -- perf bench sched messaging -p 2> /dev/null | \
perf lock contention -i- -E 1 -q 2> ${result}
if [ "$(cat "${result}" | wc -l)" != "1" ]; then
echo "[Fail] Recorded result count is not 1:" "$(cat "${result}" | wc -l)"
@@ -99,7 +99,7 @@ test_aggr_task()
fi
# the perf lock contention output goes to the stderr
- perf lock con -a -b -t -E 1 -q -- perf bench sched messaging > /dev/null 2> ${result}
+ perf lock con -a -b -t -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result}
if [ "$(cat "${result}" | wc -l)" != "1" ]; then
echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)"
err=1
@@ -122,7 +122,7 @@ test_aggr_addr()
fi
# the perf lock contention output goes to the stderr
- perf lock con -a -b -l -E 1 -q -- perf bench sched messaging > /dev/null 2> ${result}
+ perf lock con -a -b -l -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result}
if [ "$(cat "${result}" | wc -l)" != "1" ]; then
echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)"
err=1
@@ -140,7 +140,7 @@ test_aggr_cgroup()
fi
# the perf lock contention output goes to the stderr
- perf lock con -a -b -g -E 1 -q -- perf bench sched messaging > /dev/null 2> ${result}
+ perf lock con -a -b -g -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result}
if [ "$(cat "${result}" | wc -l)" != "1" ]; then
echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)"
err=1
@@ -162,7 +162,7 @@ test_type_filter()
return
fi
- perf lock con -a -b -Y spinlock -q -- perf bench sched messaging > /dev/null 2> ${result}
+ perf lock con -a -b -Y spinlock -q -- perf bench sched messaging -p > /dev/null 2> ${result}
if [ "$(grep -c -v spinlock "${result}")" != "0" ]; then
echo "[Fail] BPF result should not have non-spinlocks:" "$(cat "${result}")"
err=1
@@ -194,7 +194,7 @@ test_lock_filter()
return
fi
- perf lock con -a -b -L tasklist_lock -q -- perf bench sched messaging > /dev/null 2> ${result}
+ perf lock con -a -b -L tasklist_lock -q -- perf bench sched messaging -p > /dev/null 2> ${result}
if [ "$(grep -c -v "${test_lock_filter_type}" "${result}")" != "0" ]; then
echo "[Fail] BPF result should not have non-${test_lock_filter_type} locks:" "$(cat "${result}")"
err=1
@@ -222,7 +222,7 @@ test_stack_filter()
return
fi
- perf lock con -a -b -S unix_stream -E 1 -q -- perf bench sched messaging > /dev/null 2> ${result}
+ perf lock con -a -b -S unix_stream -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result}
if [ "$(cat "${result}" | wc -l)" != "1" ]; then
echo "[Fail] BPF result should have a lock from unix_stream:" "$(cat "${result}")"
err=1
@@ -250,7 +250,7 @@ test_aggr_task_stack_filter()
return
fi
- perf lock con -a -b -t -S unix_stream -E 1 -q -- perf bench sched messaging > /dev/null 2> ${result}
+ perf lock con -a -b -t -S unix_stream -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result}
if [ "$(cat "${result}" | wc -l)" != "1" ]; then
echo "[Fail] BPF result should have a task from unix_stream:" "$(cat "${result}")"
err=1
@@ -266,7 +266,7 @@ test_cgroup_filter()
return
fi
- perf lock con -a -b -g -E 1 -F wait_total -q -- perf bench sched messaging > /dev/null 2> ${result}
+ perf lock con -a -b -g -E 1 -F wait_total -q -- perf bench sched messaging -p > /dev/null 2> ${result}
if [ "$(cat "${result}" | wc -l)" != "1" ]; then
echo "[Fail] BPF result should have a cgroup result:" "$(cat "${result}")"
err=1
@@ -274,7 +274,7 @@ test_cgroup_filter()
fi
cgroup=$(cat "${result}" | awk '{ print $3 }')
- perf lock con -a -b -g -E 1 -G "${cgroup}" -q -- perf bench sched messaging > /dev/null 2> ${result}
+ perf lock con -a -b -g -E 1 -G "${cgroup}" -q -- perf bench sched messaging -p > /dev/null 2> ${result}
if [ "$(cat "${result}" | wc -l)" != "1" ]; then
echo "[Fail] BPF result should have a result with cgroup filter:" "$(cat "${cgroup}")"
err=1
@@ -309,7 +309,7 @@ test_csv_output()
fi
# the perf lock contention output goes to the stderr
- perf lock con -a -b -E 1 -x , --output ${result} -- perf bench sched messaging > /dev/null 2>&1
+ perf lock con -a -b -E 1 -x , --output ${result} -- perf bench sched messaging -p > /dev/null 2>&1
output=$(grep -v "^#" ${result} | tr -d -c , | wc -c)
if [ "${header}" != "${output}" ]; then
echo "[Fail] BPF result does not match the number of commas: ${header} != ${output}"
--
2.48.1
* Re: [PATCH v2 1/1] perf test: Ensure lock contention using pipe mode
From: Namhyung Kim @ 2025-07-26 5:31 UTC
To: Jan Polensky; +Cc: adrian.hunter, irogers, Thomas Richter, linux-perf-users
On Fri, Jul 25, 2025 at 07:08:01PM +0200, Jan Polensky wrote:
> The 'kernel lock contention analysis test' requires reliable triggering
> of lock contention. On some systems, previous benchmark calls failed to
> generate sufficient contention due to low system activity or resource
> limits.
Right, we need a reliable reproducer.
>
> This patch adds the -p (pipe) option to all calls of perf bench sched
> messaging, ensuring consistent lock contention without relying on
> socket-based communication.
But I don't understand why pipe is different than sockets. Can you
please elaborate?
Thanks,
Namhyung
* Re: [PATCH v2 1/1] perf test: Ensure lock contention using pipe mode
From: Jan Polensky @ 2025-07-28 19:20 UTC
To: Namhyung Kim; +Cc: adrian.hunter, irogers, Thomas Richter, linux-perf-users
On Fri, Jul 25, 2025 at 10:31:51PM -0700, Namhyung Kim wrote:
> On Fri, Jul 25, 2025 at 07:08:01PM +0200, Jan Polensky wrote:
> > The 'kernel lock contention analysis test' requires reliable triggering
> > of lock contention. On some systems, previous benchmark calls failed to
> > generate sufficient contention due to low system activity or resource
> > limits.
>
> Right, we need a reliable reproducer.
>
> >
> > This patch adds the -p (pipe) option to all calls of perf bench sched
> > messaging, ensuring consistent lock contention without relying on
> > socket-based communication.
>
> But I don't understand why pipe is different than sockets. Can you
> please elaborate?
The solution suggested by v1 in
https://lore.kernel.org/all/aIOvdQ003hRqFEH1@li-276bd24c-2dcc-11b2-a85c-945b6f05615c.ibm.com/
can be significantly faster and more reproducible in some cases. However,
on large systems it may fail with the error "perf: socketpair(): Too many
open files", causing the benchmark to abort before the expected kernel
lock contention is generated. While this can be mitigated by increasing
the file descriptor limit via ulimit -n <number>, we should avoid
modifying system settings during testing.
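For illustration, such a bump could be scoped to a subshell so the
invoking shell's limit stays untouched (a sketch only, assuming the hard
limit allows raising it; not something the test should rely on):

    $ (ulimit -n 4096 && perf bench sched messaging --group "$(nproc)" --thread)

With the default limit in place, the run fails instead: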
[root@localhost perf]# ./perf stat -- perf bench sched messaging --group $(nproc) --thread
# Running 'sched/messaging' benchmark:
perf: socketpair(): Too many open files
Performance counter stats for 'perf bench sched messaging --group 32 --thread':
43.14 msec task-clock # 0.104 CPUs utilized
1,013 context-switches # 23.483 K/sec
900 cpu-migrations # 20.863 K/sec
2,905 page-faults # 67.342 K/sec
162,801,472 instructions # 0.74 insn per cycle
220,427,613 cycles # 5.110 GHz
0.414804757 seconds time elapsed
0.004687000 seconds user
0.071900000 seconds sys
Analyzing the lock contention w/o pipe:
[root@localhost ~]# perf record -e 'lock:contention*' -a -- perf bench sched messaging; perf script | wc -l
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.135 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.363 MB perf.data ]
0
Analyzing the lock contention with pipe:
[root@localhost ~]# perf record -e 'lock:contention*' -a -- perf bench sched messaging -p; perf script | wc -l
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.108 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.604 MB perf.data (14789 samples) ]
14789
By recording with the -g (call-graph) option, we identified the pipe as a good source of contention:
[skip]
sched-messaging 2028653 [003] 625253.243099: lock:contention_begin: 0x85846e40 (flags=SPIN|MUTEX)
3a211f16200 __traceiter_contention_begin+0x50 ([kernel.kallsyms])
3a212d92e0e __mutex_lock.constprop.0+0x1be ([kernel.kallsyms])
3a2122643f2 anon_pipe_write+0x52 ([kernel.kallsyms])
3a21225789a vfs_write+0x1ca ([kernel.kallsyms])
3a212257ca8 ksys_write+0xd8 ([kernel.kallsyms])
3a212d8c8f4 __do_syscall+0x164 ([kernel.kallsyms])
3a212d99154 system_call+0x74 ([kernel.kallsyms])
3ff8e2a9ba6 __internal_syscall_cancel+0xa6 (/usr/lib64/libc.so.6)
sched-messaging 2028652 [005] 625253.243099: lock:contention_end: 0x85846e40 (ret=0)
3a211f16280 __traceiter_contention_end+0x50 ([kernel.kallsyms])
3a212d92fd6 __mutex_lock.constprop.0+0x386 ([kernel.kallsyms])
3a2122643f2 anon_pipe_write+0x52 ([kernel.kallsyms])
3a21225789a vfs_write+0x1ca ([kernel.kallsyms])
3a212257ca8 ksys_write+0xd8 ([kernel.kallsyms])
3a212d8c8f4 __do_syscall+0x164 ([kernel.kallsyms])
3a212d99154 system_call+0x74 ([kernel.kallsyms])
3ff8e2a9ba6 __internal_syscall_cancel+0xa6 (/usr/lib64/libc.so.6)
[skip]
This suggests that sockets are better optimized to avoid such locking,
and are therefore a poor fit for this benchmark scenario, where the goal
is to generate contention.
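As a quick way to reproduce the comparison, both modes can be fed through
the same record/contention pipeline the test itself uses (a sketch only;
absolute event counts will vary by system and kernel):

    for mode in "" "-p"; do
            perf lock record -o- -- perf bench sched messaging ${mode} 2> /dev/null | \
                    perf lock contention -i- -q 2>&1 | wc -l
    done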
* Re: [PATCH v2 1/1] perf test: Ensure lock contention using pipe mode
From: Namhyung Kim @ 2025-07-30 18:12 UTC
To: Jan Polensky; +Cc: adrian.hunter, irogers, Thomas Richter, linux-perf-users
On Mon, Jul 28, 2025 at 09:20:10PM +0200, Jan Polensky wrote:
> On Fri, Jul 25, 2025 at 10:31:51PM -0700, Namhyung Kim wrote:
> > On Fri, Jul 25, 2025 at 07:08:01PM +0200, Jan Polensky wrote:
> > > The 'kernel lock contention analysis test' requires reliable triggering
> > > of lock contention. On some systems, previous benchmark calls failed to
> > > generate sufficient contention due to low system activity or resource
> > > limits.
> >
> > Right, we need a reliable reproducer.
> >
> > >
> > > This patch adds the -p (pipe) option to all calls of perf bench sched
> > > messaging, ensuring consistent lock contention without relying on
> > > socket-based communication.
> >
> > But I don't understand why pipe is different than sockets. Can you
> > please elaborate?
>
> The solution suggested by v1 in
> https://lore.kernel.org/all/aIOvdQ003hRqFEH1@li-276bd24c-2dcc-11b2-a85c-945b6f05615c.ibm.com/
> can be significantly faster and more reproducible in some cases. However,
> on large systems it may fail with the error "perf: socketpair(): Too many
> open files", causing the benchmark to abort before the expected kernel
> lock contention is generated. While this can be mitigated by increasing
> the file descriptor limit via ulimit -n <number>, we should avoid
> modifying system settings during testing.
Interesting, I didn't know the socketpair would generate more file
descriptors than pipe. Maybe there's a bug in perf bench handling
socket file descriptors?
>
> [root@localhost perf]# ./perf stat -- perf bench sched messaging --group $(nproc) --thread
> # Running 'sched/messaging' benchmark:
> perf: socketpair(): Too many open files
>
> Performance counter stats for 'perf bench sched messaging --group 32 --thread':
>
> 43.14 msec task-clock # 0.104 CPUs utilized
> 1,013 context-switches # 23.483 K/sec
> 900 cpu-migrations # 20.863 K/sec
> 2,905 page-faults # 67.342 K/sec
> 162,801,472 instructions # 0.74 insn per cycle
> 220,427,613 cycles # 5.110 GHz
>
> 0.414804757 seconds time elapsed
>
> 0.004687000 seconds user
> 0.071900000 seconds sys
>
> > Analyzing the lock contention w/o pipe:
>
> [root@localhost ~]# perf record -e 'lock:contention*' -a -- perf bench sched messaging; perf script | wc -l
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.135 [sec]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.363 MB perf.data ]
> 0
Hmm.. strange. It causes some contention on my system. But it could be
an arch specific issue. I'm ok with changing it to pipe then.
Thanks,
Namhyung
* Re: [PATCH v2 1/1] perf test: Ensure lock contention using pipe mode
From: Jan Polensky @ 2025-07-31 13:51 UTC
To: Namhyung Kim; +Cc: adrian.hunter, irogers, Thomas Richter, linux-perf-users
On Wed, Jul 30, 2025 at 11:12:09AM -0700, Namhyung Kim wrote:
> On Mon, Jul 28, 2025 at 09:20:10PM +0200, Jan Polensky wrote:
> > On Fri, Jul 25, 2025 at 10:31:51PM -0700, Namhyung Kim wrote:
> > > On Fri, Jul 25, 2025 at 07:08:01PM +0200, Jan Polensky wrote:
[skip]
> > The solution suggested by v1 in
> > https://lore.kernel.org/all/aIOvdQ003hRqFEH1@li-276bd24c-2dcc-11b2-a85c-945b6f05615c.ibm.com/
> > can be significantly faster and more reproducible in some cases. However,
> > on large systems it may fail with the error "perf: socketpair(): Too many
> > open files", causing the benchmark to abort before the expected kernel
> > lock contention is generated. While this can be mitigated by increasing
> > the file descriptor limit via ulimit -n <number>, we should avoid
> > modifying system settings during testing.
>
> Interesting, I didn't know the socketpair would generate more file
> descriptors than pipe. Maybe there's a bug in perf bench handling
> socket file descriptors?
>
From what I can tell, this is expected behavior: socketpair() creates a
pair of connected sockets, i.e. two file descriptors per call. The issue
arises because the default file descriptor limit is too low for the
number of CPUs (nproc) on the system. As a general practice, we try to
avoid modifying system defaults during testing. If making the --group
run succeed were the only concern, one could temporarily raise the limit
for the executing shell using ulimit -n <num>. However, since socket
mode still fails to produce the lock contention on s390, raising the
limit wouldn't solve the underlying problem.
For context, the default file descriptor limit (ulimit -n) is often set
to 1024, while systems may have a high CPU count:
$ ulimit -n
1024
$ echo $(nproc)
32
This can lead to issues when running the benchmark with many threads. For
example:

32 groups * (20 senders + 20 receivers) = 1280 fds --> exceeds the default limit
25 groups * (20 senders + 20 receivers) = 1000 fds --> stays within the limit
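A minimal sketch of that budget check, assuming one descriptor per
sender/receiver thread during setup (hypothetical snippet, not part of
the test):

    groups=$(nproc)
    need=$((groups * (20 + 20)))   # default: 20 senders + 20 receivers per group
    have=$(ulimit -n)              # may print "unlimited" on some setups
    if [ "${have}" != "unlimited" ] && [ "${need}" -gt "${have}" ]; then
            echo "would hit EMFILE: need ${need} fds, limit ${have}"
    fi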
The desired behavior with fewer groups works as expected:
$ ./perf --debug stderr bench sched messaging --group 25 --thread
# Running 'sched/messaging' benchmark:
# 20 sender and receiver threads per group
# 25 groups == 1000 threads run
Total time: 0.142 [sec]
When the number of socketpairs exceeds the file descriptor limit, the
benchmark fails as expected. For example, with 32 groups:
$ strace -f -e trace=socketpair ./perf --debug stderr bench sched messaging --group $(nproc) --thread
[skip]
[pid 94410] socketpair(AF_UNIX, SOCK_STREAM, 0, [1019, 1020]) = 0
strace: Process 95418 attached
[pid 94410] socketpair(AF_UNIX, SOCK_STREAM, 0, [1021, 1022]) = 0
strace: Process 95419 attached
[pid 94410] socketpair(AF_UNIX, SOCK_STREAM, 0, 0x3ffde576ef0) = -1 EMFILE (Too many open files)
perf: socketpair(): Too many open files
[skip]
This shows that the 509th socketpair fails due to hitting the ulimit -n
threshold of 1024 in this case.
So it seems to work as intended, but the system limits are simply not
suitable for this particular case.
> > [skip]
> > Analyzing the lock contention w/o pipe:
> >
> > [root@localhost ~]# perf record -e 'lock:contention*' -a -- perf bench sched messaging; perf script | wc -l
> > # Running 'sched/messaging' benchmark:
> > # 20 sender and receiver processes per group
> > # 10 groups == 400 processes run
> >
> > Total time: 0.135 [sec]
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.363 MB perf.data ]
> > 0
>
> Hmm.. strange. It causes some contention on my system. But it could be
> an arch specific issue. I'm ok with changing it to pipe then.
Yes, communication is highly optimized on s390x systems, which might
explain why we don't observe the same contention here.
>
> Thanks,
> Namhyung
Thank you for your input and feedback, much appreciated.
Jan
* Re: [PATCH v2 1/1] perf test: Ensure lock contention using pipe mode
From: Namhyung Kim @ 2025-08-01 17:24 UTC
To: adrian.hunter, irogers, tmricht, Jan Polensky; +Cc: linux-perf-users
On Fri, 25 Jul 2025 19:08:01 +0200, Jan Polensky wrote:
> The 'kernel lock contention analysis test' requires reliable triggering
> of lock contention. On some systems, previous benchmark calls failed to
> generate sufficient contention due to low system activity or resource
> limits.
>
> This patch adds the -p (pipe) option to all calls of perf bench sched
> messaging, ensuring consistent lock contention without relying on
> socket-based communication.
>
> [...]
Applied to perf-tools-next, thanks!
Best regards,
Namhyung