public inbox for linux-xfs@vger.kernel.org
* perf loss on parallel compile due to contention on the buf semaphore
@ 2024-08-15 12:25 Mateusz Guzik
  2024-08-15 12:26 ` Mateusz Guzik
  2024-08-15 22:56 ` Dave Chinner
  0 siblings, 2 replies; 3+ messages in thread
From: Mateusz Guzik @ 2024-08-15 12:25 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

I have an ext4-based system where an xfs filesystem got mounted on top
of tmpfs for testing purposes. The directory is used heavily by gcc
during compilation.

I'm testing with 24 compilers running in parallel, each operating on
its own hello world source file; the test harness is listed at the end
for reference.

Both ext4 and btrfs backing the directory result in 100% cpu
utilization and about 1500 compiles/second. With xfs I see about 20%
idle(!) and about 1100 compiles/second.

According to offcputime-bpfcc -K, the time is spent sleeping on the
xfs_buf semaphore, sample traces:

   finish_task_switch.isra.0
    __schedule
    schedule
    schedule_timeout
    __down_common
    down
    xfs_buf_lock
    xfs_buf_find_lock
    xfs_buf_get_map
    xfs_buf_read_map
    xfs_trans_read_buf_map
    xfs_read_agi
    xfs_ialloc_read_agi
    xfs_dialloc
    xfs_create
    xfs_generic_create
    path_openat
    do_filp_open
    do_sys_openat2
    __x64_sys_openat
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    -                cc (602142)
        10639

    finish_task_switch.isra.0
    __schedule
    schedule
    schedule_timeout
    __down_common
    down
    xfs_buf_lock
    xfs_buf_find_lock
    xfs_buf_get_map
    xfs_buf_read_map
    xfs_trans_read_buf_map
    xfs_read_agi
    xfs_iunlink
    xfs_dir_remove_child
    xfs_remove
    xfs_vn_unlink
    vfs_unlink
    do_unlinkat
    __x64_sys_unlink
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    -                as (598688)
        12050

The contention itself aside, I'll note the stock semaphore code does
not do adaptive spinning, which avoidably and significantly worsens the
impact. You can probably convert this to a rw semaphore and only ever
write-lock it, which should sort out this aspect while preserving
exclusive-lock semantics. I did not check what could be done to contend
less to begin with.

reproducing:
create a hello world .c file (say /tmp/src.c) and plop copies into /src:
for i in $(seq 0 23); do cp /tmp/src.c /src/src${i}.c; done

plop the following into will-it-scale/tests/cc.c && ./cc_processes -t 24

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

char *testcase_description = "compile";

void testcase(unsigned long long *iterations, unsigned long nr)
{
        char cmd[1024];

        /* each process compiles its own source file to its own output */
        snprintf(cmd, sizeof(cmd), "cc -c -o /tmp/out.%lu /src/src%lu.c", nr, nr);

        while (1) {
                system(cmd);

                (*iterations)++;
        }
}

-- 
Mateusz Guzik <mjguzik gmail.com>
