All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jiri Wiesner <jwiesner@suse.de>
To: Waiman Long <longman@redhat.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>, "Will Deacon" <will@kernel.org>,
	"Boqun Feng" <boqun.feng@gmail.com>,
	linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com,
	"Hillf Danton" <hdanton@sina.com>,
	"Mukesh Ojha" <quic_mojha@quicinc.com>,
	"Ting11 Wang 王婷" <wangting11@xiaomi.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v6 1/6] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
Date: Fri, 16 Dec 2022 16:02:50 +0100	[thread overview]
Message-ID: <20221216150250.GA18361@incl> (raw)
In-Reply-To: <20221118022016.462070-2-longman@redhat.com>

On Thu, Nov 17, 2022 at 09:20:11PM -0500, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
> 
>   Non-first RT waiter    First waiter      Lock holder
>   -------------------    ------------      -----------
>   Acquire wait_lock
>   rwsem_try_write_lock():
>     Set handoff bit if RT or
>       wait too long
>     Set waiter->handoff_set
>   Release wait_lock
>                          Acquire wait_lock
>                          Inherit waiter->handoff_set
>                          Release wait_lock
> 					   Clear owner
>                                            Release lock
>   if (waiter.handoff_set) {
>     rwsem_spin_on_owner(();
>     if (OWNER_NULL)
>       goto trylock_again;
>   }
>   trylock_again:
>   Acquire wait_lock
>   rwsem_try_write_lock():
>      if (first->handoff_set && (waiter != first))
>      	return false;
>   Release wait_lock
> 
> A non-first waiter cannot really acquire the rwsem even if it mistakenly
> believes that it can spin on OWNER_NULL value. If that waiter happens
> to be an RT task running on the same CPU as the first waiter, it can
> block the first waiter from acquiring the rwsem leading to live lock.
> Fix this problem by making sure that a non-first waiter cannot spin in
> the slowpath loop without sleeping.
> 
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
> Reviewed-and-tested-by: Mukesh Ojha <quic_mojha@quicinc.com>
> Signed-off-by: Waiman Long <longman@redhat.com>
> Cc: stable@vger.kernel.org
> ---

I was checking if commit 6eebd5fb2083 ("locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter") resolves the issue that was discussed in [1]. I modified the program and script slighly:
fsim.c:
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
void sig_handle(int sig) { exit(0); }
int main(void)
{
        unsigned long c;
        signal(SIGALRM, sig_handle);
        alarm(3);
        while (1)
                c++;
}

run-fsim.sh:
#!/bin/bash
if [ ! -e fsim ]; then
        gcc -o fsim fsim.c
        if [ $? -ne 0 ]; then
                echo Failed to compile fsim
                exit -1
        fi
fi
MAX_ITERATIONS=20000
#The fsim processes are meant to run on both logical CPUs belonging to a CPU core, e.g. 1 and 129.
CPU_RANGE1="${1:-1 11}"
CPU_RANGE2="${1:-129 139}"
for i in `seq 1 $MAX_ITERATIONS`; do
        echo "Start $i/$MAX_ITERATIONS: `date`"
        for CPU in `seq $CPU_RANGE1` `seq $CPU_RANGE2`; do
                taskset -c $CPU chrt -r 10 ./fsim &>/dev/null &
                taskset -c $CPU chrt -r 20 ./fsim &>/dev/null &
                taskset -c $CPU chrt -r 30 ./fsim &>/dev/null &
                taskset -c $CPU chrt -r 40 ./fsim &>/dev/null &
        done
        echo "Wait  $i/$MAX_ITERATIONS: `date`"
        wait
done

No soft lockups were triggered but after 1.5 hours of testing, the fsim processes got stuck and only one of them was visible in the output of top:
> top - 18:45:01 up 44 min,  3 users,  load average: 72.00, 71.04, 54.81
> Tasks: 2226 total,   4 running, 2222 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.0 us,  0.4 sy,  0.0 ni, 99.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> MiB Mem : 239777.1+total, 235671.7+free, 4332.156 used, 1435.641 buff/cache
> MiB Swap: 1023.996 total, 1023.996 free,    0.000 used. 235444.9+avail Mem
>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> 100666 root     -31   0       0      0      0 D 94.84 0.000  14:59.40 fsim
>  98193 root      20   0   42224   6844   3484 R 0.794 0.003   0:07.05 top
>      1 root      20   0   79220  12544   9124 S 0.000 0.005   0:11.95 systemd

All of the fsim processes got stuck at the same code path - while exiting:
> [ 2462.649033] INFO: task fsim:100600 blocked for more than 491 seconds.
> [ 2462.649036]       Tainted: G            E     N 5.14.21-sle15-sp5-221214-hoff3-7 #8
> [ 2462.649038] task:fsim            state:D stack:    0 pid:100600 ppid: 95456 flags:0x00000000
> [ 2462.649042] Call Trace:
> [ 2462.649045]  <TASK>
> [ 2462.649046]  __schedule+0x2cd/0x1140
> [ 2462.649059]  schedule+0x5c/0xc0
> [ 2462.649061]  rwsem_down_write_slowpath+0x349/0x5d0
> [ 2462.649070]  unlink_file_vma+0x2d/0x60
> [ 2462.649074]  free_pgtables+0x67/0x110
> [ 2462.649083]  exit_mmap+0xaf/0x1f0
> [ 2462.649088]  mmput+0x56/0x120
> [ 2462.649090]  do_exit+0x306/0xb50
> [ 2462.649095]  do_group_exit+0x3a/0xa0
> [ 2462.649098]  __x64_sys_exit_group+0x14/0x20
> [ 2462.649102]  do_syscall_64+0x5b/0x80
> [ 2462.649116]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
> [ 2462.649120] RIP: 0033:0x7f90abae6c46
> [ 2462.649122] RSP: 002b:00007ffc0ca21638 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> [ 2462.649124] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f90abae6c46
> [ 2462.649125] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
> [ 2462.649127] RBP: 00007f90abdf5970 R08: 00000000000000e7 R09: ffffffffffffff80
> [ 2462.649128] R10: 0000000000000002 R11: 0000000000000246 R12: 00007f90abdf5970
> [ 2462.649129] R13: 0000000000000001 R14: 00007f90abdf9328 R15: 0000000000000000
> [ 2462.649132]  </TASK>
> [ 2462.649133] INFO: task fsim:100601 blocked for more than 491 seconds.
> [ 2462.649216] INFO: task fsim:100603 blocked for more than 491 seconds.
> [ 2462.649295] INFO: task fsim:100604 blocked for more than 491 seconds.
> [ 2462.649371] INFO: task fsim:100605 blocked for more than 491 seconds.
> [ 2462.649449] INFO: task fsim:100606 blocked for more than 491 seconds.
> [ 2462.649526] INFO: task fsim:100607 blocked for more than 491 seconds.
> [ 2462.649606] INFO: task fsim:100608 blocked for more than 491 seconds.
> [ 2462.649676] INFO: task fsim:100609 blocked for more than 491 seconds.

So, I tested these fixes all together added on top of 6eebd5fb2083:
* locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
* locking/rwsem: Disable preemption at all down_read*() and up_read() code paths
* locking/rwsem: Disable preemption at all down_write*() and up_write() code paths

After 20 hours of runtime, none of the fsim processes got stuck nor any soft lockups occurred. AFAICT, it works.
Tested-by: Jiri Wiesner <jwiesner@suse.de>

[1] https://lore.kernel.org/lkml/20220617134325.GC30825@techsingularity.net/

-- 
Jiri Wiesner
SUSE Labs

  reply	other threads:[~2022-12-16 15:02 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-18  2:20 [PATCH v6 0/6] lockinig/rwsem: Fix rwsem bugs & enable true lock handoff Waiman Long
2022-11-18  2:20 ` [PATCH v6 1/6] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath Waiman Long
2022-12-16 15:02   ` Jiri Wiesner [this message]
2023-01-20 22:58   ` Waiman Long
2022-11-18  2:20 ` [PATCH v6 2/6] locking/rwsem: Disable preemption at all down_read*() and up_read() code paths Waiman Long
2022-12-16 15:03   ` Jiri Wiesner
2022-11-18  2:20 ` [PATCH v6 3/6] locking/rwsem: Disable preemption at all down_write*() and up_write() " Waiman Long
2022-12-16 15:03   ` Jiri Wiesner
2022-11-18  2:20 ` [PATCH v6 4/6] locking/rwsem: Change waiter->hanodff_set to a handoff_state enum Waiman Long
2022-11-18  2:20 ` [PATCH v6 5/6] locking/rwsem: Enable direct rwsem lock handoff Waiman Long
2023-01-23 14:59   ` Peter Zijlstra
2023-01-23 17:30     ` Waiman Long
2023-01-23 22:07       ` Waiman Long
2023-01-24 12:58         ` Peter Zijlstra
2023-01-24 12:29       ` Peter Zijlstra
2023-01-25  1:53         ` Waiman Long
2022-11-18  2:20 ` [PATCH v6 6/6] locking/rwsem: Update handoff lock events tracking Waiman Long
2023-01-17 20:53 ` [PATCH v6 0/6] lockinig/rwsem: Fix rwsem bugs & enable true lock handoff Waiman Long
2023-01-22 13:46 ` Peter Zijlstra
2023-01-23  3:40   ` Waiman Long
2023-01-23 21:10     ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221216150250.GA18361@incl \
    --to=jwiesner@suse.de \
    --cc=boqun.feng@gmail.com \
    --cc=hdanton@sina.com \
    --cc=john.p.donnelly@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=quic_mojha@quicinc.com \
    --cc=stable@vger.kernel.org \
    --cc=wangting11@xiaomi.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.