Subject: Re: [PATCH] powerpc/pseries/vas: Migration suspend waits for no in-progress open windows
From: Haren Myneni
To: Nathan Lynch, linuxppc-dev@lists.ozlabs.org
Cc: npiggin@gmail.com
Date: Mon, 16 Oct 2023 14:15:20 -0700
Message-ID: <93293c23-f6fe-702a-d296-df8add421b2a@linux.ibm.com>
In-Reply-To: <87mswiux7s.fsf@li-e15d104c-2135-11b2-a85c-d7ef17e56be6.ibm.com>
References: <20230927031558.759396-1-haren@linux.ibm.com>
 <8734yjwoax.fsf@li-e15d104c-2135-11b2-a85c-d7ef17e56be6.ibm.com>
 <87ttqwvqtx.fsf@li-e15d104c-2135-11b2-a85c-d7ef17e56be6.ibm.com>
 <87mswiux7s.fsf@li-e15d104c-2135-11b2-a85c-d7ef17e56be6.ibm.com>

On 10/16/23 1:30 PM, Nathan Lynch wrote:
> Nathan Lynch writes:
>> Haren Myneni writes:
>>>> Haren Myneni writes:
>>>>> The hypervisor returns migration failure if all VAS windows are not
>>>>> closed. During the pre-migration stage, vas_migration_handler() sets
>>>>> the migration_in_progress flag and closes all windows from the list.
>>>>> The allocate VAS window routine checks the migration flag, sets up
>>>>> the window and then adds it to the list. So there is a possibility of
>>>>> the migration handler missing a window that is still in the
>>>>> process of setup.
>>>>>
>>>>> t1: Allocate and open VAS      t2: Migration event
>>>>>     window
>>>>>
>>>>> lock vas_pseries_mutex
>>>>> If migration_in_progress set
>>>>>   unlock vas_pseries_mutex
>>>>>   return
>>>>> open window HCALL
>>>>> unlock vas_pseries_mutex
>>>>> Modify window HCALL            lock vas_pseries_mutex
>>>>> setup window                   migration_in_progress=true
>>>>>                                Closes all windows from
>>>>>                                the list
>>>>>                                unlock vas_pseries_mutex
>>>>> lock vas_pseries_mutex         return
>>>>> if nr_closed_windows == 0
>>>>>   // No DLPAR CPU or migration
>>>>>   add to the list
>>>>>   unlock vas_pseries_mutex
>>>>>   return
>>>>> unlock vas_pseries_mutex
>>>>> Close VAS window
>>>>> // due to DLPAR CPU or migration
>>>>> return -EBUSY
>>>>
>>>> Could the path t1 takes simply hold the mutex for the duration of
>>>> its execution instead of dropping and reacquiring it in the middle?
>>>>
>>>> Here's the relevant code from vas_allocate_window():
>>>>
>>>> 	mutex_lock(&vas_pseries_mutex);
>>>> 	if (migration_in_progress)
>>>> 		rc = -EBUSY;
>>>> 	else
>>>> 		rc = allocate_setup_window(txwin, (u64 *)&domain[0],
>>>> 					   cop_feat_caps->win_type);
>>>> 	mutex_unlock(&vas_pseries_mutex);
>>>> 	if (rc)
>>>> 		goto out;
>>>>
>>>> 	rc = h_modify_vas_window(txwin);
>>>> 	if (!rc)
>>>> 		rc = get_vas_user_win_ref(&txwin->vas_win.task_ref);
>>>> 	if (rc)
>>>> 		goto out_free;
>>>>
>>>> 	txwin->win_type = cop_feat_caps->win_type;
>>>> 	mutex_lock(&vas_pseries_mutex);
>>>> 	if (!caps->nr_close_wins) {
>>>> 		list_add(&txwin->win_list, &caps->list);
>>>> 		caps->nr_open_windows++;
>>>> 		mutex_unlock(&vas_pseries_mutex);
>>>> 		vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
>>>> 		return &txwin->vas_win;
>>>> 	}
>>>> 	mutex_unlock(&vas_pseries_mutex);
>>>>
>>>> Is there something about h_modify_vas_window() or get_vas_user_win_ref()
>>>> that requires temporarily dropping the lock?
>>>>
>>>
>>> Thanks Nathan for your comments.
>>>
>>> vas_pseries_mutex protects window ID and IRQ allocation between the
>>> alloc and free window HCALLs, and it protects the window list. We
>>> generally try not to hold a mutex across HCALLs, but we need this
>>> mutex around these HCALLs.
>>>
>>> We could move h_modify_vas_window() or get_vas_user_win_ref() inside
>>> the mutex context, but it is not needed.
>>
>> Hmm. I contend that it would fix your bug in a simpler way that
>> eliminates the race instead of coping with it by adding more state and
>> complicating the locking model. With your change, readers of the
>> migration_in_progress flag check it under the mutex, but the writer
>> updates it outside of the mutex, which seems strange and unlikely to be
>> correct.
>
> Expanding on this, with your change, migration_in_progress becomes a
> boolean atomic_t flag accessed only with atomic_set() and
> atomic_read(). These are non-RMW operations. Documentation/atomic_t.txt
> says:
>
>   Non-RMW ops:
>
>   The non-RMW ops are (typically) regular LOADs and STOREs and are
>   canonically implemented using READ_ONCE(), WRITE_ONCE(),
>   smp_load_acquire() and smp_store_release() respectively. Therefore, if
>   you find yourself only using the Non-RMW operations of atomic_t, you
>   do not in fact need atomic_t at all and are doing it wrong.
>
> So making migration_in_progress an atomic_t does not confer any
> advantageous properties to it that it lacks as a plain boolean.
>
> Considering also (from the same document):
>
>   - non-RMW operations are unordered;
>
>   - RMW operations that have no return value are unordered;
>
> I am concerned that threads executing these segments of code will not
> always observe each others' effects in the intended order:
>
> // vas_allocate_window()
>
> 	mutex_lock(&vas_pseries_mutex);
> 	if (!caps->nr_close_wins && !atomic_read(&migration_in_progress)) {
> 		list_add(&txwin->win_list, &caps->list);
> 		caps->nr_open_windows++;
> 		atomic_dec(&caps->nr_open_wins_progress);
> 		mutex_unlock(&vas_pseries_mutex);
> 		vas_user_win_add_mm_context(&txwin->vas_win.task_ref);
> 		return &txwin->vas_win;
> 	}
> 	mutex_unlock(&vas_pseries_mutex);
> 	...
> 	atomic_dec(&caps->nr_open_wins_progress);
> 	wake_up(&open_win_progress_wq);
>
> // vas_migration_handler()
>
> 	atomic_set(&migration_in_progress, 1);
> 	...
> 	mutex_lock(&vas_pseries_mutex);
> 	rc = reconfig_close_windows(vcaps, vcaps->nr_open_windows,
> 				    true);
> 	mutex_unlock(&vas_pseries_mutex);
> 	/*
> 	 * Windows are included in the list after successful
> 	 * open. So wait for closing these in-progress open
> 	 * windows in vas_allocate_window() which will be
> 	 * done if the migration_in_progress is set.
> 	 */
> 	rc = wait_event_interruptible(open_win_progress_wq,
> 		!atomic_read(&vcaps->nr_open_wins_progress));
>
> Maybe it's OK in practice for some reason? I'm finding it difficult to
> reason about :-)
>

Thanks for the review. It should be OK in this case since the mutex is
held before reading the flag. But as you pointed out, the
migration_in_progress flag should be just a plain boolean. I will repost
the patch with this change.

Thanks
Haren
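
For readers following the locking argument, below is a minimal,
stand-alone sketch, not the pseries VAS driver code, of the scheme the
discussion converges on: migration_in_progress as a plain boolean that is
only read and written under one mutex, with the allocator publishing the
window on the list in the same critical section as the flag check (the
simplification Nathan suggests, holding the lock across the whole open
path). All identifiers and the userspace pthread framing are assumptions
made for the model; the real vas_allocate_window() drops vas_pseries_mutex
around the HCALLs, which is what the in-progress counter and waitqueue in
the patch compensate for.

/*
 * Stand-alone model, not kernel code: one mutex plus a plain bool flag.
 * The allocator checks the flag and inserts the window into the list in
 * a single critical section; the migration handler sets the flag and
 * drains the list in its own critical section on the same mutex, so it
 * can never miss a window that is still being set up.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct window {
	struct window *next;
};

static pthread_mutex_t vas_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool migration_in_progress;	/* only touched under vas_mutex */
static struct window *open_windows;	/* list the migration handler drains */

static void *allocate_window(void *arg)
{
	struct window *w;

	(void)arg;
	pthread_mutex_lock(&vas_mutex);
	if (migration_in_progress) {
		/* Refuse new windows while migration is suspending VAS. */
		pthread_mutex_unlock(&vas_mutex);
		return NULL;
	}
	/*
	 * In this model the window "setup" happens here, still under the
	 * mutex, so the flag check and the list insertion are atomic with
	 * respect to the migration handler.
	 */
	w = calloc(1, sizeof(*w));
	if (w) {
		w->next = open_windows;
		open_windows = w;
	}
	pthread_mutex_unlock(&vas_mutex);
	return w;
}

static void *migration_handler(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&vas_mutex);
	migration_in_progress = true;
	/* Every window that completed setup is on the list; close them all. */
	while (open_windows) {
		struct window *w = open_windows;

		open_windows = w->next;
		free(w);
	}
	pthread_mutex_unlock(&vas_mutex);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, allocate_window, NULL);
	pthread_create(&t2, NULL, migration_handler, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	/* Whichever thread took the mutex first, no window survives migration. */
	printf("windows left open after migration: %s\n",
	       open_windows ? "yes" : "no");
	return 0;
}

Built with "cc -pthread", the model always reports "no": either the
allocator ran first and its window was drained, or the migration handler
ran first and the allocator refused. Preserving that property while the
driver drops the mutex around HCALLs is the reason the posted patch adds
the open-in-progress counter and waitqueue.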