From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 8 Jun 2023 17:47:56 +0200
From: Gerald Schaefer
To: Hugh Dickins
Cc: Vasily Gorbik, Andrew Morton, Mike Kravetz, Mike Rapoport,
 "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand,
 Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman, Peter Xu,
 Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple, Ralph Campbell,
 Ira Weiny, Steven Price, SeongJae Park, Naoya Horiguchi,
 Christophe Leroy, Zack Rusin, Jason Gunthorpe, Axel Rasmussen,
 Anshuman Khandual, Pasha Tatashin, Miaohe Lin, Minchan Kim,
 Christoph Hellwig, Song Liu, Thomas Hellstrom, Russell King,
 "David S. Miller", Michael Ellerman, "Aneesh Kumar K.V",
 Heiko Carstens, Christian Borntraeger, Claudio Imbrenda,
 Alexander Gordeev, Jann Horn, linux-arm-kernel@lists.infradead.org,
 sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: Re: [PATCH 07/12] s390: add pte_free_defer(), with use of mmdrop_async()
Message-ID: <20230608174756.27cace18@thinkpad-T15>
References: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com>
 <6dd63b39-e71f-2e8b-7e0-83e02f3bcb39@google.com>
 <175ebec8-761-c3f-2d98-6c3bd87161c8@google.com>
 <20230606214037.09c6b280@thinkpad-T15>
X-Mailing-List: linux-s390@vger.kernel.org

On Wed, 7 Jun 2023 20:35:05 -0700 (PDT)
Hugh Dickins wrote:

> On Tue, 6 Jun 2023, Gerald Schaefer wrote:
> > On Mon, 5 Jun 2023 22:11:52 -0700 (PDT)
> > Hugh Dickins wrote:
> > > On Thu, 1 Jun 2023 15:57:51 +0200
> > > Gerald Schaefer wrote:
> > > >
> > > > Yes, we have 2 pagetables in one 4K page, which could result in same
> > > > rcu_head reuse. It might be possible to use the cleverness from our
> > > > page_table_free() function, e.g. to only do the call_rcu() once, for
> > > > the case where both 2K pagetable fragments become unused, similar to
> > > > how we decide when to actually call __free_page().
> > > >
> > > > However, it might be much worse, and page->rcu_head from a pagetable
> > > > page cannot be used at all for s390, because we also use page->lru
> > > > to keep our list of free 2K pagetable fragments. I always get confused
> > > > by struct page unions, so not completely sure, but it seems to me that
> > > > page->rcu_head would overlay with page->lru, right?
> > >
> > > Sigh, yes, page->rcu_head overlays page->lru. But (please correct me if
> > > I'm wrong) I think that s390 could use exactly the same technique for
> > > its list of free 2K pagetable fragments as it uses for its list of THP
> > > "deposited" pagetable fragments, over in arch/s390/mm/pgtable.c: use
> > > the first two longs of the page table itself for threading the list.
> >
> > Nice idea, I think that could actually work, since we only need the empty
> > 2K halves on the list. So it should be possible to store the list_head
> > inside those.
>
> Jason quickly pointed out the flaw in my thinking there.

Yes, while I had the right concerns about "the to-be-freed pagetables
would still be accessible, but not really valid, if we added them back
to the list, with list_heads inside them", when suggesting the approach
w/o passing over the mm, I missed that we would have the very same issue
already with the existing page_table_free_rcu(). Thankfully Jason was
watching out!

>
> >
> > >
> > > And while it could use third and fourth longs instead, I don't see any
> > > need for that: a deposited pagetable has been allocated, so would not
> > > be on the list of free fragments.
> >
> > Correct, that should not interfere.
> >
> > >
> > > Below is one of the grossest patches I've ever posted: gross because
> > > it's a rushed attempt to see whether that is viable, while it would take
> > > me longer to understand all the s390 cleverness there (even though the
> > > PP AA commentary above page_table_alloc() is excellent).
> >
> > Sounds fair, this is also one of the grossest code we have, which is also
> > why Alexander added the comment. I guess we could need even more comments
> > inside the code, as it still confuses me more than it should.
> >
> > Considering that, you did remarkably well. Your patch seems to work fine,
> > at least it survived some LTP mm tests. I will also add it to our CI runs,
> > to give it some more testing. Will report tomorrow when it broke something.
> > See also below for some patch comments.
>
> Many thanks for your effort on this patch. I don't expect the testing
> of it to catch Jason's point, that I'm corrupting the page table while
> it's on its way through RCU to being freed, but he's right nonetheless.

Right, tests ran fine, but we would have introduced subtle issues with
racing gup_fast, I guess.
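
For readers following the quoted exchange, the two ideas above (page->rcu_head sharing storage with page->lru, and threading the free list through the first two longs of an unused 2K fragment) can be sketched in plain userspace C. This is only an illustrative model under stated assumptions, not kernel code: struct page_model, struct rcu_head_model, union fragment_model and the frag_list_*() helpers are all hypothetical names, and zero is used as a placeholder for "empty entry", which a real implementation would replace with whatever an invalid PTE looks like. It also shows why the flaw Jason pointed out matters: the linkage lives inside the fragment, so it is only safe once nobody can still read that fragment as a page table.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; NOT the kernel's definitions. */
struct list_head {
	struct list_head *next, *prev;
};

struct rcu_head_model {
	struct rcu_head_model *next;
	void (*func)(struct rcu_head_model *head);
};

/* Hypothetical slice of struct page: lru and rcu_head share storage. */
struct page_model {
	union {
		struct list_head lru;
		struct rcu_head_model rcu_head;
	};
};

/* Queueing the page for RCU would overwrite the free-list linkage. */
static_assert(offsetof(struct page_model, lru) ==
	      offsetof(struct page_model, rcu_head),
	      "lru and rcu_head overlay each other");

#define FRAGMENT_SIZE 2048UL

/* One 2K fragment: page-table entries or, while free, a list node. */
union fragment_model {
	unsigned long entries[FRAGMENT_SIZE / sizeof(unsigned long)];
	struct list_head node;	/* occupies the first two longs */
};

static_assert(sizeof(union fragment_model) == FRAGMENT_SIZE,
	      "fragment must stay exactly 2K");

static void frag_list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static int frag_list_empty(const struct list_head *head)
{
	return head->next == head;
}

/*
 * Thread a free fragment onto the list through its own first two
 * longs, so nothing in struct page is needed for the linkage.
 * Only safe if nobody can still walk the fragment as a page table.
 */
static void frag_list_add(union fragment_model *frag, struct list_head *head)
{
	struct list_head *node = &frag->node;

	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Take the most recently added fragment off the list, or NULL. */
static union fragment_model *frag_list_pop(struct list_head *head)
{
	struct list_head *node = head->next;

	if (node == head)
		return NULL;
	node->prev->next = node->next;
	node->next->prev = node->prev;
	/* Placeholder for restoring "empty" entries before reuse. */
	node->next = NULL;
	node->prev = NULL;
	return (union fragment_model *)node;
}
```

A caller would initialise one list head with frag_list_init(), add both halves of a page with frag_list_add(), and get them back in last-in, first-out order from frag_list_pop().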

>
> I'll integrate your fixes below into what I have here, but probably
> just archive it as something to refer to later in case it might play
> a part; but probably it will not - sorry for wasting your time.

No worries, looking at that s390 code can never be amiss. It seems I need
a regular refresh, at least I'm sure I already understood it better in the
past. And who knows, with Jason's recent thoughts, that "list_head inside
pagetable" idea might not be dead yet.

>
> >
> > >
> > > I'm hoping the use of page->lru in arch/s390/mm/gmap.c is disjoint.
> > > And cmma_init_nodat()? Ah, that's __init so I guess disjoint.
> >
> > cmma_init_nodat() should be disjoint, not only because it is __init,
> > but also because it explicitly skips pagetable pages, so it should
> > never touch page->lru of those.
> >
> > Not very familiar with the gmap code, it does look disjoint, and we should
> > also use complete 4K pages for pagetables instead of 2K fragments there,
> > but Christian or Claudio should also have a look.
> >
> > >
> > > Gerald, s390 folk: would it be possible for you to give this
> > > a try, suggest corrections and improvements, and then I can make it
> > > a separate patch of the series; and work on avoiding concurrent use
> > > of the rcu_head by pagetable fragment buddies (ideally fit in with
> > > the scheme already there, maybe DD bits to go along with the PP AA).
> >
> > It feels like it could be possible to not only avoid the double
> > rcu_head, but also avoid passing over the mm via page->pt_mm.
> > I.e. have pte_free_defer(), which has the mm, do all the checks and
> > list updates that page_table_free() does, for which we need the mm.
> > Then just skip the pgtable_pte_page_dtor() + __free_page() at the end,
> > and do call_rcu(pte_free_now) instead. The pte_free_now() could then
> > just do _dtor/__free_page similar to the generic version.
>
> I'm not sure: I missed your suggestion there when I first skimmed
> through, and today have spent more time getting deeper into how it's
> done at present. I am now feeling more confident of a way forward,
> a nicely integrated way forward, than I was yesterday.
> Though getting it right may not be so easy.

I think my "feeling" was a déjà vu of the existing logic that we use for
page_table_free_rcu() -> __tlb_remove_table(), where we also have no mm
any more at the end, and use the PP bits magic to find out if the page
can be freed, or if we still have fragments left.

Of course, in that case, we also would not need the mm any more for list
handling, as the to-be-freed fragments were already put back on the list,
but with PP bits set, to prevent re-use. And clearing those would then
make the fragment usable from the list again.

I guess that would also be the major difference here, i.e. your RCU
call-back would need to be able to add fragments back to the list, after
having removed them before to make room for page->rcu_head, but with
Jason's thoughts that does not seem so impossible after all. I do not yet
understand if the list_head would then compulsorily need to be inside the
pagetable, because page->rcu_head/lru still cannot be used (again). But
you already have a patch for that, so either way might be possible.

>
> When Jason pointed out the existing RCU, I initially hoped that it might
> already provide the necessary framework: but sadly not, because the
> unbatched case (used when additional memory is not available) does not
> use RCU at all, but instead the tlb_remove_table_sync_one() IRQ hack.
> If I used that, it would cripple the s390 implementation unacceptably.
>
> >
> > I must admit that I still have no good overview of the "big picture"
> > here, and especially if this approach would still fit in. Probably not,
> > as the to-be-freed pagetables would still be accessible, but not really
> > valid, if we added them back to the list, with list_heads inside them.
> > So maybe call_rcu() has to be done always, and not only for the case
> > where the whole 4K page becomes free, then we probably cannot do w/o
> > passing over the mm for proper list handling.
>
> My current thinking (but may be proved wrong) is along the lines of:
> why does something on its way to being freed need to be on any list
> than the rcu_head list? I expect the current answer is, that the
> other half is allocated, so the page won't be freed; but I hope that
> we can put it back on that list once we're through with the rcu_head.

Yes, that looks promising. Such a fragment would not necessarily need to
be on the list, because while it is on its way, i.e. before the RCU
call-back has finished, it cannot be re-used anyway.

page_table_alloc() could currently find such a fragment on the list, but
only to see the PP bits set, so it will not use it. Only after
__tlb_remove_table() in the RCU call-back resets the bits would it be
usable again.

In your case, that could correspond to adding it back to the list. That
could even be an improvement, because page_table_alloc() would not be
bothered by such unusable fragments.

[...]
>
> Is it too early to wish you a happy reverse Xmas?

Nice idea, we should make June 24th the reverse Xmas Remembrance Day :-)
YmUgdXNlZCBhdCBhbGwgZm9yIHMzOTAsIGJlY2F1c2Ugd2UgYWxzbyB1c2UgcGFnZS0+bHJ1Cj4g PiA+ID4gdG8ga2VlcCBvdXIgbGlzdCBvZiBmcmVlIDJLIHBhZ2V0YWJsZSBmcmFnbWVudHMuIEkg YWx3YXlzIGdldCBjb25mdXNlZAo+ID4gPiA+IGJ5IHN0cnVjdCBwYWdlIHVuaW9ucywgc28gbm90 IGNvbXBsZXRlbHkgc3VyZSwgYnV0IGl0IHNlZW1zIHRvIG1lIHRoYXQKPiA+ID4gPiBwYWdlLT5y Y3VfaGVhZCB3b3VsZCBvdmVybGF5IHdpdGggcGFnZS0+bHJ1LCByaWdodD8gICAgCj4gPiA+IAo+ ID4gPiBTaWdoLCB5ZXMsIHBhZ2UtPnJjdV9oZWFkIG92ZXJsYXlzIHBhZ2UtPmxydS4gIEJ1dCAo cGxlYXNlIGNvcnJlY3QgbWUgaWYKPiA+ID4gSSdtIHdyb25nKSBJIHRoaW5rIHRoYXQgczM5MCBj b3VsZCB1c2UgZXhhY3RseSB0aGUgc2FtZSB0ZWNobmlxdWUgZm9yCj4gPiA+IGl0cyBsaXN0IG9m IGZyZWUgMksgcGFnZXRhYmxlIGZyYWdtZW50cyBhcyBpdCB1c2VzIGZvciBpdHMgbGlzdCBvZiBU SFAKPiA+ID4gImRlcG9zaXRlZCIgcGFnZXRhYmxlIGZyYWdtZW50cywgb3ZlciBpbiBhcmNoL3Mz OTAvbW0vcGd0YWJsZS5jOiB1c2UKPiA+ID4gdGhlIGZpcnN0IHR3byBsb25ncyBvZiB0aGUgcGFn ZSB0YWJsZSBpdHNlbGYgZm9yIHRocmVhZGluZyB0aGUgbGlzdC4gIAo+ID4gCj4gPiBOaWNlIGlk ZWEsIEkgdGhpbmsgdGhhdCBjb3VsZCBhY3R1YWxseSB3b3JrLCBzaW5jZSB3ZSBvbmx5IG5lZWQg dGhlIGVtcHR5Cj4gPiAySyBoYWx2ZXMgb24gdGhlIGxpc3QuIFNvIGl0IHNob3VsZCBiZSBwb3Nz aWJsZSB0byBzdG9yZSB0aGUgbGlzdF9oZWFkCj4gPiBpbnNpZGUgdGhvc2UuICAKPiAKPiBKYXNv biBxdWlja2x5IHBvaW50ZWQgb3V0IHRoZSBmbGF3IGluIG15IHRoaW5raW5nIHRoZXJlLgoKWWVz LCB3aGlsZSBJIGhhZCB0aGUgcmlnaHQgY29uY2VybnMgYWJvdXQgInRoZSB0by1iZS1mcmVlZCBw YWdldGFibGVzIHdvdWxkCnN0aWxsIGJlIGFjY2Vzc2libGUsIGJ1dCBub3QgcmVhbGx5IHZhbGlk LCBpZiB3ZSBhZGRlZCB0aGVtIGJhY2sgdG8gdGhlIGxpc3QsCndpdGggbGlzdF9oZWFkcyBpbnNp ZGUgdGhlbSIsIHdoZW4gc3VnZ2VzdGluZyB0aGUgYXBwcm9hY2ggdy9vIHBhc3Npbmcgb3Zlcgp0 aGUgbW0sIEkgbWlzc2VkIHRoYXQgd2Ugd291bGQgaGF2ZSB0aGUgdmVyeSBzYW1lIGlzc3VlIGFs cmVhZHkgd2l0aCB0aGUKZXhpc3RpbmcgcGFnZV90YWJsZV9mcmVlX3JjdSgpLgoKVGhhbmtmdWxs eSBKYXNvbiB3YXMgd2F0Y2hpbmcgb3V0IQoKPiAKPiA+ICAgCj4gPiA+IAo+ID4gPiBBbmQgd2hp bGUgaXQgY291bGQgdXNlIHRoaXJkIGFuZCBmb3VydGggbG9uZ3MgaW5zdGVhZCwgSSBkb24ndCBz ZWUgYW55Cj4gPiA+IG5lZWQgZm9yIHRoYXQ6IGEgZGVwb3NpdGVkIHBhZ2V0YWJsZSBoYXMgYmVl 
biBhbGxvY2F0ZWQsIHNvIHdvdWxkIG5vdAo+ID4gPiBiZSBvbiB0aGUgbGlzdCBvZiBmcmVlIGZy YWdtZW50cy4gIAo+ID4gCj4gPiBDb3JyZWN0LCB0aGF0IHNob3VsZCBub3QgaW50ZXJmZXJlLgo+ ID4gICAKPiA+ID4gCj4gPiA+IEJlbG93IGlzIG9uZSBvZiB0aGUgZ3Jvc3Nlc3QgcGF0Y2hlcyBJ J3ZlIGV2ZXIgcG9zdGVkOiBncm9zcyBiZWNhdXNlCj4gPiA+IGl0J3MgYSBydXNoZWQgYXR0ZW1w dCB0byBzZWUgd2hldGhlciB0aGF0IGlzIHZpYWJsZSwgd2hpbGUgaXQgd291bGQgdGFrZQo+ID4g PiBtZSBsb25nZXIgdG8gdW5kZXJzdGFuZCBhbGwgdGhlIHMzOTAgY2xldmVybmVzcyB0aGVyZSAo ZXZlbiB0aG91Z2ggdGhlCj4gPiA+IFBQIEFBIGNvbW1lbnRhcnkgYWJvdmUgcGFnZV90YWJsZV9h bGxvYygpIGlzIGV4Y2VsbGVudCkuICAKPiA+IAo+ID4gU291bmRzIGZhaXIsIHRoaXMgaXMgYWxz byBvbmUgb2YgdGhlIGdyb3NzZXN0IGNvZGUgd2UgaGF2ZSwgd2hpY2ggaXMgYWxzbwo+ID4gd2h5 IEFsZXhhbmRlciBhZGRlZCB0aGUgY29tbWVudC4gSSBndWVzcyB3ZSBjb3VsZCBuZWVkIGV2ZW4g bW9yZSBjb21tZW50cwo+ID4gaW5zaWRlIHRoZSBjb2RlLCBhcyBpdCBzdGlsbCBjb25mdXNlcyBt ZSBtb3JlIHRoYW4gaXQgc2hvdWxkLgo+ID4gCj4gPiBDb25zaWRlcmluZyB0aGF0LCB5b3UgZGlk IHJlbWFya2FibHkgd2VsbC4gWW91ciBwYXRjaCBzZWVtcyB0byB3b3JrIGZpbmUsCj4gPiBhdCBs ZWFzdCBpdCBzdXJ2aXZlZCBzb21lIExUUCBtbSB0ZXN0cy4gSSB3aWxsIGFsc28gYWRkIGl0IHRv IG91ciBDSSBydW5zLAo+ID4gdG8gZ2l2ZSBpdCBzb21lIG1vcmUgdGVzdGluZy4gV2lsbCByZXBv cnQgdG9tb3Jyb3cgd2hlbiBpdCBicm9rZSBzb21ldGhpbmcuCj4gPiBTZWUgYWxzbyBiZWxvdyBm b3Igc29tZSBwYXRjaCBjb21tZW50cy4gIAo+IAo+IE1hbnkgdGhhbmtzIGZvciB5b3VyIGVmZm9y dCBvbiB0aGlzIHBhdGNoLiAgSSBkb24ndCBleHBlY3QgdGhlIHRlc3RpbmcKPiBvZiBpdCB0byBj YXRjaCBKYXNvbidzIHBvaW50LCB0aGF0IEknbSBjb3JydXB0aW5nIHRoZSBwYWdlIHRhYmxlIHdo aWxlCj4gaXQncyBvbiBpdHMgd2F5IHRocm91Z2ggUkNVIHRvIGJlaW5nIGZyZWVkLCBidXQgaGUn cyByaWdodCBub25ldGhlbGVzcy4KClJpZ2h0LCB0ZXN0cyByYW4gZmluZSwgYnV0IHdlIHdvdWxk IGhhdmUgaW50cm9kdWNlZCBzdWJ0bGUgaXNzdWVzIHdpdGgKcmFjaW5nIGd1cF9mYXN0LCBJIGd1 ZXNzLgoKPiAKPiBJJ2xsIGludGVncmF0ZSB5b3VyIGZpeGVzIGJlbG93IGludG8gd2hhdCBJIGhh dmUgaGVyZSwgYnV0IHByb2JhYmx5Cj4ganVzdCBhcmNoaXZlIGl0IGFzIHNvbWV0aGluZyB0byBy ZWZlciB0byBsYXRlciBpbiBjYXNlIGl0IG1pZ2h0IHBsYXkKPiBhIHBhcnQ7IGJ1dCBwcm9iYWJs 
eSBpdCB3aWxsIG5vdCAtIHNvcnJ5IGZvciB3YXN0aW5nIHlvdXIgdGltZS4KCk5vIHdvcnJpZXMs IGxvb2tpbmcgYXQgdGhhdCBzMzkwIGNvZGUgY2FuIG5ldmVyIGJlIGFtaXNzLiBJdCBzZWVtcyBJ IG5lZWQKcmVndWxhciByZWZyZXNoLCBhdCBsZWFzdCBJJ20gc3VyZSBJIGFscmVhZHkgdW5kZXJz dG9vZCBpdCBiZXR0ZXIgaW4gdGhlCnBhc3QuCgpBbmQgd2hvIGtub3dzLCB3aXRoIEphc29ucyBy ZWNlbnQgdGhvdWdodHMsIHRoYXQgImxpc3RfaGVhZCBpbnNpZGUKcGFnZXRhYmxlIiBpZGVhIG1p Z2h0IG5vdCBiZSBkZWFkIHlldC4KCj4gCj4gPiAgIAo+ID4gPiAKPiA+ID4gSSdtIGhvcGluZyB0 aGUgdXNlIG9mIHBhZ2UtPmxydSBpbiBhcmNoL3MzOTAvbW0vZ21hcC5jIGlzIGRpc2pvaW50Lgo+ ID4gPiBBbmQgY21tYV9pbml0X25vZGF0KCk/IEFoLCB0aGF0J3MgX19pbml0IHNvIEkgZ3Vlc3Mg ZGlzam9pbnQuICAKPiA+IAo+ID4gY21tYV9pbml0X25vZGF0KCkgc2hvdWxkIGJlIGRpc2pvaW50 LCBub3Qgb25seSBiZWNhdXNlIGl0IGlzIF9faW5pdCwKPiA+IGJ1dCBhbHNvIGJlY2F1c2UgaXQg ZXhwbGljaXRseSBza2lwcyBwYWdldGFibGUgcGFnZXMsIHNvIGl0IHNob3VsZAo+ID4gbmV2ZXIg dG91Y2ggcGFnZS0+bHJ1IG9mIHRob3NlLgo+ID4gCj4gPiBOb3QgdmVyeSBmYW1pbGlhciB3aXRo IHRoZSBnbWFwIGNvZGUsIGl0IGRvZXMgbG9vayBkaXNqb2ludCwgYW5kIHdlIHNob3VsZAo+ID4g YWxzbyB1c2UgY29tcGxldGUgNEsgcGFnZXMgZm9yIHBhZ2V0YWJsZXMgaW5zdGVhZCBvZiAySyBm cmFnbWVudHMgdGhlcmUsCj4gPiBidXQgQ2hyaXN0aWFuIG9yIENsYXVkaW8gc2hvdWxkIGFsc28g aGF2ZSBhIGxvb2suCj4gPiAgIAo+ID4gPiAKPiA+ID4gR2VyYWxkLCBzMzkwIGZvbGs6IHdvdWxk IGl0IGJlIHBvc3NpYmxlIGZvciB5b3UgdG8gZ2l2ZSB0aGlzCj4gPiA+IGEgdHJ5LCBzdWdnZXN0 IGNvcnJlY3Rpb25zIGFuZCBpbXByb3ZlbWVudHMsIGFuZCB0aGVuIEkgY2FuIG1ha2UgaXQKPiA+ ID4gYSBzZXBhcmF0ZSBwYXRjaCBvZiB0aGUgc2VyaWVzOyBhbmQgd29yayBvbiBhdm9pZGluZyBj b25jdXJyZW50IHVzZQo+ID4gPiBvZiB0aGUgcmN1X2hlYWQgYnkgcGFnZXRhYmxlIGZyYWdtZW50 IGJ1ZGRpZXMgKGlkZWFsbHkgZml0IGluIHdpdGgKPiA+ID4gdGhlIHNjaGVtZSBhbHJlYWR5IHRo ZXJlLCBtYXliZSBERCBiaXRzIHRvIGdvIGFsb25nIHdpdGggdGhlIFBQIEFBKS4gIAo+ID4gCj4g PiBJdCBmZWVscyBsaWtlIGl0IGNvdWxkIGJlIHBvc3NpYmxlIHRvIG5vdCBvbmx5IGF2b2lkIHRo ZSBkb3VibGUKPiA+IHJjdV9oZWFkLCBidXQgYWxzbyBhdm9pZCBwYXNzaW5nIG92ZXIgdGhlIG1t IHZpYSBwYWdlLT5wdF9tbS4KPiA+IEkuZS4gaGF2ZSBwdGVfZnJlZV9kZWZlcigpLCB3aGljaCBo 
YXMgdGhlIG1tLCBkbyBhbGwgdGhlIGNoZWNrcyBhbmQKPiA+IGxpc3QgdXBkYXRlcyB0aGF0IHBh Z2VfdGFibGVfZnJlZSgpIGRvZXMsIGZvciB3aGljaCB3ZSBuZWVkIHRoZSBtbS4KPiA+IFRoZW4g anVzdCBza2lwIHRoZSBwZ3RhYmxlX3B0ZV9wYWdlX2R0b3IoKSArIF9fZnJlZV9wYWdlKCkgYXQg dGhlIGVuZCwKPiA+IGFuZCBkbyBjYWxsX3JjdShwdGVfZnJlZV9ub3cpIGluc3RlYWQuIFRoZSBw dGVfZnJlZV9ub3coKSBjb3VsZCB0aGVuCj4gPiBqdXN0IGRvIF9kdG9yL19fZnJlZV9wYWdlIHNp bWlsYXIgdG8gdGhlIGdlbmVyaWMgdmVyc2lvbi4gIAo+IAo+IEknbSBub3Qgc3VyZTogSSBtaXNz ZWQgeW91ciBzdWdnZXN0aW9uIHRoZXJlIHdoZW4gSSBmaXJzdCBza2ltbWVkCj4gdGhyb3VnaCwg YW5kIHRvZGF5IGhhdmUgc3BlbnQgbW9yZSB0aW1lIGdldHRpbmcgZGVlcGVyIGludG8gaG93IGl0 J3MKPiBkb25lIGF0IHByZXNlbnQuICBJIGFtIG5vdyBmZWVsaW5nIG1vcmUgY29uZmlkZW50IG9m IGEgd2F5IGZvcndhcmQsCj4gYSBuaWNlbHkgaW50ZWdyYXRlZCB3YXkgZm9yd2FyZCwgdGhhbiBJ IHdhcyB5ZXN0ZXJkYXkuCj4gVGhvdWdoIGdldHRpbmcgaXQgcmlnaHQgbWF5IG5vdCBiZSBzbyBl YXN5LgoKSSB0aGluayBteSAiZmVlbGluZyIgd2FzIGEgZMOpasOgIHZ1IG9mIHRoZSBleGlzdGlu ZyBsb2dpYyB0aGF0IHdlIHVzZSBmb3IKcGFnZV90YWJsZV9mcmVlX3JjdSgpIC0+IF9fdGxiX3Jl bW92ZV90YWJsZSgpLCB3aGVyZSB3ZSBhbHNvIGhhdmUgbm8gbW0KYW55IG1vcmUgYXQgdGhlIGVu ZCwgYW5kIHVzZSB0aGUgUFAgYml0cyBtYWdpYyB0byBmaW5kIG91dCBpZiB0aGUgcGFnZQpjYW4g YmUgZnJlZWQsIG9yIGlmIHdlIHN0aWxsIGhhdmUgZnJhZ21lbnRzIGxlZnQuCgpPZiBjb3Vyc2Us IGluIHRoYXQgY2FzZSwgd2UgYWxzbyB3b3VsZCBub3QgbmVlZCB0aGUgbW0gYW55IG1vcmUgZm9y Cmxpc3QgaGFuZGxpbmcsIGFzIHRoZSB0by1iZS1mcmVlZCBmcmFnbWVudHMgd2VyZSBhbHJlYWR5 IHB1dCBiYWNrCm9uIHRoZSBsaXN0LCBidXQgd2l0aCBQUCBiaXRzIHNldCwgdG8gcHJldmVudCBy ZS11c2UuIEFuZCBjbGVhcmluZwp0aG9zZSB3b3VsZCB0aGVuIG1ha2UgdGhlIGZyYWdtZW50IHVz YWJsZSBmcm9tIHRoZSBsaXN0IGFnYWluLgoKSSBndWVzcyB0aGF0IHdvdWxkIGFsc28gYmUgdGhl IG1ham9yIGRpZmZlcmVuY2UgaGVyZSwgaS5lLiB5b3VyIFJDVQpjYWxsLWJhY2sgd291bGQgbmVl ZCB0byBiZSBhYmxlIHRvIGFkZCBmcmFnbWVudHMgYmFjayB0byB0aGUgbGlzdCwKYWZ0ZXIgaGF2 aW5nIHRoZW0gcmVtb3ZlZCBiZWZvcmUgdG8gbWFrZSByb29tIGZvciBwYWdlLT5yY3VfaGVhZCwK YnV0IHdpdGggSmFzb25zIHRob3VnaHRzIHRoYXQgZG9lcyBub3Qgc2VlbSBzbyBpbXBvc3NpYmxl 
IGFmdGVyIGFsbC4KCkkgZG8gbm90IHlldCB1bmRlcnN0YW5kIGlmIHRoZSBsaXN0X2hlYWQgd291 bGQgdGhlbiBjb21wdWxzb3JpbHkgbmVlZAp0byBiZSBpbnNpZGUgdGhlIHBhZ2V0YWJsZSwgYmVj YXVzZSBwYWdlLT5yY3VfaGVhZC9scnUgc3RpbGwgY2Fubm90IGJlCnVzZWQgKGFnYWluKS4gQnV0 IHlvdSBhbHJlYWR5IGhhdmUgYSBwYXRjaCBmb3IgdGhhdCwgc28gZWl0aGVyIHdheQptaWdodCBi ZSBwb3NzaWJsZS4KCj4gCj4gV2hlbiBKYXNvbiBwb2ludGVkIG91dCB0aGUgZXhpc3RpbmcgUkNV LCBJIGluaXRpYWxseSBob3BlZCB0aGF0IGl0IG1pZ2h0Cj4gYWxyZWFkeSBwcm92aWRlIHRoZSBu ZWNlc3NhcnkgZnJhbWV3b3JrOiBidXQgc2FkbHkgbm90LCBiZWNhdXNlIHRoZQo+IHVuYmF0Y2hl ZCBjYXNlICh1c2VkIHdoZW4gYWRkaXRpb25hbCBtZW1vcnkgaXMgbm90IGF2YWlsYWJsZSkgZG9l cyBub3QKPiB1c2UgUkNVIGF0IGFsbCwgYnV0IGluc3RlYWQgdGhlIHRsYl9yZW1vdmVfdGFibGVf c3luY19vbmUoKSBJUlEgaGFjay4KPiBJZiBJIHVzZWQgdGhhdCwgaXQgd291bGQgY3JpcHBsZSB0 aGUgczM5MCBpbXBsZW1lbnRhdGlvbiB1bmFjY2VwdGFibHkuCj4gCj4gPiAKPiA+IEkgbXVzdCBh ZG1pdCB0aGF0IEkgc3RpbGwgaGF2ZSBubyBnb29kIG92ZXJ2aWV3IG9mIHRoZSAiYmlnIHBpY3R1 cmUiCj4gPiBoZXJlLCBhbmQgZXNwZWNpYWxseSBpZiB0aGlzIGFwcHJvYWNoIHdvdWxkIHN0aWxs IGZpdCBpbi4gUHJvYmFibHkgbm90LAo+ID4gYXMgdGhlIHRvLWJlLWZyZWVkIHBhZ2V0YWJsZXMg d291bGQgc3RpbGwgYmUgYWNjZXNzaWJsZSwgYnV0IG5vdCByZWFsbHkKPiA+IHZhbGlkLCBpZiB3 ZSBhZGRlZCB0aGVtIGJhY2sgdG8gdGhlIGxpc3QsIHdpdGggbGlzdF9oZWFkcyBpbnNpZGUgdGhl bS4KPiA+IFNvIG1heWJlIGNhbGxfcmN1KCkgaGFzIHRvIGJlIGRvbmUgYWx3YXlzLCBhbmQgbm90 IG9ubHkgZm9yIHRoZSBjYXNlCj4gPiB3aGVyZSB0aGUgd2hvbGUgNEsgcGFnZSBiZWNvbWVzIGZy ZWUsIHRoZW4gd2UgcHJvYmFibHkgY2Fubm90IGRvIHcvbwo+ID4gcGFzc2luZyBvdmVyIHRoZSBt bSBmb3IgcHJvcGVyIGxpc3QgaGFuZGxpbmcuICAKPiAKPiBNeSBjdXJyZW50IHRoaW5raW5nIChi dXQgbWF5IGJlIHByb3ZlZCB3cm9uZykgaXMgYWxvbmcgdGhlIGxpbmVzIG9mOgo+IHdoeSBkb2Vz IHNvbWV0aGluZyBvbiBpdHMgd2F5IHRvIGJlaW5nIGZyZWVkIG5lZWQgdG8gYmUgb24gYW55IGxp c3QKPiB0aGFuIHRoZSByY3VfaGVhZCBsaXN0PyAgSSBleHBlY3QgdGhlIGN1cnJlbnQgYW5zd2Vy IGlzLCB0aGF0IHRoZQo+IG90aGVyIGhhbGYgaXMgYWxsb2NhdGVkLCBzbyB0aGUgcGFnZSB3b24n dCBiZSBmcmVlZDsgYnV0IEkgaG9wZSB0aGF0Cj4gd2UgY2FuIHB1dCBpdCBiYWNrIG9uIHRoYXQg 
bGlzdCBvbmNlIHdlJ3JlIHRocm91Z2ggd2l0aCB0aGUgcmN1X2hlYWQuCgpZZXMsIHRoYXQgbG9v a3MgcHJvbWlzaW5nLiBTdWNoIGEgZnJhZ21lbnQgd291bGQgbm90IG5lY2Vzc2FyaWx5IG5lZWQK dG8gYmUgb24gdGhlIGxpc3QsIGJlY2F1c2Ugd2hpbGUgaXQgaXMgb24gaXRzIHdheSwgaS5lLiBi ZWZvcmUgdGhlClJDVSBjYWxsLWJhY2sgZmluaXNoZWQsIGl0IGNhbm5vdCBiZSByZS11c2VkIGFu eXdheS4KCnBhZ2VfdGFibGVfYWxsb2MoKSBjb3VsZCBjdXJyZW50bHkgZmluZCBzdWNoIGEgZnJh Z21lbnQgb24gdGhlIGxpc3QsIGJ1dApvbmx5IHRvIHNlZSB0aGUgUFAgYml0cyBzZXQsIHNvIGl0 IHdpbGwgbm90IHVzZSBpdC4gT25seSBhZnRlcgpfX3RsYl9yZW1vdmVfdGFibGUoKSBpbiB0aGUg UkNVIGNhbGwtYmFjayByZXNldHMgdGhlIGJpdHMsIGl0IHdvdWxkIGJlCnVzYWJsZSBhZ2Fpbi4K CkluIHlvdXIgY2FzZSwgdGhhdCBjb3VsZCBjb3JyZXNwb25kIHRvIGFkZGluZyBpdCBiYWNrIHRv IHRoZSBsaXN0LgpUaGF0IGNvdWxkIGV2ZW4gYmUgYW4gaW1wcm92ZW1lbnQsIGJlY2F1c2UgcGFn ZV90YWJsZV9hbGxvYygpIHdvdWxkCm5vdCBiZSBib3RoZXJlZCBieSBzdWNoIHVudXNhYmxlIGZy YWdtZW50cy4KClsuLi5dCj4gCj4gSXMgaXQgdG9vIGVhcmx5IHRvIHdpc2ggeW91IGEgaGFwcHkg cmV2ZXJzZSBYbWFzPwoKTmljZSBpZGVhLCB3ZSBzaG91bGQgbWFrZSBKdW5lIDI0dGggdGhlIHJl dmVyc2UgWG1hcyBSZW1lbWJyYW5jZSBEYXkgOi0pCgpfX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fX19fX19fXwpsaW51eC1hcm0ta2VybmVsIG1haWxpbmcgbGlzdApsaW51 eC1hcm0ta2VybmVsQGxpc3RzLmluZnJhZGVhZC5vcmcKaHR0cDovL2xpc3RzLmluZnJhZGVhZC5v cmcvbWFpbG1hbi9saXN0aW5mby9saW51eC1hcm0ta2VybmVsCg==