From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4EDE7C4338F for ; Wed, 28 Jul 2021 17:05:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 22AFF61037 for ; Wed, 28 Jul 2021 17:05:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229977AbhG1RFX (ORCPT ); Wed, 28 Jul 2021 13:05:23 -0400 Received: from out01.mta.xmission.com ([166.70.13.231]:56016 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229515AbhG1RFW (ORCPT ); Wed, 28 Jul 2021 13:05:22 -0400 Received: from in01.mta.xmission.com ([166.70.13.51]:41318) by out01.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1m8mzc-009twJ-Vr; Wed, 28 Jul 2021 11:05:17 -0600 Received: from ip68-227-160-95.om.om.cox.net ([68.227.160.95]:42296 helo=email.xmission.com) by in01.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1m8mzb-00AVbT-SU; Wed, 28 Jul 2021 11:05:16 -0600 From: ebiederm@xmission.com (Eric W. Biederman) To: Alexey Gladkov Cc: Hillf Danton , LKML , syzbot+01985d7909f9468f013c@syzkaller.appspotmail.com, syzbot+59dd63761094a80ad06d@syzkaller.appspotmail.com, syzbot+6cd79f45bb8fa1c9eeae@syzkaller.appspotmail.com, syzbot+b6e65bd125a05f803d6b@syzkaller.appspotmail.com In-Reply-To: <20210728122448.lh2e3nr4txhsmcwt@example.org> (Alexey Gladkov's message of "Wed, 28 Jul 2021 14:24:48 +0200") References: <000000000000efe97f05c74bb995@google.com> <20210728025837.1641-1-hdanton@sina.com> <20210728122448.lh2e3nr4txhsmcwt@example.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) Date: Wed, 28 Jul 2021 12:05:08 -0500 Message-ID: <87im0ucqt7.fsf@disp2133> MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1m8mzb-00AVbT-SU;;;mid=<87im0ucqt7.fsf@disp2133>;;;hst=in01.mta.xmission.com;;;ip=68.227.160.95;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX196ZsBb5LCytZ7s0FvmpIS4CCZiP6zUHLA= X-SA-Exim-Connect-IP: 68.227.160.95 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: Re: [PATCH v1] ucounts: Fix race condition between alloc_ucounts and put_ucounts X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Alexey Gladkov writes: > On Wed, Jul 28, 2021 at 10:58:37AM +0800, Hillf Danton wrote: >> On Tue, 27 Jul 2021 17:24:18 +0200 Alexey Gladkov wrote: >> > +++ b/kernel/ucount.c >> > @@ -160,6 +160,7 @@ struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid) >> > { >> > struct hlist_head *hashent = ucounts_hashentry(ns, uid); >> > struct ucounts *ucounts, *new; >> > + long overflow; >> > >> > spin_lock_irq(&ucounts_lock); >> > ucounts = find_ucounts(ns, uid, hashent); >> > @@ -184,8 +185,12 @@ struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid) >> > return new; >> > } >> > } >> > + overflow = atomic_add_negative(1, &ucounts->count); >> > spin_unlock_irq(&ucounts_lock); >> > - ucounts = get_ucounts(ucounts); >> > + if (overflow) { >> > + put_ucounts(ucounts); >> >> Given if (atomic_add_unless(atomic, -1, 1)) >> return 0; >> >> put can not help roll back overflow. > > In case of overflow, we don't try to rollback overflow. We return an > error. Unfortunately I don't see the email with the original comment, but let me see if I can clarify a little. The code in get_ucounts explicitly uses atomic_add_negative as a performance optimization. Which means just by testing the negative status of the count it is easy to tell if the count is larger than is supported. Where this matters is that atomic_add_negative can be cheaper than cmpxchg. Which means it is faster to reserve all of the negative numbers to catch the case where the counter grows too large, then to precisely bound the count at a specific cut off. This particular code path can not use atomic_add_unless(.., -1,...) get_ucounts may have already hit the limit so it may be a negative value other than -1. >> BTW can you specify a bit on the real workloads with the risk of count overflow? One place where I think it is possible to reach a count of 2^31 is to set the rlimit for pending signals to unlimited and post a bunch of realtime signals to a process which simply does not read them. As pointed out in Alex's link below this code notices when the maximum count is reached and fails gracefully unlike refcount_t which would leak memory. The point is to handle unrealistic workloads gracefully from a reference counting perspective. If real workloads start reaching the maximum count something probably needs to change. (larger counts or changing what gets counted). > For example, one user has too many processes in one namespace. > > It is necessary to check and handle the possibility of counter overflow > in this case. Linus described it here: > > https://lore.kernel.org/lkml/CAHk-%3dwjYOCgM%2bmKzwTZwkDDg12DdYjFFkmoFKYLim7NFmR9HBg@mail.gmail.com/ > >> > + return NULL; >> > + } >> > return ucounts; >> > } >> > >> > @@ -193,8 +198,7 @@ void put_ucounts(struct ucounts *ucounts) >> > { >> > unsigned long flags; >> > >> > - if (atomic_dec_and_test(&ucounts->count)) { >> > - spin_lock_irqsave(&ucounts_lock, flags); >> > + if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) { >> > hlist_del_init(&ucounts->node); >> > spin_unlock_irqrestore(&ucounts_lock, flags); >> > kfree(ucounts); >> > -- >> > 2.29.3 Eric