From: Jan Kara
Subject: Re: [PATCH] bfq: Fix use-after-free with cgroups
Date: Mon, 13 Dec 2021 15:52:31 +0100
Message-ID: <20211213145231.GD14044@quack2.suse.cz>
References: <20211201133439.3309-1-jack@suse.cz> <20211207190843.GA40898@blackbody.suse.cz>
In-Reply-To: <20211207190843.GA40898@blackbody.suse.cz>
To: Michal Koutný
Cc: Jan Kara, Paolo Valente, Jens Axboe, linux-block@vger.kernel.org, fvogt@suse.de, Tejun Heo, cgroups@vger.kernel.org, stable@vger.kernel.org, Fabian Vogt

On Tue 07-12-21 20:08:43, Michal Koutný wrote:
> On Wed, Dec 01, 2021 at 02:34:39PM +0100, Jan Kara wrote:
> > After some analysis we've found out that the culprit of the problem is
> > that some task is reparented from cgroup G to the root cgroup and G is
> > offlined.
>
> Just sharing my interpretation for context -- (I saw this was a system
> using the unified cgroup hierarchy, io_cgrp_subsys_on_dfl_key was
> enabled) and what was observed could also have been disabling the io
> controller on the given level -- that would also manifest similarly --
> the task is migrated to the parent and the former blkcg is offlined.

Yes, that's another possibility.

> > +static void bfq_reparent_children(struct bfq_data *bfqd, struct bfq_group *bfqg)
> > [...]
> > -	bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
> > [...]
> > +	hlist_for_each_entry_safe(bfqq, next, &bfqg->children, children_node)
> > +		bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
>
> Here I assume root_group is (representing) the global blkcg root and
> this reparenting thus skips all ancestors between the removed leaf and
> the root. IIUC the associated io_context would then be treated as if it
> was running in the root blkcg.
> (Admittedly, this isn't a change from this patch but it may cause some
> surprises if the given process runs after the operation.)

Yes, this is what happens in bfq_reparent_children() and it basically
preserves what BFQ was already doing for a subset of bfq queues.

> Reparenting to the immediate ancestors should be safe as cgroup core
> should ensure children are offlined before parents. Would it make sense
> to you?

I suppose yes, it makes more sense to reparent just to the immediate
parent instead of the root of the blkcg hierarchy. Initially, when
developing the patch, I was not sure whether the parent has to still be
alive, but as you write it should be safe.
I'll modify the patch to:

static void bfq_reparent_children(struct bfq_data *bfqd, struct bfq_group *bfqg)
{
	struct bfq_queue *bfqq;
	struct hlist_node *next;
	struct bfq_group *parent;

	parent = bfqg_parent(bfqg);
	if (!parent)
		parent = bfqd->root_group;
	hlist_for_each_entry_safe(bfqq, next, &bfqg->children, children_node)
		bfq_bfqq_move(bfqd, bfqq, parent);
}

> > @@ -897,38 +844,17 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
> > [...]
> > -	 * It may happen that some queues are still active
> > -	 * (busy) upon group destruction (if the corresponding
> > -	 * processes have been forced to terminate). We move
> > -	 * all the leaf entities corresponding to these queues
> > -	 * to the root_group.
>
> This comment is removed but it seems to me it assumed that the
> reparented entities are only some transitional remainings of terminated
> tasks but they may be the processes migrated upwards with a long (IO
> active) life ahead.

Yes, this seems to have been a misconception of the old code...

								Honza
-- 
Jan Kara
SUSE Labs, CR