Subject: Re: [PATCH 1/2] cgroup: make sure a parent css isn't offlined before its children
From: Christian Borntraeger
To: Tejun Heo, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, linux-s390, KVM list, Oleg Nesterov, "Paul E. McKenney", Li Zefan, Johannes Weiner, cgroups@vger.kernel.org, kernel-team@fb.com
Date: Fri, 22 Jan 2016 09:18:35 +0100
Message-ID: <56A1E5DB.1010201@de.ibm.com>
In-Reply-To: <20160121212812.GJ5157@mtj.duckdns.org>
References: <56978452.6010606@de.ibm.com> <20160114195630.GA3520@mtj.duckdns.org> <5698A023.9070703@de.ibm.com> <20160121203111.GF5157@mtj.duckdns.org> <20160121212416.GL6357@twins.programming.kicks-ass.net> <20160121212812.GJ5157@mtj.duckdns.org>

On 01/21/2016 10:28 PM, Tejun Heo wrote:
> On Thu, Jan 21, 2016 at 10:24:16PM +0100, Peter Zijlstra wrote:
>> On Thu, Jan 21, 2016 at 03:31:11PM -0500, Tejun Heo wrote:
>>> There are three subsystem callbacks in css shutdown path -
>>> css_offline(), css_released() and css_free(). Except for
>>> css_released(), cgroup core didn't use to guarantee the order of
>>> invocation. css_offline() or css_free() could be called on a parent
>>> css before its children. This behavior is unexpected and led to
>>> use-after-free in cpu controller.
>>>
>>> This patch updates offline path so that a parent css is never offlined
>>> before its children. Each css keeps online_cnt which reaches zero iff
>>> itself and all its children are offline and offline_css() is invoked
>>> only after online_cnt reaches zero.
>>>
>>> This fixes the reported cpu controller malfunction. The next patch
>>> will update css_free() handling.
>>
>> No, I need to fix the cpu controller too, because the offending code
>> sits off of css_free() (the next patch), but also does a call_rcu() in
>> between, which also doesn't guarantee order.
>
> Ah, I see. Christian, can you please apply all three patches and see
> whether the problem gets fixed? Once verified, I'll update the patch
> description and repost.

With these 3 patches I always run into the dio/scsi problem, but never
into the css issue. So I cannot test for a full day or so, but it looks
like the problem is gone. At least it worked multiple times for
30 minutes or so until my system was killed by the io issue.

Tested-by: Christian Borntraeger
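
For readers following the thread, the online_cnt scheme from the quoted patch description can be modeled in user space roughly as below. This is only a sketch under assumptions: struct css_model, css_model_online(), css_model_kill() and the offline_seq bookkeeping are hypothetical names invented for illustration, not the kernel's data structures or the code in the patch, and the model ignores atomics, RCU and workqueues entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; NOT the kernel's cgroup_subsys_state. */
struct css_model {
    struct css_model *parent;
    int online_cnt;   /* 1 for the css itself while online, +1 per online child */
    bool offlined;    /* true once the (modeled) offline callback has run */
    int offline_seq;  /* order in which the offline callback ran */
};

static int offline_seq_counter;

/* Bring a css online: it counts itself and pins its parent. */
static void css_model_online(struct css_model *css, struct css_model *parent)
{
    css->parent = parent;
    css->online_cnt = 1;
    css->offlined = false;
    css->offline_seq = 0;
    if (parent)
        parent->online_cnt++;
}

/*
 * Request that a css go offline. The offline callback runs only when
 * online_cnt reaches zero, i.e. when the css itself was killed and all
 * of its children are already offline. Offlining a css then drops the
 * count it held on its parent, which may offline the parent in turn.
 */
static void css_model_kill(struct css_model *css)
{
    while (css) {
        assert(css->online_cnt > 0);
        if (--css->online_cnt != 0)
            break;              /* still has online children (or itself) */
        css->offlined = true;   /* stand-in for the offline callback */
        css->offline_seq = ++offline_seq_counter;
        css = css->parent;      /* release the reference held on the parent */
    }
}
```

In this model, killing a parent while a child is still online only decrements the parent's counter. The parent's offline step runs later, when the child is killed and its offlining releases the last count on the parent, so the child is always offlined first.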