From: Xishi Qiu
Date: Wed, 4 Mar 2015 15:01:38 +0800
To: Gu Zheng
CC: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu, Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi, Hanjun Guo
Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()?
Message-ID: <54F6ADD2.3080403@huawei.com>
In-Reply-To: <54F681A7.4050203@cn.fujitsu.com>
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> <54F681A7.4050203@cn.fujitsu.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2015/3/4 11:53, Gu Zheng wrote:
> Hi Xishi,
>
> On 03/04/2015 10:22 AM, Xishi Qiu wrote:
>
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCU?) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>>>>
>>>> process A:                      offline node XX:
>>>> for_each_populated_zone()
>>>> find online node XX
>>>> cond_resched()
>>>>                                 offline cpu and memory, then try_offline_node()
>>>>                                 node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>>> access node XX's pgdat
>>>> NULL pointer access error
>>>
>>> It's possible, but I did not meet this condition, did you?
>>>
>>
>> Yes, we test hot-add/hot-remove node with stress, and meet the following
>> call trace several times.
>
> Thanks.
>
>>
>> next_online_pgdat()
>> int nid = next_online_node(pgdat->node_id); // it's here, pgdat is NULL
>
> memset(pgdat, 0, sizeof(*pgdat));
> This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is
> NULL* is strange here.
> But anyway, the bug is real, we must fix it.

next_zone()
	pg_data_t *pgdat = zone->zone_pgdat;	// I think this pgdat is NULL, and NODE_DATA() is not NULL.
	...
	pgdat = next_online_pgdat(pgdat);
		int nid = next_online_node(pgdat->node_id);	// so here is the null pointer access

Thanks for your new patch, I'll test it.

Thanks,
Xishi Qiu
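
The next_zone() reasoning above can be reproduced in user space. What follows is only a rough sketch, not kernel code: the structures are cut-down stand-ins for struct zone and pg_data_t, and MAX_NR_ZONES, MAX_NUMNODES, node_storage, node_init(), node_clear() and stale_zone_sees_null_pgdat() are names made up for illustration. It assumes the zones are embedded in the pgdat, so a walker that still holds a zone pointer reads a zeroed zone_pgdat after the memset, even though the node_data[] entry (standing in for NODE_DATA()) is still non-NULL. The two sides are run one after the other in a single thread to make the ordering deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NR_ZONES 4   /* illustrative value only */
#define MAX_NUMNODES 2   /* illustrative value only */

struct pglist_data;

/* Cut-down stand-in for struct zone. */
struct zone {
    struct pglist_data *zone_pgdat;   /* back-pointer to the owning node */
};

/* Cut-down stand-in for pg_data_t: the zones live inside the pgdat. */
typedef struct pglist_data {
    struct zone node_zones[MAX_NR_ZONES];
    int node_id;
} pg_data_t;

static pg_data_t node_storage[MAX_NUMNODES];
static pg_data_t *node_data[MAX_NUMNODES];   /* plays the role of NODE_DATA() */

/* Hypothetical helper: bring a node "online" and wire up its zones. */
static void node_init(int nid)
{
    int i;

    node_data[nid] = &node_storage[nid];
    node_data[nid]->node_id = nid;
    for (i = 0; i < MAX_NR_ZONES; i++)
        node_data[nid]->node_zones[i].zone_pgdat = node_data[nid];
}

/* Hypothetical helper: the clearing step discussed for try_offline_node().
 * The memory is zeroed, not freed, and node_data[nid] is left in place. */
static void node_clear(int nid)
{
    memset(node_data[nid], 0, sizeof(*node_data[nid]));
}

/* Returns 1 if a walker holding a stale zone pointer sees a NULL
 * zone_pgdat while node_data[nid] itself is still non-NULL. */
static int stale_zone_sees_null_pgdat(void)
{
    struct zone *zone;

    node_init(1);
    zone = &node_data[1]->node_zones[0];      /* process A caches a zone */
    assert(zone->zone_pgdat == node_data[1]); /* valid before the offline */

    node_clear(1);                            /* offline side runs here */

    /* next_zone() would now load zone->zone_pgdat and pass it on;
     * dereferencing pgdat->node_id from that value is the NULL access. */
    return node_data[1] != NULL && zone->zone_pgdat == NULL;
}
```

Calling stale_zone_sees_null_pgdat() returns 1 in this model, which matches the observation that the pgdat seen by next_online_pgdat() is NULL although the pgdat memory was never freed.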