From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pat Erley
Subject: Re: BUG: non-zero nr_pmds on freeing mm: 1
Date: Mon, 09 Feb 2015 11:45:25 -0600
Message-ID: <54D8F235.7040105@erley.org>
References: <20150209164248.GA29522@node.dhcp.inet.fi> <20150209171320.GB29522@node.dhcp.inet.fi>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from erley.org ([97.107.129.9]:47043 "EHLO remote.erley.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760942AbbBIRqH (ORCPT ); Mon, 9 Feb 2015 12:46:07 -0500
In-Reply-To: <20150209171320.GB29522@node.dhcp.inet.fi>
Sender: linux-next-owner@vger.kernel.org
List-ID:
To: "Kirill A. Shutemov" , Sedat Dilek
Cc: Linux-Next , kirill.shutemov@linux.intel.com, linux-mm , Johannes Weiner , Michal Hocko , Andrew Morton

On 02/09/2015 11:13 AM, Kirill A. Shutemov wrote:
> On Mon, Feb 09, 2015 at 06:06:11PM +0100, Sedat Dilek wrote:
>> On Mon, Feb 9, 2015 at 5:42 PM, Kirill A. Shutemov wrote:
>>> On Sat, Feb 07, 2015 at 08:33:02AM +0100, Sedat Dilek wrote:
>>>> On Sat, Feb 7, 2015 at 6:12 AM, Pat Erley wrote:
>>>>> I'm seeing the message in $subject on my Xen DOM0 on next-20150204 on
>>>>> x86_64. I haven't had time to bisect it, but have seen some discussion on
>>>>> similar topics here recently. I can trigger this pretty reliably by
>>>>> watching Netflix. At some point (minutes to hours) into it, the netflix
>>>>> video goes black (audio keeps going, so it still thinks it's working) and
>>>>> the error appears in dmesg. Refreshing the page gets the video going again,
>>>>> and it will continue playing for some indeterminate amount of time.
>>>>>
>>>>> Kirill, I've CC'd you as looking in the logs, you've patched a false
>>>>> positive trigger of this very recently(patch in kernel I'm running). Am I
>>>>> actually hitting a problem, or is this another false positive case? Any
>>>>> additional details that might help?
>>>>>
>>>>> Dmesg from system attached.
>>>>
>>>> [ CC some mm folks ]
>>>>
>>>> I have seen this, too.
>>>>
>>>> root# grep "BUG: non-zero nr_pmds on freeing mm:" /var/log/kern.log | wc -l
>>>> 21
>>>>
>>>> Checking my logs: On next-20150203 and next-20150204.
>>>>
>>>> I am here not in a VM environment and cannot say what causes these messages.
>>>
>>> Sorry, my fault.
>>>
>>> The patch below should fix that.
>>>
>>> From 11bce596e653302e41f819435912f01ca8cbc27e Mon Sep 17 00:00:00 2001
>>> From: "Kirill A. Shutemov"
>>> Date: Mon, 9 Feb 2015 18:34:56 +0200
>>> Subject: [PATCH] mm: fix race on pmd accounting
>>>
>>> Do not account the pmd table to the process if other thread allocated it
>>> under us.
>>>
>>> Signed-off-by: Kirill A. Shutemov
>>> Reported-by: Sedat Dilek
>>
>> Still building with the fix...
>>
>> Please feel free to add Pat as a reporter.
>>
>> Reported-by: Pat Erley
>>
>> Is that fixing...?
>>
>> commit daa1b0f29cdccae269123e7f8ae0348dbafdc3a7
>> "mm: account pmd page tables to the process"
>>
>> If yes, please add a Fixes-tag [2]...
>>
>> Fixes: daa1b0f29cdc ("mm: account pmd page tables to the process")
>>
>> I will re-test with LTP/mmap and report.
>
> The commit is not in Linus tree, so the sha1-id is going to change.
>

I won't be able to test for at least 6 hours (more likely closer to 8, as
I have to get home, boot the machine, apply the patch, compile, reboot, and
wait). So I'm not likely to be able to give a 'Tested-by' on this one
without holding up the whole flow of the patch.

Thanks for the prompt fix, Kirill!