From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755022AbcEXXaD (ORCPT <rfc822;w@1wt.eu>);
	Tue, 24 May 2016 19:30:03 -0400
Received: from mx1.redhat.com ([209.132.183.28]:58037 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752005AbcEXXaB (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Tue, 24 May 2016 19:30:01 -0400
Date: Wed, 25 May 2016 01:29:58 +0200
From: Oleg Nesterov <oleg@redhat.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Andrei Vagin <avagin@gmail.com>, LKML <linux-kernel@vger.kernel.org>,
        X86 ML <x86@kernel.org>, Andy Lutomirski <luto@kernel.org>,
        Cyrill Gorcunov <gorcunov@openvz.org>
Subject: Re: x86: A process doesn't stop on hw breakpoints sometimes
Message-ID: <20160524232958.GA14477@redhat.com>
References: <CANaxB-xwu3pJmRDHDwRAug4Hz_XF9GSkOhZn-FCfsk_BiZL-xg@mail.gmail.com>
 <CALCETrWsO3h-trontU121d-2YnpJ=7aqYDngzr4PpAeFfC14Sw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CALCETrWsO3h-trontU121d-2YnpJ=7aqYDngzr4PpAeFfC14Sw@mail.gmail.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Tue, 24 May 2016 23:30:00 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/23, Andy Lutomirski wrote:
>
> I'm guessing you're either hitting a subtle bug in the mess that is
> breakpoint handling or you're hitting a bug in perf's context switch
> code.

yes, same feeling...

> Given that the breakpoint gets missed many times in a row,

yes, the child deliberately tries to hit the same bp again and again,

> this is
> presumably either a bug in breakpoint programming (i.e. the thing
> isn't actually set in dr0/dr7) or a bug in the bp state tracking.

or some bug in perf_sched_in(). In fact this is what I think now, but
I could be wrong.

> If
> it were a bug in RF flag handling, I'd expect it to skip once and trip
> the second time through.

Exactly.

It would be nice to confirm that this problem has actually gone away,
and to understand how.

So, Andrei, if you have any motivation, we can continue. The next step
needs a simple kernel patch or kernel module which lets us read dr0/dr7
and print these registers in the "fail" loop.
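
To make the printed dr0/dr7 values easier to interpret, something like
the helper below could be used. This is only a sketch of the decoding
side, not the kernel patch itself: struct dr7_slot, dr7_decode() and
bp_is_programmed() are hypothetical names, not existing kernel APIs, and
the bit layout assumed here is the architectural DR7 layout (L/G enable
bits in the low byte, R/W and LEN fields starting at bit 16).

```c
#include <stdint.h>

/* Decoded view of one breakpoint slot (0..3) of DR7. Hypothetical type. */
struct dr7_slot {
	int local_enable;	/* L<n>:   bit 2n */
	int global_enable;	/* G<n>:   bit 2n + 1 */
	unsigned int rw;	/* R/W<n>: bits 16 + 4n .. 17 + 4n
				 * 0 = exec, 1 = write, 3 = read/write */
	unsigned int len;	/* LEN<n>: bits 18 + 4n .. 19 + 4n
				 * 0 = 1 byte, 1 = 2, 3 = 4, 2 = 8 bytes */
};

/* Extract the enable bits and the R/W and LEN fields for one slot. */
static struct dr7_slot dr7_decode(uint64_t dr7, int slot)
{
	struct dr7_slot s;

	s.local_enable  = (int)((dr7 >> (2 * slot)) & 1);
	s.global_enable = (int)((dr7 >> (2 * slot + 1)) & 1);
	s.rw  = (unsigned int)((dr7 >> (16 + 4 * slot)) & 3);
	s.len = (unsigned int)((dr7 >> (18 + 4 * slot)) & 3);
	return s;
}

/*
 * Return 1 if the given slot looks armed for the expected address:
 * the address register matches and the slot is enabled (locally or
 * globally) in dr7. Otherwise return 0.
 */
static int bp_is_programmed(uint64_t dr_addr, uint64_t dr7, int slot,
			    uint64_t expected)
{
	struct dr7_slot s = dr7_decode(dr7, slot);

	return dr_addr == expected && (s.local_enable || s.global_enable);
}
```

If bp_is_programmed() says the slot is armed in the "fail" loop and the
trap is still missed, that points away from breakpoint programming and
towards the state tracking or the context switch path.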

Oleg.