From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754665AbcAMEex (ORCPT <rfc822;w@1wt.eu>);
	Tue, 12 Jan 2016 23:34:53 -0500
Received: from szxga03-in.huawei.com ([119.145.14.66]:36491 "EHLO
	szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753474AbcAMEev (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 12 Jan 2016 23:34:51 -0500
Message-ID: <5695D3CB.3030604@huawei.com>
Date: Wed, 13 Jan 2016 12:34:19 +0800
From: "Wangnan (F)" <wangnan0@huawei.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
MIME-Version: 1.0
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
        Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: <acme@kernel.org>, <linux-kernel@vger.kernel.org>, <pi3orama@163.com>,
        <lizefan@huawei.com>, <netdev@vger.kernel.org>, <davem@davemloft.net>,
        Adrian Hunter <adrian.hunter@intel.com>,
        Arnaldo Carvalho de Melo <acme@redhat.com>,
        David Ahern <dsahern@gmail.com>, Ingo Molnar <mingo@kernel.org>,
        Yunlong Song <yunlong.song@huawei.com>
Subject: Re: [PATCH 27/53] perf/core: Put size of a sample at the end of it
 by PERF_SAMPLE_TAILSIZE
References: <1452520124-2073-1-git-send-email-wangnan0@huawei.com> <1452520124-2073-28-git-send-email-wangnan0@huawei.com> <20160111180913.GA25950@ast-mbp.thefacebook.com> <56949028.2070208@huawei.com> <20160112061145.GA31444@ast-mbp.thefacebook.com> <5694F347.5010700@huawei.com> <20160112195641.GA34601@ast-mbp.thefacebook.com>
In-Reply-To: <20160112195641.GA34601@ast-mbp.thefacebook.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.111.66.109]
X-CFilter-Loop: Reflected
X-Mirapoint-Virus-RAPID-Raw: score=unknown(0),
	refid=str=0001.0A020202.5695D3E7.0069,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0,
	ip=0.0.0.0,
	so=2013-05-26 15:14:31,
	dmn=2013-03-21 17:37:32
X-Mirapoint-Loop-Id: a13b990821fc0912cfcb5ed5e0d67e2e
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


On 2016/1/13 3:56, Alexei Starovoitov wrote:
> On Tue, Jan 12, 2016 at 08:36:23PM +0800, Wangnan (F) wrote:
>>> hmm, in this kernel patch I see that you're adding 8 bytes for
>>> every record via this extra TAILSISZE flag and in perf you're
>>> walking the ring buffer backwards by reading this 8 byte
>>> sizes, comparing header sizes and so on until reaching beginning,
>>> where you start dumping it as normal.
>>> So for this 'signal to perf' approach to work the ring buffer
>>> will contain tailsizes everywhere just so that user space can
>>> find the beginning. That's not very pretty. imo if kernel
>>> can do header read to adjust data_tail it would make user
>>> space side clean. May be there are other solutions.
>>> Adding tailsize seems like brute force hack.
>>> There must be some nicer way.
>> Hi Peter,
>>
>>   What's your opinion? Should we reconsider moving size field from header the
>> end?
>> Or moving whole header to the end of a record?
> I think moving the whole header under new TAILHEADER flag is
> actually very good idea. The ring buffer will be fully utilized
> and no extra bytes necessary. User space would need to parse it
> backwards, but for this use case it fits well.

I have another crazy suggestion: can we make kernel writing to
the ring buffer from the end to the beginning? For example:

This is the initial state of the ring buffer, head pointer
pointes to the end of it:

       -------------> Address increase

                                     head
                                       |
                                       V
  +--+---+-------+----------+------+---+
  |                                    |
  +--+---+-------+----------+------+---+


Write the first event at the end of the ring buffer, and *decrease*
the head pointer:

                                 head
                                   |
                                   V
  +--+---+-------+----------+------+---+
  |                                | A |
  +--+---+-------+----------+------+---+


Another record:
                           head
                            |
                            V
  +--+---+-------+----------+------+---+
  |                         |   B  | A |
  +--+---+-------+----------+------+---+


Ring buffer rewind, A is fully overwritten and B is broken:

                                head
                                  |
                                  V
  +--+---+-------+----------+-----+----+
  |F | E |   D   | C        | ... | F  |
  +--+---+-------+----------+-----+----+

At this time user can parse the ring buffer normally from
F to C. From timestamp in it he know which one is the
oldest.

By this perf don't need too much extra work to do. There's no
performance penalty at all, and the 8 bytes are saved.

Thought?

Thank you.