From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4F16A1D3.7070205@linux.vnet.ibm.com>
Date: Wed, 18 Jan 2012 16:11:23 +0530
From: Harsh Bora
Subject: Re: [Qemu-devel] [RFC PATCH v3 2/3] simpletrace-v2: Handle variable number/size of elements per trace record.
To: Stefan Hajnoczi
Cc: qemu-devel@nongnu.org, aneesh.kumar@linux.vnet.ibm.com, stefanha@linux.vnet.ibm.com, vilanova@ac.upc.edu

On 01/18/2012 04:01 PM, Stefan Hajnoczi wrote:
> On Wed, Jan 18, 2012 at 9:14 AM, Harsh Bora wrote:
>> On 01/10/2012 10:11 PM, Stefan Hajnoczi wrote:
>>>> +        unused = fwrite(&record, ST_V2_REC_HDR_LEN, 1, trace_fp);
>>>>          writeout_idx += num_available;
>>>>      }
>>>>
>>>>      idx = writeout_idx % TRACE_BUF_LEN;
>>>> -    while (get_trace_record(idx,&record)) {
>>>> -        trace_buf[idx].event = 0; /* clear valid bit */
>>>> -        unused = fwrite(&record, sizeof(record), 1, trace_fp);
>>>> -        idx = ++writeout_idx % TRACE_BUF_LEN;
>>>> +    while (get_trace_record(idx,&recordptr)) {
>>>> +        unused = fwrite(recordptr, recordptr->length, 1, trace_fp);
>>>> +        writeout_idx += recordptr->length;
>>>> +        g_free(recordptr);
>>>> +        recordptr = (TraceRecord *)&trace_buf[idx];
>>>>
>>>> +        recordptr->event = 0;
>>>> +        idx = writeout_idx % TRACE_BUF_LEN;
>>>>      }
>>>
>>> I'm wondering if it's worth using a different approach here.  Writing
>>> out individual records has bothered me.
>>>
>>> If we have a second buffer, as big as trace_buf[], then a function can
>>> copy out all records and make them available in trace_buf again:
>>>
>>> /**
>>>  * Copy completed trace records out of the ring buffer
>>>  *
>>>  * @idx  offset into trace_buf[]
>>>  * @buf  destination buffer
>>>  * @len  size of destination buffer
>>>  * @ret  the number of bytes consumed
>>>  */
>>> size_t consume_trace_records(unsigned int idx, void *buf, size_t len);
>>>
>>> That means consume gobbles up as many records as it can:
>>>  * Until it reaches an invalid record which has not been written yet
>>>  * Until it reaches the end of trace_buf[], the caller can call again
>>>    with idx wrapped to 0
>>>
>>> After copying into buf[] it clears the valid bit in trace_buf[].
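A minimal sketch of the consume-style copy-out described above, using simplified stand-in types (the TraceRecord layout, buffer sizes, and names here are illustrative, not QEMU's actual definitions):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TRACE_BUF_LEN 4096          /* illustrative ring buffer size */

/* Simplified stand-in for QEMU's variable-length trace record header. */
typedef struct {
    uint64_t event;                 /* 0 means invalid / not yet written */
    uint32_t length;                /* total record length, header included */
    /* variable-length event arguments follow in memory */
} TraceRecord;

static _Alignas(uint64_t) uint8_t trace_buf[TRACE_BUF_LEN];

/*
 * Copy completed trace records out of the ring buffer.
 *
 * Consumes records until it reaches an invalid (not yet written) record,
 * the end of trace_buf[] (the caller then retries with idx wrapped to 0),
 * or a full destination buffer.  Each record's valid bit is cleared after
 * it has been copied out.  Returns the number of bytes consumed.
 */
size_t consume_trace_records(unsigned int idx, void *buf, size_t len)
{
    size_t consumed = 0;

    while (idx < TRACE_BUF_LEN) {
        TraceRecord *rec = (TraceRecord *)&trace_buf[idx];

        if (rec->event == 0 ||                   /* not yet written */
            idx + rec->length > TRACE_BUF_LEN || /* wraps; caller retries at 0 */
            rec->length > len - consumed) {      /* destination buffer full */
            break;
        }
        memcpy((uint8_t *)buf + consumed, rec, rec->length);
        rec->event = 0;                          /* clear valid bit */
        consumed += rec->length;
        idx += rec->length;
    }
    return consumed;
}
```

The write-out loop would then issue one fwrite(3) per batch returned by this function instead of one per record.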
>>>
>>> Then the loop which calls consume_trace_records() does a single
>>> fwrite(3) and increments idx/writeout_idx.
>>>
>>> The advantage to this is that we write many records in one go and I
>>> think it splits up the writeout steps a little nicer than what we've
>>> previously done.
>>>
>>
>> I think this optimization can be introduced as a separate patch later.
>> Let me know if you think otherwise.
>
> Yes, that could be done later.  However there is something incorrect
> here.  Threads will continue to write trace records into the
> ringbuffer while the write-out thread is doing I/O.  Think about what
> happens when threads overtake the write-out index modulo ringbuffer
> size.  Since records are variable-length the write-out thread's next
> index could point into the middle of an overwritten record.  And that
> means the ->length field is junk - we may crash if we use it.

In the case of overwritten records, the valid bit of the event id will also
be overwritten, and therefore we will not consider that record.  Moreover,
the writeout thread will also see that some events were dropped and will
restart from the latest trace_idx, right?

However, to handle the extremely rare case where an overwritten value
happens to have its valid bit set, we can add a check for event id <
NR_TRACE_EVENTS.  Would it be even better to have a magic byte for events
as well?

Harsh

>
>>>>
>>>>      fflush(trace_fp);
>>>> @@ -231,7 +196,7 @@ void st_set_trace_file_enabled(bool enable)
>>>>      static const TraceRecord header = {
>>>>          .event = HEADER_EVENT_ID,
>>>>          .timestamp_ns = HEADER_MAGIC,
>>>> -        .x1 = HEADER_VERSION,
>>>> +        .length = HEADER_VERSION,
>>>
>>> Hmm...this is kind of ugly (see comment about using .length above) but
>>> in this case most parsers will have a special-case anyway to check the
>>> magic number.  We need to use the .length field because historically
>>> that's where the version is located.
>>
>> So, let's keep the version here only, right?
>
> Yes, it's necessary to do .length = HEADER_VERSION.
>
> Stefan
>
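The plausibility check proposed above (treat a record as valid only if its event id is nonzero and below the number of known events) could be sketched as follows; NR_TRACE_EVENTS and the function name are illustrative placeholders, not QEMU's actual identifiers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_TRACE_EVENTS 64   /* placeholder for the generated event count */

/*
 * A record is only trusted if its event id looks plausible: nonzero
 * (the valid bit) and below the number of known trace events.  This
 * guards against an overwritten byte that happens to look "valid",
 * so a junk ->length field is never used for the write-out.
 */
static bool trace_record_is_plausible(uint64_t event)
{
    return event != 0 && event < NR_TRACE_EVENTS;
}
```

A magic byte in each record header, as also suggested above, would shrink the false-positive window further at the cost of one byte per record.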