From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4F16A46F.8070708@linux.vnet.ibm.com>
Date: Wed, 18 Jan 2012 16:22:31 +0530
From: Harsh Bora
MIME-Version: 1.0
References: <1326193179-19677-1-git-send-email-harsh@linux.vnet.ibm.com> <1326193179-19677-3-git-send-email-harsh@linux.vnet.ibm.com> <4F168D80.8000707@linux.vnet.ibm.com> <4F16A1D3.7070205@linux.vnet.ibm.com>
In-Reply-To: <4F16A1D3.7070205@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH v3 2/3] simpletrace-v2: Handle variable number/size of elements per trace record.
To: Stefan Hajnoczi
Cc: vilanova@ac.upc.edu, qemu-devel@nongnu.org, stefanha@linux.vnet.ibm.com,
	aneesh.kumar@linux.vnet.ibm.com

On 01/18/2012 04:11 PM, Harsh Bora wrote:
> On 01/18/2012 04:01 PM, Stefan Hajnoczi wrote:
>> On Wed, Jan 18, 2012 at 9:14 AM, Harsh Bora wrote:
>>> On 01/10/2012 10:11 PM, Stefan Hajnoczi wrote:
>>>>> +        unused = fwrite(&record, ST_V2_REC_HDR_LEN, 1, trace_fp);
>>>>>          writeout_idx += num_available;
>>>>>      }
>>>>>
>>>>>      idx = writeout_idx % TRACE_BUF_LEN;
>>>>> -    while (get_trace_record(idx, &record)) {
>>>>> -        trace_buf[idx].event = 0; /* clear valid bit */
>>>>> -        unused = fwrite(&record, sizeof(record), 1, trace_fp);
>>>>> -        idx = ++writeout_idx % TRACE_BUF_LEN;
>>>>> +    while (get_trace_record(idx, &recordptr)) {
>>>>> +        unused = fwrite(recordptr, recordptr->length, 1, trace_fp);
>>>>> +        writeout_idx += recordptr->length;
>>>>> +        g_free(recordptr);
>>>>> +        recordptr = (TraceRecord *)&trace_buf[idx];
>>>>> +        recordptr->event = 0;
>>>>> +        idx = writeout_idx % TRACE_BUF_LEN;
>>>>>      }
>>>>
>>>> I'm wondering if it's worth using a different approach here.  Writing
>>>> out individual records has bothered me.
>>>>
>>>> If we have a second buffer, as big as trace_buf[], then a function can
>>>> copy out all records and make them available in trace_buf again:
>>>>
>>>> /**
>>>>  * Copy completed trace records out of the ring buffer
>>>>  *
>>>>  * @idx  offset into trace_buf[]
>>>>  * @buf  destination buffer
>>>>  * @len  size of destination buffer
>>>>  * @ret  the number of bytes consumed
>>>>  */
>>>> size_t consume_trace_records(unsigned int idx, void *buf, size_t len);
>>>>
>>>> That means consume gobbles up as many records as it can:
>>>>  * Until it reaches an invalid record which has not been written yet
>>>>  * Until it reaches the end of trace_buf[]; the caller can call again
>>>>    with idx wrapped to 0
>>>>
>>>> After copying into buf[] it clears the valid bit in trace_buf[].
>>>>
>>>> Then the loop which calls consume_trace_records() does a single
>>>> fwrite(3) and increments idx/writeout_idx.
>>>>
>>>> The advantage to this is that we write many records in one go, and I
>>>> think it splits up the writeout steps a little more nicely than what
>>>> we've previously done.
>>>
>>> I think this optimization can be introduced as a separate patch later.
>>> Let me know if you think otherwise.
>>
>> Yes, that could be done later.  However, there is something incorrect
>> here.  Threads will continue to write trace records into the ring
>> buffer while the write-out thread is doing I/O.  Think about what
>> happens when threads overtake the write-out index modulo the ring
>> buffer size.  Since records are variable-length, the write-out
>> thread's next index could point into the middle of an overwritten
>> record.  And that means the ->length field is junk - we may crash if
>> we use it.
>
> In the case of overwritten records, the valid bit of the event id will
> also be overwritten, and therefore we will not consider that record.
> Moreover, the writeout thread will then learn that some events were
> dropped and will restart from the latest trace_idx, right?
>
> However, to handle the extremely rare case of an overwritten value
> whose valid bit happens to be set, we can add a check for
> < NR_TRACE_EVENTS.  Would it be even better to have a magic byte for
> events too?
>
> Harsh

Also, I just realized that we need to add buffer boundary checks while
copying a trace record:

>> -static bool get_trace_record(unsigned int idx, TraceRecord *record)
>> +static bool get_trace_record(unsigned int idx, TraceRecord **recordptr)
>>  {
>> -    if (!(trace_buf[idx].event & TRACE_RECORD_VALID)) {
>> +    TraceRecord *record = (TraceRecord *)&trace_buf[idx];
>> +    if (!(record->event & TRACE_RECORD_VALID)) {
>>          return false;
>>      }
>>
>>      __sync_synchronize(); /* read memory barrier before accessing record */
>>
>> -    *record = trace_buf[idx];
>> -    record->event &= ~TRACE_RECORD_VALID;
>> +    *recordptr = g_malloc(record->length);
>> +    /* make a copy of record to avoid being overwritten */
>> +    memcpy(*recordptr, record, record->length);

So, I need to use a wrapper which copies byte by byte, taking care of
boundary overlaps, and that will also save us from crashing (as we will
never cross buffer boundaries), right?

Harsh

>> +    (*recordptr)->event &= ~TRACE_RECORD_VALID;
>>      return true;
>>  }
>
>>
>>>>>
>>>>>      fflush(trace_fp);
>>>>> @@ -231,7 +196,7 @@ void st_set_trace_file_enabled(bool enable)
>>>>>      static const TraceRecord header = {
>>>>>          .event = HEADER_EVENT_ID,
>>>>>          .timestamp_ns = HEADER_MAGIC,
>>>>> -        .x1 = HEADER_VERSION,
>>>>> +        .length = HEADER_VERSION,
>>>>
>>>> Hmm... this is kind of ugly (see the comment about using .length
>>>> above), but in this case most parsers will have a special case anyway
>>>> to check the magic number.  We need to use the .length field because
>>>> historically that's where the version is located.
>>>
>>> So, let's keep the version here only, right?
>>
>> Yes, it's necessary to do .length = HEADER_VERSION.
>>
>> Stefan