From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:53452) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RnTOg-0006dg-Cu for qemu-devel@nongnu.org; Wed, 18 Jan 2012 06:09:48 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RnTOR-0007U4-0x for qemu-devel@nongnu.org; Wed, 18 Jan 2012 06:09:42 -0500 Received: from e28smtp05.in.ibm.com ([122.248.162.5]:37435) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RnTOQ-0007T6-4y for qemu-devel@nongnu.org; Wed, 18 Jan 2012 06:09:26 -0500 Received: from /spool/local by e28smtp05.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 18 Jan 2012 16:39:21 +0530 Received: from d28av03.in.ibm.com (d28av03.in.ibm.com [9.184.220.65]) by d28relay04.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q0IB9HVd4292760 for ; Wed, 18 Jan 2012 16:39:17 +0530 Received: from d28av03.in.ibm.com (loopback [127.0.0.1]) by d28av03.in.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q0IB9GD7004714 for ; Wed, 18 Jan 2012 22:09:17 +1100 Message-ID: <4F16A85C.8070301@linux.vnet.ibm.com> Date: Wed, 18 Jan 2012 16:39:16 +0530 From: Harsh Bora MIME-Version: 1.0 References: <1326193179-19677-1-git-send-email-harsh@linux.vnet.ibm.com> <1326193179-19677-3-git-send-email-harsh@linux.vnet.ibm.com> <4F168D80.8000707@linux.vnet.ibm.com> <4F16A1D3.7070205@linux.vnet.ibm.com> <4F16A46F.8070708@linux.vnet.ibm.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [RFC PATCH v3 2/3] simpletrace-v2: Handle variable number/size of elements per trace record. List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: aneesh.kumar@linux.vnet.ibm.com, vilanova@ac.upc.edu, stefanha@linux.vnet.ibm.com, qemu-devel@nongnu.org On 01/18/2012 04:29 PM, Stefan Hajnoczi wrote: > On Wed, Jan 18, 2012 at 10:52 AM, Harsh Bora wrote: >> On 01/18/2012 04:11 PM, Harsh Bora wrote: >>> >>> On 01/18/2012 04:01 PM, Stefan Hajnoczi wrote: >>>> >>>> On Wed, Jan 18, 2012 at 9:14 AM, Harsh Bora >>>> wrote: >>>>> >>>>> On 01/10/2012 10:11 PM, Stefan Hajnoczi wrote: >>>>>>> >>>>>>> + unused = fwrite(&record, ST_V2_REC_HDR_LEN, 1, trace_fp); >>>>>>> writeout_idx += num_available; >>>>>>> } >>>>>>> >>>>>>> idx = writeout_idx % TRACE_BUF_LEN; >>>>>>> - while (get_trace_record(idx,&record)) { >>>>>>> - trace_buf[idx].event = 0; /* clear valid bit */ >>>>>>> - unused = fwrite(&record, sizeof(record), 1, trace_fp); >>>>>>> - idx = ++writeout_idx % TRACE_BUF_LEN; >>>>>>> + while (get_trace_record(idx,&recordptr)) { >>>>>>> + unused = fwrite(recordptr, recordptr->length, 1, trace_fp); >>>>>>> + writeout_idx += recordptr->length; >>>>>>> + g_free(recordptr); >>>>>>> + recordptr = (TraceRecord *)&trace_buf[idx]; >>>>>>> >>>>>>> + recordptr->event = 0; >>>>>>> + idx = writeout_idx % TRACE_BUF_LEN; >>>>>>> } >>>>>> >>>>>> >>>>>> >>>>>> I'm wondering if it's worth using a different approach here. Writing >>>>>> out individual records has bothered me. >>>>>> >>>>>> If we have a second buffer, as big as trace_buf[], then a function can >>>>>> copy out all records and make them available in trace_buf again: >>>>>> >>>>>> /** >>>>>> * Copy completed trace records out of the ring buffer >>>>>> * >>>>>> * @idx offset into trace_buf[] >>>>>> * @buf destination buffer >>>>>> * @len size of destination buffer >>>>>> * @ret the number of bytes consumed >>>>>> */ >>>>>> size_t consume_trace_records(unsigned int idx, void *buf, size_t len); >>>>>> >>>>>> That means consume gobbles up as many records as it can: >>>>>> * Until it reaches an invalid record which has not been written yet >>>>>> * Until it reaches the end of trace_buf[], the caller can call again >>>>>> with idx wrapped to 0 >>>>>> >>>>>> After copying into buf[] it clears the valid bit in trace_buf[]. >>>>>> >>>>>> Then the loop which calls consume_trace_records() does a single >>>>>> fwrite(3) and increments idx/writeout_idx. >>>>>> >>>>>> The advantage to this is that we write many records in one go and I >>>>>> think it splits up the writeout steps a little nicer than what we've >>>>>> previously done. >>>>>> >>>>> >>>>> I think this optimization can be introduced as a separate patch later. >>>>> Let me know if you think otherwise. >>>> >>>> >>>> Yes, that could be done later. However there is something incorrect >>>> here. Threads will continue to write trace records into the >>>> ringbuffer while the write-out thread is doing I/O. Think about what >>>> happens when threads overtake the write-out index modulo ringbuffer >>>> size. Since records are variable-length the write-out thread's next >>>> index could point into the middle of an overwritten record. And that >>>> means the ->length field is junk - we may crash if we use it. >>> >>> >>> In case of overwritten records, the valid bit of event id will also be >>> overwritten, and therefore we will not consider that record. Moreover, >>> the writeout thread will further get to know that some events were >>> dropped and will start with the latest trace_idx, right ? >>> >>> However, to handle the extreme rare case of having an overwritten value >>> such that its valid bit appears to be set, we can put a check for< >>> NR_TRACE_EVENTS. Even better to have a magic byte for events also ? >>> >>> Harsh >> >> >> Also, I just realized that we need to put buffer boundary checks while >> copying a trace record: >> >> >>>> -static bool get_trace_record(unsigned int idx, TraceRecord *record) >>>> +static bool get_trace_record(unsigned int idx, TraceRecord **recordptr) >>>> { >>>> - if (!(trace_buf[idx].event& TRACE_RECORD_VALID)) { >>>> + TraceRecord *record = (TraceRecord *)&trace_buf[idx]; >>>> + if (!(record->event& TRACE_RECORD_VALID)) { >>>> return false; >>>> } >>>> >>>> __sync_synchronize(); /* read memory barrier before accessing record >>>> */ >>>> >>>> - *record = trace_buf[idx]; >>>> - record->event&= ~TRACE_RECORD_VALID; >>>> + *recordptr = g_malloc(record->length); >>>> + /* make a copy of record to avoid being overwritten */ >>>> + memcpy(*recordptr, record, record->length); >> >> So, I need to use a wrapper which should copy byte by byte taking care of >> boundary overlaps, and that will also save us from crashing (as we will >> never be crossing buffer boundaries), right ? > > It won't be enough if length has a junk value like 2^32 - 1. We don't > want to allocate/copy 4 GB :). So, whats the max limit that we may want to propose for a record length ? Obviously less than buffer length, or TRACE_BUF_LEN / 2 ? Hope, a check for max buffer length with byte-wise copy will help ? Harsh > > Stefan >