From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:59367) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RnRcB-0001Xm-7x for qemu-devel@nongnu.org; Wed, 18 Jan 2012 04:15:37 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RnRc4-0005GW-7U for qemu-devel@nongnu.org; Wed, 18 Jan 2012 04:15:31 -0500 Received: from e23smtp01.au.ibm.com ([202.81.31.143]:47688) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RnRc3-0005GD-CA for qemu-devel@nongnu.org; Wed, 18 Jan 2012 04:15:24 -0500 Received: from /spool/local by e23smtp01.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 18 Jan 2012 09:09:36 +1000 Received: from d23av03.au.ibm.com (d23av03.au.ibm.com [9.190.234.97]) by d23relay03.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q0I9EgTV4632608 for ; Wed, 18 Jan 2012 20:14:43 +1100 Received: from d23av03.au.ibm.com (loopback [127.0.0.1]) by d23av03.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q0I9Eg59016361 for ; Wed, 18 Jan 2012 20:14:42 +1100 Message-ID: <4F168D80.8000707@linux.vnet.ibm.com> Date: Wed, 18 Jan 2012 14:44:40 +0530 From: Harsh Bora MIME-Version: 1.0 References: <1326193179-19677-1-git-send-email-harsh@linux.vnet.ibm.com> <1326193179-19677-3-git-send-email-harsh@linux.vnet.ibm.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [RFC PATCH v3 2/3] simpletrace-v2: Handle variable number/size of elements per trace record. List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi , stefanha@linux.vnet.ibm.com Cc: vilanova@ac.upc.edu, qemu-devel@nongnu.org, aneesh.kumar@linux.vnet.ibm.com On 01/10/2012 10:11 PM, Stefan Hajnoczi wrote: > On Tue, Jan 10, 2012 at 10:59 AM, Harsh Prateek Bora > wrote: >> Advantages over existing simpletrace backend: >> - More than 6 elements (vitually unlimited) arguments can be traced. >> - This allows to trace strings (variable size element) as well. >> >> Signed-off-by: Harsh Prateek Bora >> --- >> monitor.c | 2 +- >> trace/simple.c | 178 ++++++++++++++++++++------------------------------------ >> trace/simple.h | 31 ++++++++-- >> 3 files changed, 88 insertions(+), 123 deletions(-) >> >> diff --git a/monitor.c b/monitor.c >> index ffda0fe..b6f85d1 100644 >> --- a/monitor.c >> +++ b/monitor.c >> @@ -945,7 +945,7 @@ static void do_info_cpu_stats(Monitor *mon) >> #if defined(CONFIG_TRACE_SIMPLE) >> static void do_info_trace(Monitor *mon) >> { >> - st_print_trace((FILE *)mon,&monitor_fprintf); >> + /* FIXME: st_print_trace((FILE *)mon,&monitor_fprintf); */ > > This command is only available from the human monitor. It's not very > useful because it historically hasn't been able to pretty-print events > or show them in the right order (we use a ringbuffer but it prints > them out from index 0). > > Therefore, I don't think we're under any obligation to keep this > command around. No one has complained about it's limitations - I > think this is a sign that no one has used it. I'd be okay with a > patch that removes it. > >> } >> #endif >> >> diff --git a/trace/simple.c b/trace/simple.c >> index b639dda..65cafe2 100644 >> --- a/trace/simple.c >> +++ b/trace/simple.c >> @@ -27,7 +27,7 @@ >> #define HEADER_MAGIC 0xf2b177cb0aa429b4ULL >> >> /** Trace file version number, bump if format changes */ >> -#define HEADER_VERSION 0 >> +#define HEADER_VERSION 2 >> >> /** Records were dropped event ID */ >> #define DROPPED_EVENT_ID (~(uint64_t)0 - 1) >> @@ -35,23 +35,6 @@ >> /** Trace record is valid */ >> #define TRACE_RECORD_VALID ((uint64_t)1<< 63) >> >> -/** Trace buffer entry */ >> -typedef struct { >> - uint64_t event; >> - uint64_t timestamp_ns; >> - uint64_t x1; >> - uint64_t x2; >> - uint64_t x3; >> - uint64_t x4; >> - uint64_t x5; >> - uint64_t x6; >> -} TraceRecord; >> - >> -enum { >> - TRACE_BUF_LEN = 4096, >> - TRACE_BUF_FLUSH_THRESHOLD = TRACE_BUF_LEN / 4, >> -}; >> - >> /* >> * Trace records are written out by a dedicated thread. The thread waits for >> * records to become available, writes them out, and then waits again. >> @@ -62,11 +45,12 @@ static GCond *trace_empty_cond; >> static bool trace_available; >> static bool trace_writeout_enabled; >> >> -static TraceRecord trace_buf[TRACE_BUF_LEN]; >> +uint8_t trace_buf[TRACE_BUF_LEN]; >> static unsigned int trace_idx; >> static FILE *trace_fp; >> static char *trace_file_name = NULL; >> >> + > > Spurious newline. > >> /** >> * Read a trace record from the trace buffer >> * >> @@ -75,16 +59,19 @@ static char *trace_file_name = NULL; >> * >> * Returns false if the record is not valid. >> */ >> -static bool get_trace_record(unsigned int idx, TraceRecord *record) >> +static bool get_trace_record(unsigned int idx, TraceRecord **recordptr) >> { >> - if (!(trace_buf[idx].event& TRACE_RECORD_VALID)) { >> + TraceRecord *record = (TraceRecord *)&trace_buf[idx]; >> + if (!(record->event& TRACE_RECORD_VALID)) { >> return false; >> } >> >> __sync_synchronize(); /* read memory barrier before accessing record */ >> >> - *record = trace_buf[idx]; >> - record->event&= ~TRACE_RECORD_VALID; >> + *recordptr = g_malloc(record->length); >> + /* make a copy of record to avoid being overwritten */ >> + memcpy(*recordptr, record, record->length); >> + (*recordptr)->event&= ~TRACE_RECORD_VALID; >> return true; >> } >> >> @@ -106,6 +93,47 @@ static void flush_trace_file(bool wait) >> g_static_mutex_unlock(&trace_lock); >> } >> >> +unsigned int trace_alloc_record(TraceEventID event, uint32_t datasize) >> +{ >> + unsigned int idx, rec_off; >> + uint32_t rec_len = ST_V2_REC_HDR_LEN + datasize; >> + uint64_t timestamp_ns = get_clock(); >> + >> + idx = g_atomic_int_exchange_and_add((gint *)&trace_idx, rec_len) % TRACE_BUF_LEN; >> + rec_off = idx; >> + write_to_buffer(rec_off, (uint8_t*)&event, sizeof(event)); >> + rec_off += sizeof(event); >> + write_to_buffer(rec_off, (uint8_t*)×tamp_ns, sizeof(timestamp_ns)); >> + rec_off += sizeof(timestamp_ns); >> + write_to_buffer(rec_off, (uint8_t*)&rec_len, sizeof(rec_len)); >> + rec_off += sizeof(rec_len); >> + return idx; >> +} >> + >> +void write_to_buffer(unsigned int idx, uint8_t *dataptr, uint32_t size) >> +{ >> + uint32_t x = 0; >> + while (x< size) >> + { >> + if (idx>= TRACE_BUF_LEN) { >> + idx = idx % TRACE_BUF_LEN; >> + } >> + trace_buf[idx++] = dataptr[x++]; >> + } >> +} >> + >> +void trace_mark_record_complete(unsigned int idx) >> +{ >> + TraceRecord *record = (TraceRecord*)&trace_buf[idx]; >> + >> + __sync_synchronize(); /* write barrier before marking as valid */ >> + record->event |= TRACE_RECORD_VALID; >> + >> + if (idx> TRACE_BUF_FLUSH_THRESHOLD) { >> + flush_trace_file(false); >> + } > > This looks suspicious. We want to kick the writeout thread when the > buffer reaches the threshold but not repeatedly after that. Also > because it's a ring buffer idx could "start" above the threshold. > > Perhaps something like: > if (idx> TRACE_BUF_FLUSH_THRESHOLD&& old_idx<= TRACE_BUF_FLUSH_THRESHOLD) { > flush_trace_file(false); > } > >> +} >> + >> static void wait_for_trace_records_available(void) >> { >> g_static_mutex_lock(&trace_lock); >> @@ -120,7 +148,7 @@ static void wait_for_trace_records_available(void) >> >> static gpointer writeout_thread(gpointer opaque) >> { >> - TraceRecord record; >> + TraceRecord record, *recordptr; >> unsigned int writeout_idx = 0; >> unsigned int num_available, idx; >> size_t unused __attribute__ ((unused)); >> @@ -130,19 +158,20 @@ static gpointer writeout_thread(gpointer opaque) >> >> num_available = trace_idx - writeout_idx; >> if (num_available> TRACE_BUF_LEN) { >> - record = (TraceRecord){ >> - .event = DROPPED_EVENT_ID, >> - .x1 = num_available, >> - }; >> - unused = fwrite(&record, sizeof(record), 1, trace_fp); >> + record.event = DROPPED_EVENT_ID, >> + record.length = num_available, > > Please don't use .length for the number of bytes dropped because this > makes life harder for parsing scripts like simpletrace.py. They would > need a special case to check for DROPPED_EVENT_ID and then treat > .length as an argument. Instead the number of bytes should be an > argument. > > Also, by removing the struct literal assignment you lose the automatic > zeroing of fields. We'll write out junk data - it would be nicer to > clear it. > >> + unused = fwrite(&record, ST_V2_REC_HDR_LEN, 1, trace_fp); >> writeout_idx += num_available; >> } >> >> idx = writeout_idx % TRACE_BUF_LEN; >> - while (get_trace_record(idx,&record)) { >> - trace_buf[idx].event = 0; /* clear valid bit */ >> - unused = fwrite(&record, sizeof(record), 1, trace_fp); >> - idx = ++writeout_idx % TRACE_BUF_LEN; >> + while (get_trace_record(idx,&recordptr)) { >> + unused = fwrite(recordptr, recordptr->length, 1, trace_fp); >> + writeout_idx += recordptr->length; >> + g_free(recordptr); >> + recordptr = (TraceRecord *)&trace_buf[idx]; >> + recordptr->event = 0; >> + idx = writeout_idx % TRACE_BUF_LEN; >> } > > I'm wondering if it's worth using a different approach here. Writing > out individual records has bothered me. > > If we have a second buffer, as big as trace_buf[], then a function can > copy out all records and make them available in trace_buf again: > > /** > * Copy completed trace records out of the ring buffer > * > * @idx offset into trace_buf[] > * @buf destination buffer > * @len size of destination buffer > * @ret the number of bytes consumed > */ > size_t consume_trace_records(unsigned int idx, void *buf, size_t len); > > That means consume gobbles up as many records as it can: > * Until it reaches an invalid record which has not been written yet > * Until it reaches the end of trace_buf[], the caller can call again > with idx wrapped to 0 > > After copying into buf[] it clears the valid bit in trace_buf[]. > > Then the loop which calls consume_trace_records() does a single > fwrite(3) and increments idx/writeout_idx. > > The advantage to this is that we write many records in one go and I > think it splits up the writeout steps a little nicer than what we've > previously done. > I think this optimization can be introduced as a separate patch later. Let me know if you think otherwise. >> >> fflush(trace_fp); >> @@ -231,7 +196,7 @@ void st_set_trace_file_enabled(bool enable) >> static const TraceRecord header = { >> .event = HEADER_EVENT_ID, >> .timestamp_ns = HEADER_MAGIC, >> - .x1 = HEADER_VERSION, >> + .length = HEADER_VERSION, > > Hmm...this is kind of ugly (see comment about using .length above) but > in this case most parsers will have a special-case anyway to check the > magic number. We need to use the .length field because historically > that's where the version is located. > So, lets keep the version here only, right ? - Harsh >> +/* Interfaces for simpletrace v2 */ > > Please drop this comment. All trace/simple.[ch] code is for v2, this > makes it seems like we support both versions at the same time. > >> + >> +/** Trace buffer entry */ >> +typedef struct { >> + uint64_t event; /* TraceEventID */ >> + uint64_t timestamp_ns; >> + uint32_t length; /* in bytes */ >> + uint32_t reserved; /* unused */ >> + uint8_t arguments[]; /* arguments position affects ST_V2_REC_HDR_LEN */ >> +} TraceRecord; > > Does this need to be visible? I think it can live in trace/simple.c > >> + >> +#define ST_V2_REC_HDR_LEN 24 > > Does not need to be visible. Also why calculate manually, instead of > a macro I would use sizeof(TraceRecord) (the array without elements is > not counted by sizeof). > >> + >> +enum { >> + TRACE_BUF_LEN = 4096 * 64, >> + TRACE_BUF_FLUSH_THRESHOLD = TRACE_BUF_LEN / 4, >> +}; > > Does not need to be visible. > >> + >> +extern uint8_t trace_buf[TRACE_BUF_LEN]; > > Does not need to be visible. >