From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 10 Sep 2018 14:06:43 +0200
From: Ingo Molnar
To: Alexey Budankov
Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Andi Kleen, linux-kernel
Subject: Re: [PATCH v8 0/3]: perf: reduce data loss when profiling highly parallel CPU bound workloads
Message-ID: <20180910120643.GA4217@gmail.com>
References: <20180910091841.GA4664@gmail.com> <2c5d4b01-0eb8-f97e-6a70-44be7961d7f8@linux.intel.com>
In-Reply-To: <2c5d4b01-0eb8-f97e-6a70-44be7961d7f8@linux.intel.com>
User-Agent: Mutt/1.9.4 (2018-02-28)
X-Mailing-List: linux-kernel@vger.kernel.org

* Alexey Budankov wrote:

> Hi Ingo,
>
> On 10.09.2018 12:18, Ingo Molnar wrote:
> >
> > * Alexey Budankov wrote:
> >
> >>
> >> Currently in record mode the tool implements trace writing serially.
> >> The algorithm loops over the mapped per-CPU data buffers and stores
> >> ready data chunks into a trace file using the write() system call.
> >>
> >> Under some circumstances the kernel may lack free space in a buffer,
> >> because the buffer's other half is not yet written to disk while the
> >> tool is busy writing some other buffer's data at that moment.
> >>
> >> Thus the serial trace writing implementation may cause the kernel
> >> to lose profiling data, and that is what is observed when profiling
> >> highly parallel CPU-bound workloads on machines with a large number
> >> of cores.
> >
> > Yay! I saw this frequently on a 120-CPU box (hw is broken now).
> >
> >> The data loss metric is the ratio lost_time/elapsed_time, where
> >> lost_time is the sum of the time intervals containing PERF_RECORD_LOST
> >> records and elapsed_time is the elapsed application run time
> >> under profiling.
> >>
> >> Applying asynchronous trace streaming through the POSIX AIO API
> >> (http://man7.org/linux/man-pages/man7/aio.7.html)
> >> lowers the data loss metric, providing a 2x improvement -
> >> lowering a 98% loss to almost 0%.
> >
> > Hm, instead of AIO why don't we use explicit threads instead? I think
> > POSIX AIO will fall back to threads anyway when there's no kernel AIO
> > support (which there probably isn't for perf events).
>
> Explicit threading is surely an option, but having more threads
> in the tool that stream performance data is a considerable
> design complication.
>
> Luckily, the glibc AIO implementation is already based on pthreads,
> but it has only a single writing thread per distinct fd.

My argument is that we don't want to rely on glibc's choices here:
they might use a different threading design in the future, and the
behavior might differ between libc versions. The basic flow of
tracing/profiling data is something we should control explicitly,
via explicit threading.

BTW., the use case I was primarily concentrating on is a simpler one:
'perf record -a', not inherited workflow tracing. For system-wide
profiling the ideal tracing setup is clean per-CPU separation, i.e.
per-CPU event fds, and per-CPU threads that read them and then write
into separate per-CPU files.

Thanks,

	Ingo