From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp-relay-internal-1.canonical.com (smtp-relay-internal-1.canonical.com [185.125.188.123]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3EF495A10B for ; Wed, 7 Feb 2024 16:31:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.125.188.123 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707323470; cv=none; b=ZvTALQ6ch2N8zWxxmchyuHTOx8zen4lW6ZrPBZ17MPYYD2qHVI3azqQS73GYd59+YhDDD0g9glVyJHuF6W1bRWFeEGeRGe65DvkRfBDRrADPmDgw8fSGCbNmiPpLPdV/ERyTZUi4QCbARH9i5yb0jDN3i6BDZjjXveaCnibtYr8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707323470; c=relaxed/simple; bh=4Ow+hZEZVXZ1hL8+4TGPWGjfkGfocliWjZCVVMVaEMw=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=Ojz6YrAh/QqZzhN1AqS4Ulsy+S6yRx3Onp7FBuRj37p20MKuVTPxEN+cYB/kbr+JNW7LZ2sh+87NyA8t2jwT7zNPM3IRYbGoV0iea9z+w7TGAzN7PNoiadshlxiuaEIOKvNso/N3gF8cCT+YBa0k0dotpWjj8nEQ6aXIKA8XVIU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=canonical.com; spf=pass smtp.mailfrom=canonical.com; dkim=pass (2048-bit key) header.d=canonical.com header.i=@canonical.com header.b=sEPkYZdm; arc=none smtp.client-ip=185.125.188.123 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=canonical.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=canonical.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=canonical.com header.i=@canonical.com header.b="sEPkYZdm" Received: from mail-ej1-f71.google.com (mail-ej1-f71.google.com [209.85.218.71]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-internal-1.canonical.com (Postfix) with ESMTPS id 2F6C040606 for ; Wed, 7 Feb 2024 16:31:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=canonical.com; s=20210705; t=1707323465; bh=MZfq+RqPUpdCf28LWQdKtzFBG6Lk/NfNn4byOmQIDZ0=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:In-Reply-To; b=sEPkYZdmtxad1lLv7mLrH1hg/t8nxsOJo28Ok9A8jKDvDNzdYWZFdFTn0aUtqfSek QwA+ZVgnTd7ckq85nacwepn+71R0T5p0xSf+lPuJEFtpEZVRmXLaqxsivdSNjNj0W9 qB/+scvUanuaxdjboFgQlCCIGVzejmwlkcu211UxyklQIkoxKtOE8zYVVXShipkbIv fsJTrumPIg21XNJlmlkRHWmeiB+LbqPqiYK5dKg0un+E6rLhPQfL0W7dN0Vq9im+B0 YIQu4FFm6Yx6pPkZDAOsK381JIQn/GzuubHCuGrRs+fum2bgK0KnXQ0L3n2K+MS9B1 VF9OxL/KRhzuA== Received: by mail-ej1-f71.google.com with SMTP id a640c23a62f3a-a3845827fdaso36371366b.1 for ; Wed, 07 Feb 2024 08:31:05 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707323465; x=1707928265; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=MZfq+RqPUpdCf28LWQdKtzFBG6Lk/NfNn4byOmQIDZ0=; b=YSc7vxOuUi66r3NPCBTuSwSlu7lq4SHBu74DkCXeTfj41uc8W0GjmxUe53e4MWBCAW kvlFq16cK0Xp8hIWJUQKVDEy8ogOgBSJwu4ZMjvrpJwQ0oehWxzBBFPxWd8tZi1GGj9k SWGD0t97D2qXllLxTEhNxU3GxIKv1Qia8v13BIWFUZSJxzkKfDTPBMOtL8lMEMwob7CQ S+iB5DEkVvCCsUlZ/skA45qvn9E2C7pl0TtsvU4GTTk5As2OlLc8D7s1vGJphE6uoRW0 0bW0tSOoECSA1mZbu4CItdtyc8oCE/NPu/AScki/MvX8FK0x2mVGpmQhstmcGByG66vn KQsA== X-Forwarded-Encrypted: i=1; AJvYcCWi+5w602iRVwAYSkMjVt6nmAQCRNhpgTHUBzPXKwMk7k+MY5jPiqDOCfB/OkoJzIozqNj9RXUaT2UJyTUdH4+Djl6g X-Gm-Message-State: AOJu0YxAHmHRch4Olb5HLbn7ofGx19jU6KjLqDHVdgJ5BYN4Xkku7Q4O hZkWtG5vDhe8dBajY04R3QDpahtUcju+6+SAEh8GxbVwgQXIU7anVyrJqsvFK1Ee3vBxoW6372n PBTQr51pFsmRxfChoMTKdB/rMJoZTzX4z01oVgroLGZEpBotStbT6/tI2w8u40wNIEg== X-Received: by 2002:a17:906:5297:b0:a38:a44f:d6df with SMTP id c23-20020a170906529700b00a38a44fd6dfmr298890ejm.0.1707323464705; Wed, 07 Feb 2024 08:31:04 -0800 (PST) X-Google-Smtp-Source: AGHT+IEN3pwtePqW/o+feWySNjMEkRtWv2/jYaUxgoPaEOvz/ifOu488RTCgq8rR/bhlZM7+aIhIAQ== X-Received: by 2002:a17:906:5297:b0:a38:a44f:d6df with SMTP id c23-20020a170906529700b00a38a44fd6dfmr298876ejm.0.1707323464342; Wed, 07 Feb 2024 08:31:04 -0800 (PST) X-Forwarded-Encrypted: i=1; AJvYcCWFADFk6bxsMytMu1B33RYQ70FfK3/yV8c4HpIDchyK83DycnKRI1WcuZ6DEuIurLZvJzKdKf3AhFBPnl7i1WsgBKnKNOMU0gCxLpNiQu7CdAWq5koCaJPbs57yYiqHCIv2CJROkEi/lK3todsH8QHwt+78PtE9dcRZwso+ Received: from localhost (host-79-53-78-93.retail.telecomitalia.it. [79.53.78.93]) by smtp.gmail.com with ESMTPSA id oz14-20020a170906cd0e00b00a38620c3b3dsm913512ejb.198.2024.02.07.08.31.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Feb 2024 08:31:03 -0800 (PST) Date: Wed, 7 Feb 2024 17:31:02 +0100 From: Andrea Righi To: Joel Fernandes Cc: "Paul E. McKenney" , Uladzislau Rezki , Joel Fernandes , rcu@vger.kernel.org, "Cc: Frederic Weisbecker" Subject: Re: Observation on NOHZ_FULL Message-ID: References: <6a2b6857-57e7-4321-adae-132ba69a4fff@paulmck-laptop> <0e15e91e-da47-45dd-b7de-7f89b7b6002b@joelfernandes.org> <9f9b6885-6d9d-4f30-9635-49443e1615e9@joelfernandes.org> Precedence: bulk X-Mailing-List: rcu@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <9f9b6885-6d9d-4f30-9635-49443e1615e9@joelfernandes.org> On Wed, Feb 07, 2024 at 10:48:10AM -0500, Joel Fernandes wrote: > > > On 2/6/2024 12:51 PM, Andrea Righi wrote: > > - stress-ng --matrix seems quite unpredictable to be used a benchmarks > > in this scenario (the bogo-ops/s are very susceptible to any kind of > > interference, so even if in the long runs NO_HZ_FULL still seems to > > provide some benefits looking at the average, we also need to > > consider that there might be a significant error in the measurements, > > standard deviation was pretty high) > > > > Ack on the bogo-ops disclaimers as also mentioned in the stress-ng docs. Agreed > a better metric for perf is helpful. > > I am assuming you also have RCU_NOCB enabled for this test? yep, booting with rcu_nocb=all. > > > - fio doing short writes (in page cache) seems to perform like 2x > > better in terms of iops with nohz_full, respect to the other cases > > and it performs 2x slower with large IO writes (not sure why... need > > to investigate more) > > This is interesting, it could be worth counting how many kernel entries/exits > occur for large IO vs small IO. I'd imagine for large IO we have fewer syscalls > and hence lower entry/exit overhead. But if there more interrupts for whatever > reason with large IO, then that also implies more kernel entries/exits. As > Frederic was saying, NOHZ_FULL has higher overhead on kernel entry/exit. Good point. I was expecting to see a reduced performance due to the kernel entry/exit overhead, assuming we had more syscalls in the short writes case, that's why I was confused. But, as you correctly pointed out, we may actually have more entry/exit events due to interrupts. I'll repeat the test counting the kernel entries/exits to figure out what's happening exactly, thanks for the hint! > > > > > - with lazy RCU enabled hrtimer_interrupt() takes like 2x more to > > return, respect to the other cases (is this expected?) > > It depends on which hrtimer_interrupt() instance? There must be several in the > trace due to unrelated timers. Are you saying the worst case or it is always 2x > more? We do queue a timer for Lazy RCU to flush the RCU work but it is set to 10 > seconds and should be canceled most of the time (Its just a guard rail). It is > possible there is lock contention on ->nocb_gp_lock whizxzch is causing the timer > handler execution to be slow. We have several trace_rcu_nocb* trace points, > including for the timer. Perhaps you could enable those and we dig deeper? So, my plan was to (try to) measure the potential overhead that we may introduce by enabling the new config options in the tick handler, therefore I tried to tace the time distribution of the different hrtimer_interrupt() invocations, using this: --- #!/usr/bin/bpftrace kprobe:hrtimer_interrupt { @start[tid] = nsecs; } kretprobe:hrtimer_interrupt /@start[tid]/ { @elapsed = hist(nsecs - @start[tid]); delete(@start[tid]); } --- The time distribution is always ~2x with lazy RCU enabled, but honestly I don't know if this metric is really relevant. In terms of performance it is probably more relevant the fact that we move the RCU callbacks from softirq to kthread context and process them in batch. This for sure has a bigger impact respect to the overhead that we may add to the tick handler. But I'll try to enable the trace_rcu_nocb* tracepoints and see if we can get some interesting details of what is happening. > > Further, it is interesting to see if it is only the hrtimer_interrupt() instance > that also results in a call to do_nocb_deferred_wakeup_timer() via say function > tracing. That will confirm that it is the lazy timer that is slow for you. ok > > The actual number of callbacks should not be causing specifically the > hrtimer_interrupt() to take too long to run, AFAICS. But RCU's lazy feature does > increase the number of timer interrupts. > > Further still, it depends on how much hrtimer_interrupt() takes with lazy RCU to > call it a problem IMO. Some numbers with units will be nice. This is what I see (this is a single run, but the other runs are similar), unit is nanosec, with lazy RCU enabled hrtimer_interrupt() takes around 4K-16K ns, with lazy RCU off most of the times it takes 2K-4K ns: - lazy rcu off: [1K, 2K) 88307 |@@@@@@@@@@@@ | [2K, 4K) 380695 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [4K, 8K) 194 | | - lazy rcu on: [2K, 4K) 3094 | | [4K, 8K) 265763 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [8K, 16K) 182341 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [16K, 32K) 3422 | | Again, I'm not sure if this is really a problem or not, or if it is even a relevant metric for the overall performance, I was just curious to understand why it is different. Thanks, -Andrea