From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pf1-f179.google.com (mail-pf1-f179.google.com [209.85.210.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 46AAB481A3 for ; Fri, 18 Oct 2024 02:17:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.179 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729217851; cv=none; b=p9cSc2v3LdHCXKId+eQTiA3HvI2lJzABR3yLvfrzalH/63DbYM5Pwrpoe0dlfcyc0EmrU4gvQ07qINcK9R1vACq5FPgkb9QpN/oCZAZw4K9Czp8xRdPhsBi9n8otOiTymqY0Yc7bfl2WbYV0htIlMjaHZ9OK0XPcwfZ2T9DrMh8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729217851; c=relaxed/simple; bh=cXmFuU3hl80AWpI9Sbnzob5ySx9oCG3xby6WuWbqamg=; h=Message-ID:Subject:From:To:Cc:Date:In-Reply-To:References: Content-Type:MIME-Version; b=Ttp2fyBhnPUgxOl9sLoaYeMRldoohuCZTNCKcR8DkQmMuwiB8RECvjntmu/5kMEdEpZPt7FYFpdQYzFtZI70vWlNqUq4IFRiTunA2Gk1HHcGXztvcmN1smeI7svOCQCs0ntroZ5ap8t+EAJqpt4v21LZlj31Z/7o057iZzZfSQw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=LQLV9Xxv; arc=none smtp.client-ip=209.85.210.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="LQLV9Xxv" Received: by mail-pf1-f179.google.com with SMTP id d2e1a72fcca58-71e953f4e7cso927351b3a.3 for ; Thu, 17 Oct 2024 19:17:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729217849; x=1729822649; darn=vger.kernel.org; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:from:to:cc:subject :date:message-id:reply-to; bh=sY1tk31eY5PmrWYW+0qrwmPB9QN1df6dKOqnGK0mPto=; b=LQLV9XxvfcEvPCPshfBtpp8jq8J7yAW6cKtS0JG8FVw1Rz8CfBeosrSMl6I2aRflIm AWrUpKiS1NncYR9sK3XFOxIYuzCQgVGHGxPnsmKPg/L/02VXBkez0jmP6kxq5fS3nTXa tKSTZhiDWU1+IU4ZxEdgAYBynwrGtCjQtfGh+fPmaxnJFiYjwMIMgcDc8Zo100X2HMw7 b7JeiJf+5yYQD7taaIwtP+xmeHJlVIpH4CX9EKU+cq3MY41CkYqtbflOQnMQwUyE6Ttz qdoMIXSToGbJcOJNLiwlYS+WciMUVXPr1885F3T9XoRKzEqVVNhl3FSdnPaRFNzNTi5o 3WaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729217849; x=1729822649; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=sY1tk31eY5PmrWYW+0qrwmPB9QN1df6dKOqnGK0mPto=; b=ddePyfS9zi9F/mYhQACZvn/OXGINfKBYwoYX9bPnD6RMab6t1HUR1OyIxG53dN/xWT 9hh55hoG66JCkPuU31qcbBqjEACRGWpfXHncA3NQ8GTrasiQvzBkiZlUIlma5pquNtOS XiszvMo1SQmQnKSpBgHOjLHxnHVytm/Hfo+PIt954HsSvr+WSFnrJ0ewZ2KDqNDjv9x8 0GEN1/qnPM6E6rclJO+JdO1yU3PeOt0cNpcGt7Y8zZzwsiuM6RUcA2uzXlQqKjyYZdvb KpxbAPPAGPiedUMQHun3LYKeq82LJtS2RpleQwxRw8pBhDuQfnAIzLF9YculdzLqKZHd BR+w== X-Forwarded-Encrypted: i=1; AJvYcCUjTz1Sh/x4VKlAwn68U9wlVDcGIvylk+Nz5WG/Kl3ml2wnLZIhTMarDmg9iTZleqdrfqI=@vger.kernel.org X-Gm-Message-State: AOJu0Yz3bjZTo7kxwLZlkfgxiw9+VkJbi26FyOnw9F8WifA5EdsmSI01 sxG3v8GYxLpPyfHpt+g69w7af+hh2DRpJoHcxDrFats/rtDUwNwl X-Google-Smtp-Source: AGHT+IGyym2pWaBrdXlf4WBwIysWVQ/TNG7KgzqrBSM4igPy/2DBqQS3YMzlvJy3jnq7gXRbZmqO3g== X-Received: by 2002:a05:6a00:3929:b0:71d:fbf3:f769 with SMTP id d2e1a72fcca58-71ea322d35bmr1755259b3a.28.1729217849395; Thu, 17 Oct 2024 19:17:29 -0700 (PDT) Received: from [192.168.0.235] ([38.34.87.7]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-71ea34ab98fsm352323b3a.193.2024.10.17.19.17.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Oct 2024 19:17:28 -0700 (PDT) Message-ID: <1019fe0ba9fecc7025754de1cb732befdf237bd0.camel@gmail.com> Subject: Re: [PATCH bpf-next v1 1/2] bpf: force checkpoints at loop back-edges From: Eduard Zingerman To: Alexei Starovoitov Cc: Andrii Nakryiko , bpf , Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Kernel Team , Yonghong Song Date: Thu, 17 Oct 2024 19:17:24 -0700 In-Reply-To: References: <20241009021254.2805446-1-eddyz87@gmail.com> <46ff5f908c2ba69ebfa2033456425632c5f74c6f.camel@gmail.com> <5c4eca8da640c4be42edca1fc3ffcd0650f69b08.camel@gmail.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.52.4 (3.52.4-1.fc40) Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 On Thu, 2024-10-10 at 16:23 -0700, Alexei Starovoitov wrote: > On Thu, Oct 10, 2024 at 3:52=E2=80=AFPM Eduard Zingerman wrote: > >=20 > > On Thu, 2024-10-10 at 15:40 -0700, Eduard Zingerman wrote: > >=20 > > > I think this would bring significant speedup. > > > Not sure it would completely fix the issue at hand, > > > as mark_chain_precision() walks like 100 instructions back on each > > > iteration of the loop, but it might be a step in the right direction. > >=20 > > In theory, we can do mark_chain_precision() lazily: > > - mark that certain registers need precision for an instruction; > > - wait till next checkpoint is created; > > - do the walk for marked registers. > >=20 > > This should be simpler to implement on top of what Andrii suggests. >=20 > Indeed it can wait until the next new_state, since > when the next iteration of the loop starts the sl->state > are guaranteed to be some olderstates. Though the verifier may have done > multiple loop iteration traversals, but it's still in 'cur' state, > so states_equal(env, &sl->state, cur...) should still be correct. After spending absolutely unreasonable amount of time on this I finally identified the missing piece of the puzzle. The example program takes so much time to verify because it accumulates a huge jmp_history array (262145 entries) within a single state and constantly executes mark_chain_precision() against it. The reason for such huge jmp_history is an interplay between two conditions: if (env->jmps_processed - env->prev_jmps_processed >=3D 2 && env->insn_processed - env->prev_insn_processed >=3D 8) add_new_state =3D true; and if (!force_new_state && env->jmps_processed - env->prev_jmps_processed < 20 && env->insn_processed - env->prev_insn_processed < 100) add_new_state =3D false; I try to explain it in the commit message for [1]. If one waits long enough, the verifier eventually finishes with -ENOMEM, as it fails to allocate big enough jmp_history array (one has to wait 15 minutes on master, and about a minute with my patch [2]= , the patch has not changed since our discussion on Wednesday, the error message is the same as we discussed on Wednesday). The discussed change to lazy precision marks propagation turned out to be a dud: - it does not solve the issue at hand, verifier still exits with ENOMEM, just faster; - overall verification speedup for regular programs is not big: $ ./veristat -e file,prog,duration -f 'duration>100000' -C master-baseline.= log current.log=20 File Program Duration (us) (A) Duration (us= ) (B) Duration (us) (DIFF) ------------------------ ---------------- ----------------- ------------= ----- -------------------- loop1.bpf.o nested_loops 124334 1= 17382 -6952 (-5.59%) loop3.bpf.o while_true 318165 2= 53264 -64901 (-20.40%) pyperf100.bpf.o on_event 704852 6= 92300 -12552 (-1.78%) pyperf180.bpf.o on_event 2304115 22= 51502 -52613 (-2.28%) pyperf50.bpf.o on_event 183971 1= 84124 +153 (+0.08%) pyperf600.bpf.o on_event 2076073 20= 51733 -24340 (-1.17%) pyperf600_nounroll.bpf.o on_event 394093 4= 02552 +8459 (+2.15%) strobemeta.bpf.o on_event 169218 1= 69835 +617 (+0.36%) test_verif_scale1.bpf.o balancer_ingress 144779 1= 46486 +1707 (+1.18%) test_verif_scale2.bpf.o balancer_ingress 147078 1= 51902 +4824 (+3.28%) test_verif_scale3.bpf.o balancer_ingress 184959 1= 88723 +3764 (+2.04%) Here only loop3 demonstrates significant speedup (re-run for loop3 several = times), but it does not warrant the code change in my opinion. [1] https://lore.kernel.org/bpf/20241018020307.1766906-1-eddyz87@gmail.com/ [2] https://github.com/eddyz87/bpf/tree/lazy-mark-chain-precision