From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8EF89C61D97 for ; Sun, 29 Jan 2023 13:36:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232549AbjA2Ngk (ORCPT ); Sun, 29 Jan 2023 08:36:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54202 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232422AbjA2Ngj (ORCPT ); Sun, 29 Jan 2023 08:36:39 -0500 Received: from mail-ed1-x535.google.com (mail-ed1-x535.google.com [IPv6:2a00:1450:4864:20::535]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 094A01CF4C for ; Sun, 29 Jan 2023 05:36:38 -0800 (PST) Received: by mail-ed1-x535.google.com with SMTP id v10so8603430edi.8 for ; Sun, 29 Jan 2023 05:36:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=grsecurity.net; s=grsec; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=TH4A0epdKZGSczFkZaVzP1kc5upecdDt0S+P8DTQiA0=; b=llq95dqRk/IljFyJEXScdIZyW7TSvJEEuT7ws/ub0UgH6Bs/mBoPlfqLszgJwMZGjS w9ATUTasliOjQpvkEEE+gTzaifgCCV0LRBzHR7HFZ+mvs2fQBdhhtvzAc3O/7y4ozeEL TZaZ+bm7haynSkoVtPXFqial6St8Y/iqIn2PzJOS7J7pz2I60JnbW37YfK7gCXBPU+x4 JtmH3F/LD/eqf98r5SMubT0EaEEV5S5skkDSQ1dEebo/nbo7QVc1oZVEbNfkcIjU8dR+ K0D7NOFbYjZR+1R44JKstuPZaa8aKYDEvO87YYX/Yi/nIV1zfN/B1Josgsgy/8Ckkpa8 OYow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=TH4A0epdKZGSczFkZaVzP1kc5upecdDt0S+P8DTQiA0=; b=sMLnUqHEBmS9DM4EeASqW5PLmY6KxbIAw5C+S3dhYBwNVlDdFNv/RS9xnIw8cQsZa1 rrB6bgkAjWFEXsrPsmAoYObEzvS0V/gD43TrGlpRRnqMNWWHFkDTJblgDRlVdzCGiNhi Q86mU3SGVrPtTZi5DA+edbL1qaUASNv3bfFyY4qHKZujE8vyhUpA3jQ2EiVTdVdiM+RJ eXeCuFluroi3yLrKB78gePwVVJSNQXDlKx0E+/NQbSoLNJXhBB709ZmTfhzBP+jiI68B Z/yMADdM013lnCX2Bb9Xq63sU1sgJhvzL0pblpX2bhT/9d8rPvd1G0YOLqGgao5FsKPY CRBA== X-Gm-Message-State: AO0yUKWTIZ30nnnYb6lW/+GlPNoh3xljora5oQjtJUdRSgahRaaYkOMr 6Dn8m7DN7P5a708x3hLKBHi34g== X-Google-Smtp-Source: AK7set9rxB4k0t3LpCQQVcj591I0ibUlB/+uFmYaGbGLIAyl9CiZZzjpfOZYBpIbEDMfw580xHhZ4g== X-Received: by 2002:a05:6402:b5a:b0:4a2:331e:85f2 with SMTP id bx26-20020a0564020b5a00b004a2331e85f2mr3420798edb.26.1674999396541; Sun, 29 Jan 2023 05:36:36 -0800 (PST) Received: from ?IPV6:2003:f6:af40:8200:19e:ccc3:be2a:f498? (p200300f6af408200019eccc3be2af498.dip0.t-ipconnect.de. [2003:f6:af40:8200:19e:ccc3:be2a:f498]) by smtp.gmail.com with ESMTPSA id cf8-20020a0564020b8800b004a18f2ffb86sm4375723edb.79.2023.01.29.05.36.35 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 29 Jan 2023 05:36:36 -0800 (PST) Message-ID: Date: Sun, 29 Jan 2023 14:36:35 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0 Subject: Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails Content-Language: en-US, de-DE To: Junio C Hamano Cc: git@vger.kernel.org, =?UTF-8?B?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= , =?UTF-8?Q?Carlo_Marcelo_Arenas_Bel=c3=b3n?= References: <20221216121557.30714-1-minipli@grsecurity.net> <20230127154952.485913-1-minipli@grsecurity.net> From: Mathias Krause In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On 27.01.23 18:39, Junio C Hamano wrote: > Junio C Hamano writes: > >> Yes, the "instead of failing hard, fall back" makes sense. Just >> that I do not see why the runtime test is a good thing to have. In >> short, we are not in the business of catching bugs in pcre2_jit >> implementations, so if they say they cannot compile the pattern (I >> would even say I doubt the point of checking the return code to >> ensure it is NOMEMORY), it would be fine to let the interpreter >> codepath to inspect the pattern and diagnose problems with it, or >> take the slow match without JIT. >> >> What am I missing? > > Note that I've seen and recently re-read the discussion that leads to > https://lore.kernel.org/git/f680b274-fa85-6624-096a-7753a2671c15@grsecurity.net/ Ahh, so ignore my last advise in the previous Email. You already read it. > I suspect that this auto-probe is related to solving "the user > thinks JIT is in use but because of failing JIT the user's pattern > is getting horrible performance" somehow. But I do not think a hard > failure is a good approach to help users in such a situation. Yes, it's exactly trying to detect this case and not cause a regression for users expecting 'git grep' to make use of the JIT. So, beside the W|X memory allocation error, other errors are likely only to point out limitations of the JIT compiler, e.g. the example I gave in https://lore.kernel.org/all/2b04b19a-a2bd-3dd5-6f21-ed0b0ad3e02f@grsecurity.net/, which is, admitted, a made up example that can easily be worked around by manually prefixing it with '(*NO_JIT)'. It would be a pain having to do that for *every* pattern, but only doing it for the pathological cases should be fine. Otherwise more users would have run into it already and complained about it, no? > After such a failure, the user can prefix "(*NO_JIT)" to the pattern > and retry, or give up the operation altogether and not get a useful > result, but wouldn't it be far more helpful to just fallback as if > (*NO_JIT) was on from the beginning? Sure, if it would be required for *every*, i.e. "normal" patterns. But always doing it even for abusive ones doesn't feel right. > Also I notice that p->pcre2_jit_on is per "struct grep_pat", so it > is not like "once we see a pathological pattern, we turn off JIT > completely for other patterns", right? That is, if you have > > git grep -P -e "$A" -e "$B" > > and we fail to compile "$A" (for whatever reason), we could still > (attempt to) compile "$B". Perhaps $A was too complex or was > incompatible with JIT combined with other options, but $B may be > easy enough to still be JITtable, in which case we would match with > the JITted version of $B with interpreted version of $A, instead of > failing, right? The current version of git would fail hard if it fails to JIT compile "$A". My patch doesn't change that behavior and that's intentional, as I share Ævar's thinking about falling back to the interpreter mode for "complex patterns" (which means pathological cases, really) is a bad idea. While we might be able to compile the pattern and run it in interpreter mode, it'll likely have a *much* higher runtime. Just to get you a glimpse of how much longer, this is what it takes running the pathological pattern from above's example on git.git: $ time git grep -P "(*NO_JIT)$(perl -e 'print "(.)" x 4000')" Binary file git-gui/macosx/git-gui.icns matches Binary file t/t5000/pax.tar matches Binary file t/t5004/big-pack.zip matches Binary file t/t5004/empty-with-pax-header.tar matches real 44m42,150s user 577m14,623s sys 0m46,210s So this grep run eat up ~9.5 *hours* of CPU time. Do we really want to fall back to something like this for the pathological cases? ...Yeah, I don't think so either. Thanks, Mathias