From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85F6EC433EF for ; Mon, 18 Oct 2021 15:10:44 +0000 (UTC) Received: from shelob.surriel.com (shelob.surriel.com [96.67.55.147]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 04D4F610C7 for ; Mon, 18 Oct 2021 15:10:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 04D4F610C7 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=vt.edu Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kernelnewbies.org Received: from localhost ([::1] helo=shelob.surriel.com) by shelob.surriel.com with esmtp (Exim 4.94.2) (envelope-from ) id 1mcUHS-0007HV-Pc; Mon, 18 Oct 2021 11:10:26 -0400 Received: from mail-qv1-xf34.google.com ([2607:f8b0:4864:20::f34]) by shelob.surriel.com with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1mcUHP-0007GH-JO for kernelnewbies@kernelnewbies.org; Mon, 18 Oct 2021 11:10:23 -0400 Received: by mail-qv1-xf34.google.com with SMTP id z15so10405457qvj.7 for ; Mon, 18 Oct 2021 08:10:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vt-edu.20210112.gappssmtp.com; s=20210112; h=sender:from:to:cc:subject:in-reply-to:references:mime-version :content-transfer-encoding:date:message-id; bh=bjUQsS/JhsFJ7A/LEPaXRPi0vFqsL9QJaIIVS2Ss/90=; b=tQE6cJE0pnAsHfyrGzoPORr9usUSs4bYLfxYaZXCWEWCfUev0+FAX9wWRn2z3hxFiJ IMRvu72Udfno46/tccNewEb3SPOZKyD93p2DdD2U5M/phSeC8Xs2tc6sXwyE9A6pzLaW g093t8LCJRNcUhvVFUM7JjwL0ldbok/F3KIdCnr/JhbDz0dPUtPP/jARynNCS7a4rmBJ JbD+xOYf4nJxoyjatr/Z0GN7moliDxxUwHf0s3ZS4SVkUEK7Jp7hNq2wjykQgnNVECxJ Hpzj3tVyJc6VXVI7GhZoXi7cTrNNlWVl/i6VI9HKLPOZvis7fJ3li4Ml9FdhBLI9f0g3 
ON2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:sender:from:to:cc:subject:in-reply-to:references :mime-version:content-transfer-encoding:date:message-id; bh=bjUQsS/JhsFJ7A/LEPaXRPi0vFqsL9QJaIIVS2Ss/90=; b=kJnDphYOLhbRXN3JeolFH6C2L5iQ8T4QI0+SLY1JF5EkrwA7zm+2eomcuesN7p3X0n j2TeGh4g5rplkNTGvQJiwBm0oJ1W1suQ0dKkNOBLOLys9hMC0lS2V0yOjEYp9H+sfS3P /1HIN5FK9SJho0EuFP2TGe49rRyv5Klt69XoxCAUClxWrsOxxMwPEzwMlcylMAEf2lar ILUTTNv5lsF3xO6QuJ2m14HxP11fzTa59bSvUKnv2cQQGn3WiyW2SEVhZSu3YESpLme7 PYoS4V3KyDfRAuCLLlxEhA2qCqxY7FB2NT2jmlsXqy4ghe2s2i+S36VAj+y7P4XBCsVP 8HNQ== X-Gm-Message-State: AOAM5312rjyn0M6zofW0yD+PIimuLdzaG4G4zVp5Ok/Wl4s/kpqBxrGs 2Qb+cj1Vob4fFcp1Y7RcIa+BPw== X-Google-Smtp-Source: ABdhPJxCgJ9x/5CETUThrOojYpCRJrGzCCvX3YPSssc5SYgXzwTMWfby3XSl47MeFHr0w5mPqjdE1w== X-Received: by 2002:a05:6214:c26:: with SMTP id a6mr25938114qvd.40.1634569820662; Mon, 18 Oct 2021 08:10:20 -0700 (PDT) Received: from turing-police ([2601:5c0:c380:d61::359]) by smtp.gmail.com with ESMTPSA id p22sm6308436qtl.83.2021.10.18.08.10.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Oct 2021 08:10:20 -0700 (PDT) From: "Valdis Kl=?utf-8?Q?=c4=93?=tnieks" X-Google-Original-From: "Valdis Kl=?utf-8?Q?=c4=93?=tnieks" X-Mailer: exmh version 2.10.0-pre 07/05/2021 with nmh-1.7+dev To: Dongliang Mu Subject: Re: Any tracing mechanism can track the executed instructions of a user process in the kernel? 
In-Reply-To: References: Mime-Version: 1.0 Date: Mon, 18 Oct 2021 11:10:19 -0400 Message-ID: <104502.1634569819@turing-police> Cc: kernelnewbies , Greg KH , linux-kernel , Pavel Skripkin , FMDF , Dan Carpenter X-BeenThere: kernelnewbies@kernelnewbies.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: Learn about the Linux kernel List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: multipart/mixed; boundary="===============3138499108892576771==" Errors-To: kernelnewbies-bounces@kernelnewbies.org --===============3138499108892576771== Content-Type: multipart/signed; boundary="==_Exmh_1634569818_99419P"; micalg=pgp-sha256; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit --==_Exmh_1634569818_99419P Content-Type: text/plain; charset=us-ascii

On Mon, 18 Oct 2021 16:41:14 +0800, Dongliang Mu said:

> I want to log all the executed instructions of a user process (e.g.,
> poc.c in syzkaller) in the kernel mode and then would like to leverage
> backward analysis to capture the root cause of kernel panic/crash.
> Therefore, I need the instruction-level tracing mechanisms or tools.

Tracing just the instructions won't get you where you want to be if you take this approach. You *also* need to track all the data - the instruction path inside two different runs of syzkaller may be essentially identical, but pass two different values as the third parameter of a syscall.

You may also have to deal with insane amounts of data - the actual error could have happened minutes or even hours before the crash, or come from the interaction between two different processes.

You probably want to take a *really* close look at how perf and friends avoid infinite regress when code execution drops inside the perf code, because you're going to hit the same issues.

Or.... You can work smarter rather than harder, and ask yourself what's the minimum amount and type of additional information needed to make a significant improvement in the debugging of system crashes.
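
To make the earlier point about data concrete, here is a minimal sketch in Python. It is a toy model, not kernel code: `toy_syscall`, the `memory` list, and the recorded step names are all hypothetical, invented for illustration. It shows two calls that execute exactly the same steps, where only the value of the argument decides whether a neighbouring object gets corrupted.

```python
def toy_syscall(arg3):
    """Hypothetical handler: straight-line code, no branch depends on arg3."""
    trace = []
    # A 4-slot buffer followed by an unrelated neighbouring object.
    memory = [0, 0, 0, 0, "next_object"]

    trace.append("load_arg3")
    index = arg3

    trace.append("store")
    memory[index] = 0xFF  # missing bounds check: this is the actual bug

    trace.append("return")
    return trace, memory


good_trace, good_mem = toy_syscall(2)  # stays inside the buffer
bad_trace, bad_mem = toy_syscall(4)    # overwrites the neighbour

assert good_trace == bad_trace         # identical "instruction" paths
assert good_mem[4] == "next_object"    # neighbour intact
assert bad_mem[4] == 0xFF              # neighbour silently overwritten
```

An instruction-only trace of these two runs would be indistinguishable, which is why the argument values have to be captured as well.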
For example, 95% of the time, you can figure out what the bug is by merely looking at the stack traceback.

For most of the rest of the cases, simply capturing the parameter values from the syscall and the basic info for page faults and other interrupts is probably sufficient, and you can probably leverage the audit subsystem for most of that. It can already record syscall parameters, while logging page faults and other interrupts can probably be done with perf.

At that point, you don't actually *need* every instruction - tracing only branch and call instructions is sufficient, because you already know that each instruction between the target of a branch/call and the next branch/call will be executed.

Similarly, the lockdep code will catch most locking issues. But it won't flag issues with data that should be protected with a lock, but is bereft of any locking. So ask yourself: What ways are there to analyze the code and detect critical sections prone to race conditions? Is there a sparse-on-steroids approach that will do the heavy lifting for those? (Note that this isn't an easy task for the general case, but identifying two or three specific common patterns and finding a way to detect them may be worthwhile.)

And many of the remaining crashes are timing related, and "let's trace every single instruction" is almost guaranteed to make things slow enough to change/bypass the timing issue.

So... What's left that would be the most helpful with the least amount of data?

Go look at some threads on linux-kernel. Look at the kernel bugs that were the result of a Homer Simpson "D'oh!" moment. What can we do to make those bugs less likely to make it into the code in the first place? For the more subtle bugs, what data finally made the debugging come together?
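
The claim above that branch and call records are enough to recover the full instruction path can be sketched with a toy interpreter. This is a minimal Python illustration under stated assumptions: the instruction set (`set`, `work`, `dec`, `bnz`, `jmp`, `halt`), `PROGRAM`, `run`, and `reconstruct` are all made up for this example and do not correspond to any real tracing tool. The reconstruction step also assumes the static code is available, since it needs to know which addresses hold branches.

```python
# Toy program: each entry is (opcode, operand); addresses are list indices.
PROGRAM = [
    ("set", 3),      # 0: counter = 3
    ("work", None),  # 1: loop body
    ("dec", None),   # 2: counter -= 1
    ("bnz", 1),      # 3: branch to 1 if counter != 0
    ("work", None),  # 4
    ("halt", None),  # 5
]

BRANCH_OPS = {"bnz", "jmp"}


def run(program):
    """Execute the toy program; return (full_trace, branch_trace)."""
    pc, counter = 0, 0
    full_trace, branch_trace = [], []
    while True:
        op, arg = program[pc]
        full_trace.append(pc)
        if op == "halt":
            break
        next_pc = pc + 1
        if op == "set":
            counter = arg
        elif op == "dec":
            counter -= 1
        elif op == "jmp":
            next_pc = arg
        elif op == "bnz" and counter != 0:
            next_pc = arg
        if op in BRANCH_OPS:
            branch_trace.append(next_pc)  # log only where control went
        pc = next_pc
    return full_trace, branch_trace


def reconstruct(program, branch_trace, entry=0):
    """Rebuild the full path from the branch log plus the static code."""
    outcomes = iter(branch_trace)
    pc, path = entry, []
    while True:
        op, _ = program[pc]
        path.append(pc)
        if op == "halt":
            break
        # Between branches, execution is strictly sequential.
        pc = next(outcomes) if op in BRANCH_OPS else pc + 1
    return path


full, branches = run(PROGRAM)
assert reconstruct(PROGRAM, branches) == full
print(len(full), "instructions executed;", len(branches), "branch records logged")
```

In this toy run, three branch records are enough to rebuild all twelve executed instructions, and the ratio gets better as the straight-line runs between branches get longer.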
--==_Exmh_1634569818_99419P Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Comment: Exmh version 2.9.0 11/07/2018 iQEcBAEBCAAGBQJhbY5aAAoJEI0DS38y7CIc4OkH/ROXT+P1YCrstg2hlsp5PFd8 L23nVJEO3BM+1vp7B/GLKZ/cGuRc8vdNWRWOnvAnCPFzlk/DfSUSkL8f2qaI12qB uDqzhjGhaAnoSbQuiwWlaqYUMIklF0LFui7r1kMHWG7ZjjoFR07cdRkOykhZrD1D NfLZ/P6v4ySQeGCUa/QcYKWHMLaHzNFdSytS/4LKtB6UCbnNy/zLc3x/LFw1LtVj ySBkjIUSFWsV/KVdg7mE8ADm6u8vMFe93W6eQUNUMPHay1Nsj7pne/Db+5wCwDSU M8krCrGeHesN8EaQJvgKbj5oXXoCbU35ViLQ7+ih1/nQ4iGAkc0YJfG91IDSduM= =Y//A -----END PGP SIGNATURE----- --==_Exmh_1634569818_99419P-- --===============3138499108892576771== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Kernelnewbies mailing list Kernelnewbies@kernelnewbies.org https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies --===============3138499108892576771==--