From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E557EB64DD for ; Thu, 27 Jul 2023 22:59:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229764AbjG0W7E (ORCPT ); Thu, 27 Jul 2023 18:59:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32826 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229621AbjG0W7D (ORCPT ); Thu, 27 Jul 2023 18:59:03 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1629C94; Thu, 27 Jul 2023 15:59:02 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 571AF61F78; Thu, 27 Jul 2023 22:59:01 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9D062C433C7; Thu, 27 Jul 2023 22:59:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1690498740; bh=7YZbeoQqfKNEx+jwMo1/b4aGKcS1iuXyOx+y/zN5Zzo=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=hJ6iB/tYWfVsHajByyC28Q7EemdgjIMF6uLfu9+WgDtg2ewJu2I/4RuqBensexXAf FDDEk9ibYhSToWf2z2KL+m6ua2Eid74tC26XQxdAGdDyuhJauZOYrO+rSPxa76G3C8 JM8crmumVVgBtDC38fV2RDMVcxBvwdD45lwBhEddEORux7bg7/o6DgRHlk1qTcLQfL DpzfQqSxW2XolqDRG8BB/4q1ZDERYghJA4A7XHyfwQH8iDKwXyZzLYDZHNAIw2TPal V0qSx/dLBn64aNDLJOH0GqdY6JJLnw9NBtfFrhnafz2W+ZNJQz+4SP8SRFHsnnsNW6 qwdkUtESktMGA== Date: Thu, 27 Jul 2023 15:58:59 -0700 From: "Darrick J. Wong" To: Luis Chamberlain Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, Pankaj Raghav , Chandan Babu R Subject: Re: xfs kdevops baseline for next-20230725 Message-ID: <20230727225859.GF11352@frogsfrogsfrogs> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Thu, Jul 27, 2023 at 11:21:35AM -0700, Luis Chamberlain wrote: > I'd like to see if this is useful so feedback is welcomed. > > I recently had a reason to establish a baseline for XFS as we start > testing some new fatures we've been working on to ensure we don't create > regressions. I've been using kdevops for this work, its on github [0] and > on gitlab for those that prefer that [1] and tested against linux-next > tag next-20230725, with its respective generic kernel configuration > which has evolved over time for kdevops which let's us test with kdevops > with qemu / virtualbox / all cloud providers [2]. We fork fstests [3] so > have a small delta, mostly reverts to help stability on testing as > Chandan found regressions in some new fstest changes. > > [0] https://github.com/linux-kdevops/kdevops.git > [1] https://gitlab.com/linux-kdevops/kdevops.git > [2] https://github.com/linux-kdevops/kdevops/blob/master/playbooks/roles/bootlinux/templates/config-next-20230725 > [3] https://github.com/linux-kdevops/fstests > > The sections tested for are: > > xfs_crc > xfs_reflink > xfs_reflink-normapbt > xfs_reflink_1024 > xfs_reflink_2k > xfs_reflink_4k > xfs_nocrc > xfs_nocrc_512 > xfs_nocrc_1k > xfs_nocrc_2k > xfs_nocrc_4k > xfs_logdev > xfs_rtdev > xfs_rtlogdev Question: Have you turned on gcov to determine how much of fs/xfs/ and fs/iomap/ are actually getting exercised by these configurations? I have for my fstests fleet; it's about ~90% for iomap, ~87% for xfs/libxfs, ~84% for the pagecache, and ~80% for xfs/scrub. Was wondering what everyone else got on the test. --D > You can see what these sections represent in terms of xfs here: > > https://github.com/linux-kdevops/kdevops/blob/master/playbooks/roles/fstests/templates/xfs/xfs.config > > The first order of business before even considering a set of changes is > getting baseline and building a high confidence in that baseline. We had > a technical debt as it's been a while before we get to establish and > publish a baseline with high confidence for XFS for linux-next. Hopefully this > will help us keep it moving forward. > > The kdevops configuration used for this can be found here: > > https://github.com/linux-kdevops/kdevops/blob/master/workflows/fstests/results/mcgrof/libvirt-qemu/20230727/kdevops.config > > Worth noting is that virtio drives are used instead of NVMe since virtio > supports io-threads, and so we get less NVMe timouts on the guest which > have proven to cause major false positives for testing for a while as we > have seen on the stable testing. I'll go ahead and make virtio the > default for qemu configurations now. We expect to move back to nvme once > distros pick up release of qemu with io-thread support. > > This is useful truncated / sparsefiles files with loopback file strategy > documented here: > > https://github.com/linux-kdevops/kdevops/blob/master/docs/testing-with-loopback.md > > I'll soon re-test with real NVMe drives though as kdevops now has > support for using them and I have some basic tests with PCIe passthrough > (which kdevops also enables with 'make dynconfig'). > > For now I've just ran one full set of fstests, ie, the confidence is rather > low for my preference. After publishing this I will the tests against one > week's worth of testing to build confidence up to 100 tests. We'll see > if some other tests fail with a lower failure rate after that. > > But for now I figured I'd publish preliminary results on the first run. > Some failures seem like test bugs. Some other failures are likely real and > require investigation. > > Often we just commit into kdevops test results / expunges, but this is > the first time publishing actual results on the mailing list. The commit > logs detail the methodology to collect things results and go step by > step so to help others who may want to try to start baselines with other > filesystems, etc. > > You are more than welcomed to also contribute testing and your own > results in your own kdevops namespace, the more we have the better > (within reason of course). > > The tests found to be common in at least 2 secions go in the all.txt > expunge list. Since at LSFMM we've been requested to store results > this set of results go with results archived in XZ format and > demonstrate how to list files in it, and also get results for failures > out. I provide a simple super cursory review of the test failures as well. > > The test bugs seem related to quotes, but it's not clear to me why > this wasn't detected in other tests before. > > What I'd like to know, is if this email is useful to the XFS development > community. Should we strive to do this more often? > > Here are failures found in at least more than one section: > > cat workflows/fstests/expunges/6.5.0-rc3-next-20230725/xfs/unassigned/all.txt > > # lazy baseline entries are failures found at least once on multiple XFS test > # sections. To see the actual *.bad files and *.dmesg files you can use: > # > # tar -tOJf workflows/fstests/results/mcgrof/libvirt-qemu/20230727/6.5.0-rc3-next-20230725.xz > # > # For example to see all generic/175 failures: > # > # tar -tOJf workflows/fstests/results/mcgrof/libvirt-qemu/20230727/6.5.0-rc3-next-20230725.xz 2>&1 | grep generic | grep 175 > # 6.5.0-rc3-next-20230725/xfs_reflink_normapbt/generic/175.out.bad > # 6.5.0-rc3-next-20230725/xfs_reflink_normapbt/generic/175.full > # 6.5.0-rc3-next-20230725/xfs_reflink_normapbt/generic/175.dmesg > # 6.5.0-rc3-next-20230725/xfs_reflink/generic/175.out.bad > # 6.5.0-rc3-next-20230725/xfs_reflink/generic/175.full > # 6.5.0-rc3-next-20230725/xfs_reflink/generic/175.dmesg > # 6.5.0-rc3-next-20230725/xfs_reflink_4k/generic/175.out.bad > # 6.5.0-rc3-next-20230725/xfs_reflink_4k/generic/175.full > # 6.5.0-rc3-next-20230725/xfs_reflink_4k/generic/175.dmesg > # 6.5.0-rc3-next-20230725/xfs_rtdev/generic/175.out.bad > # 6.5.0-rc3-next-20230725/xfs_rtdev/generic/175.full > # 6.5.0-rc3-next-20230725/xfs_rtdev/generic/175.dmesg > # > # And now to see one individual file: > # > # tar -xOJf workflows/fstests/results/mcgrof/libvirt-qemu/20230727/6.5.0-rc3-next-20230725.xz 6.5.0-rc3-next-20230725/xfs_reflink_normapbt/generic/175.out.bad > # tar -xOJf workflows/fstests/results/mcgrof/libvirt-qemu/20230727/6.5.0-rc3-next-20230725.xz 6.5.0-rc3-next-20230725/xfs_reflink_normapbt/generic/175.dmesg > # > generic/175 # seems like a test bug - lazy baseline - failure found in at least two sections > generic/297 # seems like a test bug - lazy baseline - failure found in at least two sections > generic/298 # seems like a test bug - lazy baseline - failure found in at least two sections > generic/471 # race against loop? - lazy baseline - failure found in at least two sections > generic/563 # needs investigation - lazy baseline - failure found in at least two sections > xfs/157 # needs investigation - lazy baseline - failure found in at least two sections > xfs/188 # unclear - lazy baseline - failure found in at least two sections > xfs/205 # unclear - lazy baseline - failure found in at least two sections > xfs/432 # test bug: blocksize should detect sector size - lazy baseline - failure found in at least two sections > xfs/506 # needs investigation - lazy baseline - failure found in at least two sections > xfs/516 # needs investigation - lazy baseline - failure found in at least two sections > > Here are failures found in just the test section which enables reflinks > but disables rmapbt: > > cat workflows/fstests/expunges/6.5.0-rc3-next-20230725/xfs/unassigned/xfs_reflink_normapbt.txt > > # For exmaple to see these failures: > # > # tar -tOJf workflows/fstests/results/mcgrof/libvirt-qemu/20230727/6.5.0-rc3-next-20230725.xz 2>&1 | grep xfs | grep normap | grep 301 > # > # 6.5.0-rc3-next-20230725/xfs_reflink_normapbt/xfs/301.out.bad > # 6.5.0-rc3-next-20230725/xfs_reflink_normapbt/xfs/301.full > # 6.5.0-rc3-next-20230725/xfs_reflink_normapbt/xfs/301.dmesg > > # To see one file output: > # tar -xOJf workflows/fstests/results/mcgrof/libvirt-qemu/20230727/6.5.0-rc3-next-20230725.xz 6.5.0-rc3-next-20230725/xfs_reflink_normapbt/xfs/301.out.bad > xfs/301 # needs investigation > > And here are failures found only on the realtime device, most are likely > test bugs which means we gotta enhance the test to skip the realtime > device or learn to use it, but some seem like real failures: > > cat workflows/fstests/expunges/6.5.0-rc3-next-20230725/xfs/unassigned/xfs_rtdev.txt > > # For example to see rtdev's generic/012 related files: > # > # tar -tOJf workflows/fstests/results/mcgrof/libvirt-qemu/20230727/6.5.0-rc3-next-20230725.xz 2>&1 | grep xfs | grep rtdev | grep 012 > # > # To see the generic/012 out.bad file: > # tar -xOJf workflows/fstests/results/mcgrof/libvirt-qemu/20230727/6.5.0-rc3-next-20230725.xz 6.5.0-rc3-next-20230725/xfs_rtdev/generic/012.out.bad > # > generic/012 # needs investigation > generic/013 # needs investigation > generic/015 # might be a test bug > generic/016 # needs investigation > generic/021 # needs investigation > generic/022 # needs investigation > generic/027 # might be a test bug > generic/058 # needs investigation > generic/060 # needs investigation > generic/061 # needs investigation > generic/063 # needs investigation > generic/074 # needs investigation > generic/075 # needs investigation > generic/077 # might be a test bug > generic/096 # might be a test bug > generic/102 # might be a test bug > generic/112 # needs investigation > generic/113 # needs investigation > generic/171 # might be a test bug > generic/172 # might be a test bug > generic/173 # might be a test bug > generic/174 # might be a test bug > generic/204 # might be a test bug > generic/224 # might be a test bug > generic/226 # might be a test bug > generic/251 # ran out of space and then corruption? > generic/256 # might be a test bug > generic/269 # might be a test bug > generic/270 # might be a test bug > generic/273 # might be a test bug > generic/274 # might be a test bug > generic/275 # might be a test bug > generic/300 # might be a test bug > generic/312 # might be a test bug > generic/361 # might be a test bug > generic/371 # might be a test bug > generic/416 # might be a test bug > generic/427 # might be a test bug > generic/449 # might be a test bug > generic/488 # might be a test bug > generic/511 # might be a test bug > generic/515 # might be a test bug > generic/520 # needs investigation > generic/551 # needs investigation > generic/558 # might be a test bug > generic/562 # never completed > > Luis