* [RFC][GSoC] Proposal: Move reftable and other tests to the unit testing framework @ 2024-03-19 17:11 Chandra Pratap 2024-03-20 13:17 ` Christian Couder 0 siblings, 1 reply; 8+ messages in thread From: Chandra Pratap @ 2024-03-19 17:11 UTC (permalink / raw) To: git; +Cc: christian.couder, karthik.188, ps, kaartic.sivaraam This is my project proposal for the Google Summer of Code 2024 program. The document version of this proposal can be accessed through this link: https://shorturl.at/ijrTU ---------<8----------<8----------<8----------<8----------<8----------<8 Personal Info ------------- Full name: Chandra Pratap Preferred name: Chand E-mail: chandrapratap3519@gmail.com Phone: (+91)77618-24030 Education: SV National Institute of Technology, Surat, India Year: Sophomore (2nd) Major: Mathematics GitHub: https://github.com/Chand-ra Before GSoC ----------- -----Synopsis----- A new unit testing framework was introduced to the Git mailing list last year with the aim of simplifying testing and improving maintainability. The idea was accepted and merged into master on 09/11/2023. This project aims to extend that work by moving more tests from the current setup to the new unit testing framework. The SoC 2024 Ideas page (link: https://git.github.io/SoC-2019-Ideas/) mentions reftable unit tests migration as a separate project from the general unit test migration project, however, I propose migrating other tests alongside the reftable unit tests as a part of this proposal. The reasoning behind this is explained further down. The difficulty for the project should be medium and it should take somewhat between 175 to 350 hours. -----Contributions----- • apply.c: make git apply respect core.fileMode settings -> Status: merged into master -> link: https://public-inbox.org/git/20231226233218.472054-1-gitster@pobox.com/ -> Description: When applying a patch that adds an executable file, git apply ignores the core.fileMode setting (core.fileMode specifies whether the executable bit on files in the working tree are to be honored or not) resulting in false warnings. Fix this by inferring the correct file mode from the existing index entry when core.filemode is set to false. Add a test case that verifies the change and prevents future regression. -> Remarks: This was the first patch I worked on as a contributor to Git. Served me as an essential intro lesson to the community’s working flow and general practices. • tests: Move t0009-prio-queue.sh to the unit testing framework -> Status: merged into master -> link: https://public-inbox.org/git/pull.1642.v4.git.1705865326185.gitgitgadget@gmail.com/ -> Description: t/t0009-prio-queue.sh along with t/helper/test-prio-queue.c unit test Git's implementation of a priority queue. Migrate the test over to the new unit testing framework to simplify debugging and reduce test run-time. -> Remarks: Perhaps the most relevant patch of all the ones mentioned here, this patch helped me understand the expectations and workflow for the work to be performed in this project. • write-or-die: make GIT_FLUSH a Boolean environment variable -> Status: merged into master -> link: https://public-inbox.org/git/pull.1628.v3.git.1704363617842.gitgitgadget@gmail.com/ -> Description: Among Git's environment variable, the ones marked as "Boolean" accept values like Boolean configuration variables, i.e. values like 'yes', 'on', 'true' and positive numbers are taken as "on" and values like 'no', 'off','false' are taken as "off". Make GIT_FLUSH accept more values besides '0' and '1' by turning it into a Boolean environment variable & update the related documentation. • sideband.c: remoye redundant NEEDSWORK tag -> Status: merged into master -> link: https://public-inbox.org/git/pull.1625.v4.git.1703750460527.gitgitgadget@gmail.com/ -> Description: Replace a misleading NEEDSWORK tag in sideband.c that reads ‘replace int with size_t’ with another comment explaining why it is fine to use ‘int’ and the replacement isn’t necessary. • make tig callable from PowerShell/Command Prompt -> Status: merged into main -> link: https://github.com/git-for-windows/MINGW-packages/pull/104 -> Description: Tig is a text mode interface for Git that ships with the standard Git for Windows package but isn’t callable from PowerShell/ Command Prompt by default. Fix this by updating the relevant Makefiles and resource scripts. • fix broken link on Git for Windows’ GitHub wiki -> Status: merged -> link: https://github.com/git-for-windows/git/wiki/Home/_history -> Remarks: A simple fix for a broken link that I stumbled upon while browsing Git for Windows’ wiki looking for some help with the patch mentioned just before this one. • t4129: prevent loss of exit codes due to the use of pipes -> Status: merged into master -> link: https://lore.kernel.org/git/20220311132141.1817-1-shaoxuan.yuan02@gmail.com/ -> Description: Piping the output of git commands like git-ls-files to another command (grep in this case) in t4129 hides the exit code returned by these commands. Prevent this by storing the output of git-ls-files to a temporary file and then "grep-ping" from that file. Replace grep with test_grep as the latter is more verbose when it fails. • t9146: replace test -d/-f with appropriate test_path_is_* function -> Status: merged into master -> link: https://public-inbox.org/git/pull.1661.v3.git.1707933048210.gitgitgadget@gmail.com/ -> Description: The helper functions test_path_is_* provide better debugging information than test -d/-e/-f. Replace tests -d/-e/-f with their respective ‘test_path_is_foo’ calls. • regex: update relevant files in compat/regex -> Status: WIP -> link: https://github.com/gitgitgadget/git/pull/1641 -> Description: The RegEx code in compat/regex has been vendored from gawk and was last updated in 2010. This may lead to performance issues like high CPU usage. Fix this by synchronizing the relevant files in compat/regex with the latest version from GNULib and then replaying any changes we made to gawk’s version on top of the new files. -> Remarks: When I started working on this patch, I thought it was an easy fix but the work turned out to be more involved than I anticipated. I had to seek help from the other community members, and we have made some good progress, but there is still a lot of cleaning and changes that need to be done. I haven’t found enough time to commit to this again, but it’s surely something that I want to get done soon. • tests: Move t0032-reftable-unittest.sh to the unit testing framework -> Status: WIP -> link: https://github.com/gitgitgadget/git/pull/1698 -> Description: t/t0032-reftable-unittest.sh along with t/helper/test-reftable.c unit test Git’s reftable framework. Migrate the test over to the new unit testing framework to simplify debugging and reduce test run-time. -> Remarks: An infant version as of now, I tinkered with this after seeing the project list on 'Git SoC 2024 Ideas' page to get an idea of the kind of work that will be involved in this project. -----Related Work----- Prior works about the idea have been performed by other community members and previous interns which form a good guiding path for my own approach. Some previous example work: i) Port helper/test-ctype.c to unit-tests/t-ctype.c -> link: https://lore.kernel.org/git/20240112102743.1440-1-ach.lumap@gmail.com/ ii) Port test-sha256.c and test-sha1.c to unit-tests/t-hash.c -> link: https://lore.kernel.org/git/20240229054004.3807-2-ach.lumap@gmail.com/ iii) Port helper/test-date.c to unit-tests/t-date.c -> link: https://lore.kernel.org/git/20240205162506.1835-2-ach.lumap@gmail.com/ iv) Port test-strcmp-offset.c to unit-tests/t-strcmp-offset.c -> link: https://lore.kernel.org/git/20240310144819.4379-1-ach.lumap@gmail.com/ v) Integrate a simple strbuf unit test with Git's Makefiles -> link: https://lore.kernel.org/git/20230517-unit-tests-v2-v2-4-21b5b60f4b32@google.com/ vi) t0080: turn t-basic unit test into a helper -> link: https://lore.kernel.org/git/a9f67ed703c8314f0f0507ffb83b503717b846b3.1705443632.git.steadmon@google.com/ In GSoC ------- -----Plan----- -> Reftable tests: The reftable tests are different from other tests in the test directory because they perform unit testing with the help of a custom test framework rather than the usual ‘helper file + shell script’ combination. Reftable tests do have a helper file and a shell script invoking the helper file, but rather than performing the tests, this combination is used to invoke tests defined in the reftable directory. The reftable directory consists of nine tests: • basics test • record test • block test • tree test • pq test • stack test • merged test • refname test • read-write test Each of these tests is written in C using a custom reftable testing framework defined by reftable/test_framework (also written in C). The framework has four major features utilized in performing the tests: • EXPECT_ERR(c): A function-like macro that takes as input an integer ‘c’ (generally the return value of a function call), compares it against 0 and spits an error message if equality doesn’t hold. The error message itself contains information about the file where this macro was used, the line in this file where the macro was called and the error code ‘c’ causing the error. • EXPECT_STREQ(a, b): A function-like macro that takes as input two strings ‘a’ and ‘b’, compares them for equality via strcmp() and throws an error if equality doesn’t hold. The error message thrown contains information regarding the file where this macro was invoked, the line in this file where the macro was called and the mismatched strings ‘a’ and ‘b’. • EXPECT(c): A function-like macro that takes as input an integer ‘c’ (generally the result of a Boolean expression like a == b) and throws an error message if c == 0. The error message is similar to EXPECT_ERR(c). • RUN_TEST(f): A function-like macro that takes as input the name of a function ‘f’ (a test function that exercises a part of reftable’s code), prints to stdout the message ‘running f’ and then calls the function with f(). Other than these, the framework consists of two additional functions, set_test_hash() and strbuf_add_void() which are used exclusively in the stack tests and refname tests respectively. Since the reftable test framework is written in C like the unit testing framework, we can create a direct translation of the features mentioned above using the existing tools in the unit testing framework with the following plan: • EXPECT_ERR(c): Can be replaced by check(!c) or check_int(c, “==”, 0). • EXPECT_STREQ(a, b): Can be replaced by check_str(a, b). • EXPECT(c): Can be replaced by check_int(), similar to EXPECT_ERR. E.g. expect(a >= b) --> check_int(a, “>=”, b) • RUN_TEST(f): Can be replaced by TEST(f(), “message explaining the test”). The information contained in the diagnostic messages of these macros is replicated in the unit testing framework by default. Any additional information can be displayed using the test_msg() functionality in the framework. The additional functions set_test_hash() and strbuf_add_void() may be moved to the source files for stack and refname respectively. The plan itself is basic and might need some modifications, but using the above plan, I have already created a working albeit primitive copy for two out of the nine tests (basics test and tree test) as can be seen here: (link: https://github.com/gitgitgadget/git/pull/1698) -> Other tests: As is already mentioned, the rest of the tests in the test directory use the combination of a helper file (written in C) and a shell script that invokes the said helper file. I will use my work from the patch ‘tests: Move t0009-prio-queue.sh to the unit testing framework’ (link: https://public-inbox.org/git/pull.1642.v4.git.1705865326185.gitgitgadget@gmail.com/) to explain the steps involved in the porting of such tests: • Search for a suitable test to port: As Christian Couder mentioned in this mail (link: https://public-inbox.org/git/CAP8UFD22EpdBU8HJqFM+=75EBABOTf5a0q+KsbzLK+XTEGSkPw@mail.gmail.com/), there exists a subset of t/helper worth porting and we need some sort of classification to discern these. All helper files contain a cmd__foo() function which acts as the entry point for that helper tool. For example, the helper/test-prio-queue.c file contained cmd__prio_queue() which served as the entry point for the prio-queue tool. This function is then mapped to a different name by helper/test-tool.c which is used by the ‘*.sh’ files to perform tests. Continuing the prior example, cmd__prio_queue() had been mapped to “prio-queue” by test-tool.c and t0009-prio-queue.sh invoked it like “prio-queue 1 2 get 3 dump”. To classify what among t/helper should be targeted first in this project we can use something like ‘git grep foo’ (where foo is the alias for cmd__foo()) to look at the instances where the helper tool is invoked. The ones appearing lesser in different test scripts are the ones most likely to be used solely for unit testing and should probably be targeted first. Utilising this strategy, I discovered that the ‘prio-queue’ tool was only used in t0009-prio-queue.sh and hence, was a good candidate for the unit testing framework. Note that this strategy is not full-proof and further investigation is absolutely required on a per-test basis, it is only meant to give an initial idea of what’s worth investigating. • Create a new C test file in t/unit-tests: After finding a test appropriate for the migration efforts, we create a new ‘*.c’ file in t/unit-tests. The test file must be named appropriately to reflect the nature of the tests it is supposed to perform. Most of the times, replacing ‘tXXXX’ with ‘t-‘ and ‘*.sh’ with ‘.c’ in the name of the test script suffices. E.g. t/t0009-prio-queue.sh turns to t/unit-tests/t-prio-queue.c. The new C file must #include “test-lib.h” (to be able to use the unit testing framework) and other necessary headers files. • Move the code from the helper file: Since the helper files are written in C, this step is mostly a ‘copy-paste then rename’ job. Changes similar to the following also need to be made in the Makefile: - TEST_BUILTINS_OBJS += test-prio-queue.o + UNIT_TEST_PROGRAMS += t-prio-queue • Translate the shell script: The trickiest part of the plan, since different test scripts perform various functions, and a direct translation of the scripts to C is not always optimal. Continuing the prior example, t0009-prio-queue.sh used a single pattern for testing, write expected output to a temporary file (named ‘expect’) -> feed input to the ‘prio-queue’ helper tool -> dump its output to another temporary file (named ‘actual’) -> compare the two files (‘actual’ vs ‘expect’). In the first iteration of my prio-queue patch, I worked out a straightforward translation of this pattern in C. I stored the input in a string buffer, passed that buffer to the test function and stored its output in another buffer, and then called memcmp() on these two buffers. While this did prove to be a working copy, this work was found to be inadequate on the mailing list. Through the next several iterations, I reworked the logic several times, like comparing the input and output on-the-go rather than using buffers and replacing strings with macro definitions. The test scripts similarly perform other functions like checking for prerequisites, creating commits, initializing repositories, changing or creating directories and so forth, and custom logic is required in most of the cases of translating these, as seen above. • Run the resulting test, correct any errors: It is rare for any migrated test to work correctly on the first run. This step involves resolving any compile/runtime errors arising from the test and making sure that at the very minimum, all the test-cases of the original test are replicated in the new work. Improvements upon the original can also be made, for example, the original t0009-prio-queue.sh did not exercise the reverse stack functionality of prio-queue, which I improved upon in unit-tests/t-prio-queue. • Send the resulting patch to the mailing list, respond to the feedback: This step involves writing a meaningful commit message explaining each patch in the series. From my experience contributing to the Git project, I find it to be rare for any patch series to be accepted in the very first iteration. Feedback from the community is vital for the refinement of any patch and must be addressed by performing the suggested changes and sending the work back to the mailing list. This must be repeated until the work is merged into ‘seen’, ‘next’ and further down, ‘master’. Timeline -------- I’m confident that I can start the project as early as the start of the Community Bonding Period (May 1 - 26). This is because I have read the related documentation and already have some experience with the idea. I believe I’ll be ready to get up to speed to work on the project by then. The exact time arrangement for each test is variable and hard to determine, but judging from the fact that it took me 3-4 days to complete the first iteration of the t-prio-queue work, here is a proposed migration schedule: • Reftable tests: If my current work from 'tests: Move t0032-reftable-unittest.sh to the unit testing framework' is found satisfactory, there are 7 tests left that need to be ported to the unit testing framework. Assuming it takes 2-3 days to port one test, I should be done with the first patch series for the reftable tests in about 2-3 weeks. From there, it’s a matter of responding to the feedback from the mailing list, which can deceptively take longer than expected. For instance, I had to continue polishing my t-prio-queue patch for about two weeks after the feedback on the first iteration. • Other tests: The time required to port these tests is highly variable and depends mostly upon the work required in translating the shell script. As mentioned previously, it took me 3-4 days to complete the first iteration of the test-prio-queue migration patch, and that was a short test with only about 50 or so lines of shell scripting and all the test cases following a single pattern. Considering all this, I believe it should be possible, on average, to migrate a new test in 5-8 days. Hence, it should be possible for me to migrate >=7 tests along with the reftable tests throughout the duration of this project. Availability ------------ My summertime is reserved for GSoC, so I expect that I can work on a new test 5 days per week, 6-8 hours per day, that is 30-40 hours a week. On the weekends, I would like to solely work on the feedback from the mailing list and advance my [WIP] patches. Certainly, something unexpected may arise, but I will try my best to adhere to this time commitment and be always available through the community’s mailing list. Post GSoC & Closing Remarks --------------------------- When I first started contributing to the Git project in October of 2023, I had no idea about programmes like GSoC. I was advised by a senior of mine to contribute to open-source projects and hence, my aim of contribution was to apply what I had learnt in college to solve real-world problems and learn from more experienced peers. However, most of what I have contributed to Git has been trivial owing to my lack of skills and inexperience with the project. Seeing how I need to do an internship in summer, with GSoC, I hope to be able to dedicate this internship time and effort to a cool project like Git while simultaneously learning skills to be able to make more useful contributions in the future. It’s two birds with one stone. I would also like to keep working on this project to see it to completion post-GSoC and help mentor other newcomers get started with the Git project. Thanks & Regards, Chandra ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC][GSoC] Proposal: Move reftable and other tests to the unit testing framework 2024-03-19 17:11 [RFC][GSoC] Proposal: Move reftable and other tests to the unit testing framework Chandra Pratap @ 2024-03-20 13:17 ` Christian Couder 2024-03-20 19:35 ` Chandra Pratap 2024-03-21 12:27 ` Patrick Steinhardt 0 siblings, 2 replies; 8+ messages in thread From: Christian Couder @ 2024-03-20 13:17 UTC (permalink / raw) To: Chandra Pratap; +Cc: git, karthik.188, ps, kaartic.sivaraam On Tue, Mar 19, 2024 at 6:11 PM Chandra Pratap <chandrapratap3519@gmail.com> wrote: > > This is my project proposal for the Google Summer of Code 2024 program. > The document version of this proposal can be accessed through this link: > https://shorturl.at/ijrTU > > ---------<8----------<8----------<8----------<8----------<8----------<8 > > Personal Info > ------------- > > Full name: Chandra Pratap > Preferred name: Chand > > E-mail: chandrapratap3519@gmail.com > Phone: (+91)77618-24030 > > Education: SV National Institute of Technology, Surat, India > Year: Sophomore (2nd) > Major: Mathematics > > GitHub: https://github.com/Chand-ra > > Before GSoC > ----------- > > -----Synopsis----- > > A new unit testing framework was introduced to the Git mailing list last > year with the aim of simplifying testing and improving maintainability. > The idea was accepted and merged into master on 09/11/2023. This project > aims to extend that work by moving more tests from the current setup to > the new unit testing framework. > > The SoC 2024 Ideas page (link: https://git.github.io/SoC-2019-Ideas/) > mentions reftable unit tests migration as a separate project from the > general unit test migration project, however, I propose migrating other > tests alongside the reftable unit tests as a part of this proposal. It means that if we select your proposal, we cannot select someone else to work on either the "Move existing tests to a unit testing framework" project or the "Convert reftable unit tests to use the unit testing framework" project. I am not sure but I think that, after migrating all the reftable unit tests, I would prefer you working on other reftable related tasks rather than on more unit test migrations. > The reasoning behind this is explained further down. > The difficulty for the project should be medium and it should take > somewhat between 175 to 350 hours. > > -----Contributions----- > > • apply.c: make git apply respect core.fileMode settings > -> Status: merged into master > -> link: https://public-inbox.org/git/20231226233218.472054-1-gitster@pobox.com/ A link to (or the hash of) the commit that merged your patch into the master branch would be nice. > -> Description: When applying a patch that adds an executable file, git > apply ignores the core.fileMode setting (core.fileMode specifies whether > the executable bit on files in the working tree are to be honored or not) > resulting in false warnings. Fix this by inferring the correct file mode > from the existing index entry when core.filemode is set to false. Add a > test case that verifies the change and prevents future regression. > > -> Remarks: This was the first patch I worked on as a contributor to Git. > Served me as an essential intro lesson to the community’s working flow and > general practices. > > • tests: Move t0009-prio-queue.sh to the unit testing framework > -> Status: merged into master > -> link: https://public-inbox.org/git/pull.1642.v4.git.1705865326185.gitgitgadget@gmail.com/ I don't think it's a good idea to submit patches related to a project we propose before the GSoC actually starts. We might have been able to detect this earlier if you added [GSoC] to the patch titles. Also a link to (or the hash of) the commit that merged your patch into the master branch would be nice. > -> Description: t/t0009-prio-queue.sh along with t/helper/test-prio-queue.c > unit test Git's implementation of a priority queue. Migrate the test > over to the new unit testing framework to simplify debugging and reduce > test run-time. > > -> Remarks: Perhaps the most relevant patch of all the ones mentioned > here, this patch helped me understand the expectations and workflow for > the work to be performed in this project. > > • write-or-die: make GIT_FLUSH a Boolean environment variable > -> Status: merged into master > -> link: https://public-inbox.org/git/pull.1628.v3.git.1704363617842.gitgitgadget@gmail.com/ Also a link to (or the hash of) the commit that merged your patch into the master branch would be nice. > -> Description: Among Git's environment variable, the ones marked as s/variable/variables/ > "Boolean" accept values like Boolean configuration variables, i.e. > values like 'yes', 'on', 'true' and positive numbers are taken as "on" > and values like 'no', 'off','false' are taken as "off". Make GIT_FLUSH > accept more values besides '0' and '1' by turning it into a Boolean > environment variable & update the related documentation. > > • sideband.c: remoye redundant NEEDSWORK tag s/remoye/remove/ > -> Status: merged into master > -> link: https://public-inbox.org/git/pull.1625.v4.git.1703750460527.gitgitgadget@gmail.com/ Also a link to (or the hash of) the commit that merged your patch into the master branch would be nice. > -> Description: Replace a misleading NEEDSWORK tag in sideband.c that > reads ‘replace int with size_t’ with another comment explaining why it > is fine to use ‘int’ and the replacement isn’t necessary. > > • make tig callable from PowerShell/Command Prompt > -> Status: merged into main > -> link: https://github.com/git-for-windows/MINGW-packages/pull/104 > > -> Description: Tig is a text mode interface for Git that ships with the > standard Git for Windows package but isn’t callable from PowerShell/ > Command Prompt by default. Fix this by updating the relevant Makefiles > and resource scripts. > > • fix broken link on Git for Windows’ GitHub wiki > -> Status: merged > -> link: https://github.com/git-for-windows/git/wiki/Home/_history > > -> Remarks: A simple fix for a broken link that I stumbled upon while > browsing Git for Windows’ wiki looking for some help with the patch > mentioned just before this one. > > • t4129: prevent loss of exit codes due to the use of pipes > -> Status: merged into master > -> link: https://lore.kernel.org/git/20220311132141.1817-1-shaoxuan.yuan02@gmail.com/ The link shows a patch from 2022 by Shaoxuan Yuan. Are you sure this is the right link? > -> Description: Piping the output of git commands like git-ls-files to > another command (grep in this case) in t4129 hides the exit code returned > by these commands. Prevent this by storing the output of git-ls-files to > a temporary file and then "grep-ping" from that file. Replace grep with > test_grep as the latter is more verbose when it fails. > > • t9146: replace test -d/-f with appropriate test_path_is_* function > -> Status: merged into master > -> link: https://public-inbox.org/git/pull.1661.v3.git.1707933048210.gitgitgadget@gmail.com/ Also a link to (or the hash of) the commit that merged your patch into the master branch would be nice. > -> Description: The helper functions test_path_is_* provide better debugging > information than test -d/-e/-f. > Replace tests -d/-e/-f with their respective ‘test_path_is_foo’ calls. > > • regex: update relevant files in compat/regex > -> Status: WIP > -> link: https://github.com/gitgitgadget/git/pull/1641 This is a GitGitGadget patch. Please mention that. Same thing for some other contributions above that are not part of Git. > -> Description: The RegEx code in compat/regex has been vendored from > gawk and was last updated in 2010. This may lead to performance issues > like high CPU usage. Fix this by synchronizing the relevant files in > compat/regex with the latest version from GNULib and then replaying any > changes we made to gawk’s version on top of the new files. > > -> Remarks: When I started working on this patch, I thought it was an > easy fix but the work turned out to be more involved than I anticipated. > I had to seek help from the other community members, and we have made > some good progress, but there is still a lot of cleaning and changes that > need to be done. I haven’t found enough time to commit to this again, > but it’s surely something that I want to get done soon. > > • tests: Move t0032-reftable-unittest.sh to the unit testing framework > -> Status: WIP > -> link: https://github.com/gitgitgadget/git/pull/1698 > > -> Description: t/t0032-reftable-unittest.sh along with t/helper/test-reftable.c > unit test Git’s reftable framework. Migrate the test over to the new > unit testing framework to simplify debugging and reduce test run-time. > > -> Remarks: An infant version as of now, I tinkered with this after > seeing the project list on 'Git SoC 2024 Ideas' page to get an idea of > the kind of work that will be involved in this project. It's Ok to tinker and to mention this. I hope it helped you write this proposal. > -----Related Work----- > > Prior works about the idea have been performed by other community members > and previous interns which form a good guiding path for my own approach. > Some previous example work: > > i) Port helper/test-ctype.c to unit-tests/t-ctype.c > -> link: https://lore.kernel.org/git/20240112102743.1440-1-ach.lumap@gmail.com/ > > ii) Port test-sha256.c and test-sha1.c to unit-tests/t-hash.c > -> link: https://lore.kernel.org/git/20240229054004.3807-2-ach.lumap@gmail.com/ > > iii) Port helper/test-date.c to unit-tests/t-date.c > -> link: https://lore.kernel.org/git/20240205162506.1835-2-ach.lumap@gmail.com/ > > iv) Port test-strcmp-offset.c to unit-tests/t-strcmp-offset.c > -> link: https://lore.kernel.org/git/20240310144819.4379-1-ach.lumap@gmail.com/ > > v) Integrate a simple strbuf unit test with Git's Makefiles > -> link: https://lore.kernel.org/git/20230517-unit-tests-v2-v2-4-21b5b60f4b32@google.com/ > > vi) t0080: turn t-basic unit test into a helper > -> link: https://lore.kernel.org/git/a9f67ed703c8314f0f0507ffb83b503717b846b3.1705443632.git.steadmon@google.com/ Thanks for mentioning these. > In GSoC > ------- > > -----Plan----- > > -> Reftable tests: > > The reftable tests are different from other tests in the test directory > because they perform unit testing with the help of a custom test framework > rather than the usual ‘helper file + shell script’ combination. > Reftable tests do have a helper file and a shell script invoking the > helper file, but rather than performing the tests, this combination is > used to invoke tests defined in the reftable directory. > The reftable directory consists of nine tests: > > • basics test > • record test > • block test > • tree test > • pq test > • stack test > • merged test > • refname test > • read-write test > > Each of these tests is written in C using a custom reftable testing > framework defined by reftable/test_framework (also written in C). The > framework has four major features utilized in performing the tests: > > • EXPECT_ERR(c): A function-like macro that takes as input an integer > ‘c’ (generally the return value of a function call), compares it against > 0 and spits an error message if equality doesn’t hold. The error message > itself contains information about the file where this macro was used, > the line in this file where the macro was called and the error code ‘c’ > causing the error. > > • EXPECT_STREQ(a, b): A function-like macro that takes as input two > strings ‘a’ and ‘b’, compares them for equality via strcmp() and throws an > error if equality doesn’t hold. The error message thrown contains information > regarding the file where this macro was invoked, the line in this > file where the macro was called and the mismatched strings ‘a’ and ‘b’. > > • EXPECT(c): A function-like macro that takes as input an integer ‘c’ > (generally the result of a Boolean expression like a == b) and throws an > error message if c == 0. The error message is similar to EXPECT_ERR(c). > > • RUN_TEST(f): A function-like macro that takes as input the name of a > function ‘f’ (a test function that exercises a part of reftable’s code), > prints to stdout the message ‘running f’ and then calls the function with f(). > > Other than these, the framework consists of two additional functions, > set_test_hash() and strbuf_add_void() which are used exclusively in the > stack tests and refname tests respectively. > > Since the reftable test framework is written in C like the unit testing > framework, we can create a direct translation of the features mentioned > above using the existing tools in the unit testing framework with the > following plan: > > • EXPECT_ERR(c): Can be replaced by check(!c) or check_int(c, “==”, 0). > > • EXPECT_STREQ(a, b): Can be replaced by check_str(a, b). > > • EXPECT(c): Can be replaced by check_int(), similar to EXPECT_ERR. > E.g. expect(a >= b) --> check_int(a, “>=”, b) > > • RUN_TEST(f): Can be replaced by TEST(f(), “message explaining the test”). > > The information contained in the diagnostic messages of these macros is > replicated in the unit testing framework by default. Any additional > information can be displayed using the test_msg() functionality in the > framework. The additional functions set_test_hash() and strbuf_add_void() > may be moved to the source files for stack and refname respectively. You mean "reftable/stack.c" and "reftable/refname.c"? > The plan itself is basic and might need some modifications, but using > the above plan, I have already created a working albeit primitive copy for > two out of the nine tests (basics test and tree test) as can be seen here: > (link: https://github.com/gitgitgadget/git/pull/1698) > > -> Other tests: > > As is already mentioned, the rest of the tests in the test directory use the > combination of a helper file (written in C) and a shell script that invokes > the said helper file. I will use my work from the patch > ‘tests: Move t0009-prio-queue.sh to the unit testing framework’ > (link: https://public-inbox.org/git/pull.1642.v4.git.1705865326185.gitgitgadget@gmail.com/) > to explain the steps involved in the porting of such tests: > > • Search for a suitable test to port: As Christian Couder mentioned in this > mail (link: https://public-inbox.org/git/CAP8UFD22EpdBU8HJqFM+=75EBABOTf5a0q+KsbzLK+XTEGSkPw@mail.gmail.com/), > there exists a subset of t/helper worth porting and we need some sort of > classification to discern these. > > All helper files contain a cmd__foo() function which acts as the entry > point for that helper tool. For example, the helper/test-prio-queue.c > file contained cmd__prio_queue() which served as the entry point for the > prio-queue tool. This function is then mapped to a different name by > helper/test-tool.c which is used by the ‘*.sh’ files to perform tests. > Continuing the prior example, cmd__prio_queue() had been mapped to > “prio-queue” by test-tool.c and t0009-prio-queue.sh invoked it like > “prio-queue 1 2 get 3 dump”. > > To classify what among t/helper should be targeted first in this project > we can use something like ‘git grep foo’ (where foo is the alias for cmd__foo()) > to look at the instances where the helper tool is invoked. The ones appearing > lesser in different test scripts are the ones most likely to be used solely > for unit testing and should probably be targeted first. > Utilising this strategy, I discovered that the ‘prio-queue’ tool was only > used in t0009-prio-queue.sh and hence, was a good candidate for the unit > testing framework. > > Note that this strategy is not full-proof and further investigation is > absolutely required on a per-test basis, it is only meant to give an > initial idea of what’s worth investigating. > > • Create a new C test file in t/unit-tests: After finding a test appropriate > for the migration efforts, we create a new ‘*.c’ file in t/unit-tests. > The test file must be named appropriately to reflect the nature of the > tests it is supposed to perform. Most of the times, replacing ‘tXXXX’ > with ‘t-‘ and ‘*.sh’ with ‘.c’ in the name of the test script suffices. > E.g. t/t0009-prio-queue.sh turns to t/unit-tests/t-prio-queue.c. The > new C file must #include “test-lib.h” (to be able to use the unit > testing framework) and other necessary headers files. > > • Move the code from the helper file: Since the helper files are written > in C, this step is mostly a ‘copy-paste then rename’ job. Changes similar > to the following also need to be made in the Makefile: > - TEST_BUILTINS_OBJS += test-prio-queue.o > + UNIT_TEST_PROGRAMS += t-prio-queue > > • Translate the shell script: The trickiest part of the plan, since > different test scripts perform various functions, and a direct translation > of the scripts to C is not always optimal. Continuing the prior example, > t0009-prio-queue.sh used a single pattern for testing, write expected > output to a temporary file (named ‘expect’) -> feed input to the ‘prio-queue’ > helper tool -> dump its output to another temporary file (named ‘actual’) > -> compare the two files (‘actual’ vs ‘expect’). > > In the first iteration of my prio-queue patch, I worked out a > straightforward translation of this pattern in C. I stored the input in > a string buffer, passed that buffer to the test function and stored its > output in another buffer, and then called memcmp() on these two buffers. > While this did prove to be a working copy, this work was found to be inadequate > on the mailing list. Through the next several iterations, I reworked the > logic several times, like comparing the input and output on-the-go rather > than using buffers and replacing strings with macro definitions. > > The test scripts similarly perform other functions like checking for > prerequisites, creating commits, initializing repositories, changing or > creating directories and so forth, and custom logic is required in most > of the cases of translating these, as seen above. > > • Run the resulting test, correct any errors: It is rare for any migrated > test to work correctly on the first run. This step involves resolving any > compile/runtime errors arising from the test and making sure that at the > very minimum, all the test-cases of the original test are replicated in the > new work. Improvements upon the original can also be made, for example, the > original t0009-prio-queue.sh did not exercise the reverse stack > functionality of prio-queue, which I improved upon in unit-tests/t-prio-queue. > > • Send the resulting patch to the mailing list, respond to the feedback: > This step involves writing a meaningful commit message explaining each patch > in the series. From my experience contributing to the Git project, I find it > to be rare for any patch series to be accepted in the very first iteration. > Feedback from the community is vital for the refinement of any patch and > must be addressed by performing the suggested changes and sending the work > back to the mailing list. This must be repeated until the work is merged > into ‘seen’, ‘next’ and further down, ‘master’. > > > Timeline > -------- > > I’m confident that I can start the project as early as the start of the > Community Bonding Period (May 1 - 26). This is because I have read > the related documentation and already have some experience with the idea. > I believe I’ll be ready to get up to speed to work on the project by then. > The exact time arrangement for each test is variable and hard to determine, > but judging from the fact that it took me 3-4 days to complete the first > iteration of the t-prio-queue work, here is a proposed migration schedule: > > • Reftable tests: > > If my current work from 'tests: Move t0032-reftable-unittest.sh to the unit > testing framework' is found satisfactory, there are 7 tests left that need > to be ported to the unit testing framework. Assuming it takes 2-3 days to > port one test, I should be done with the first patch series for the reftable > tests in about 2-3 weeks. From there, it’s a matter of responding to the > feedback from the mailing list, which can deceptively take longer than expected. > For instance, I had to continue polishing my t-prio-queue patch for about > two weeks after the feedback on the first iteration. > > • Other tests: > > The time required to port these tests is highly variable and depends mostly > upon the work required in translating the shell script. As mentioned > previously, it took me 3-4 days to complete the first iteration of the > test-prio-queue migration patch, and that was a short test with only about > 50 or so lines of shell scripting and all the test cases following a single > pattern. Considering all this, I believe it should be possible, on average, > to migrate a new test in 5-8 days. > > Hence, it should be possible for me to migrate >=7 tests along with the > reftable tests throughout the duration of this project. > > Availability > ------------ > > My summertime is reserved for GSoC, so I expect that I can work on a new > test 5 days per week, 6-8 hours per day, that is 30-40 hours a week. > On the weekends, I would like to solely work on the feedback from > the mailing list and advance my [WIP] patches. Certainly, something > unexpected may arise, but I will try my best to adhere to this time > commitment and be always available through the community’s mailing list. > > Post GSoC & Closing Remarks > --------------------------- > > When I first started contributing to the Git project in October of 2023, > I had no idea about programmes like GSoC. I was advised by a senior of > mine to contribute to open-source projects and hence, my aim of contribution > was to apply what I had learnt in college to solve real-world problems > and learn from more experienced peers. However, most of what I have > contributed to Git has been trivial owing to my lack of skills and > inexperience with the project. > > Seeing how I need to do an internship in summer, with GSoC, I hope to be > able to dedicate this internship time and effort to a cool project like > Git while simultaneously learning skills to be able to make more useful > contributions in the future. It’s two birds with one stone. I would also > like to keep working on this project to see it to completion post-GSoC > and help mentor other newcomers get started with the Git project. > -- After GSoC -- > --------------------- > > After GSoC I intend to be a part of the community and keep > contributing to the git’s codebase. I eagerly want to learn a lot from > the git community and enhance my skills. I also see myself making > important contributions to git in the future. > > When I first joined the community 2 months ago, the ancient way of > collaborating through a mailing list by sending diff patches was > really puzzling (GitHub was the only means that I knew about for > open-source collaboration). After sending a patch I got a little > comfortable with it and started loving it. > > -- Closing remarks -- > ---------------------------- > > I am really enthusiastic to contribute to git. I really want to learn > a lot from my mentors and the whole git community. Everyone > contributes using git, why not contribute to git :). > > In the end I really want to thank the whole community, especially > Christian and Junio, for their valuable time in reviewing my patches > and guiding me with the problems I faced. Thanks for your proposal. Please make sure you submit it soon as a pdf file to the GSoC website. It can then be updated by uploading a new pdf until the April 2 1800 UTC deadline. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC][GSoC] Proposal: Move reftable and other tests to the unit testing framework 2024-03-20 13:17 ` Christian Couder @ 2024-03-20 19:35 ` Chandra Pratap 2024-03-21 12:27 ` Patrick Steinhardt 1 sibling, 0 replies; 8+ messages in thread From: Chandra Pratap @ 2024-03-20 19:35 UTC (permalink / raw) To: Christian Couder; +Cc: git, karthik.188, ps, kaartic.sivaraam On 20/03/24 18:47, Christian Couder wrote: > On Tue, Mar 19, 2024 at 6:11 PM Chandra Pratap > <chandrapratap3519@gmail.com> wrote: >> >> This is my project proposal for the Google Summer of Code 2024 program. >> The document version of this proposal can be accessed through this link: >> https://shorturl.at/ijrTU >> >> ---------<8----------<8----------<8----------<8----------<8----------<8 >> >> Personal Info >> ------------- >> >> Full name: Chandra Pratap >> Preferred name: Chand >> >> E-mail: chandrapratap3519@gmail.com >> Phone: (+91)77618-24030 >> >> Education: SV National Institute of Technology, Surat, India >> Year: Sophomore (2nd) >> Major: Mathematics >> >> GitHub: https://github.com/Chand-ra >> >> Before GSoC >> ----------- >> >> -----Synopsis----- >> >> A new unit testing framework was introduced to the Git mailing list last >> year with the aim of simplifying testing and improving maintainability. >> The idea was accepted and merged into master on 09/11/2023. This project >> aims to extend that work by moving more tests from the current setup to >> the new unit testing framework. >> >> The SoC 2024 Ideas page (link: https://git.github.io/SoC-2019-Ideas/) >> mentions reftable unit tests migration as a separate project from the >> general unit test migration project, however, I propose migrating other >> tests alongside the reftable unit tests as a part of this proposal. > > It means that if we select your proposal, we cannot select someone > else to work on either the "Move existing tests to a unit testing > framework" project or the "Convert reftable unit tests to use the unit > testing framework" project. > > I am not sure but I think that, after migrating all the reftable unit > tests, I would prefer you working on other reftable related tasks > rather than on more unit test migrations. > So, should I choose one between the reftable tests migration project and the general unit tests migration project? Say I decide to go with the reftable tests migration project, what kind of "other reftable related tasks" can I expect to work on? Would it be similar to the day-to-day reftable tasks perfomed by contributors or one of the reftable projects mentioned on 'SoC 2024 Ideas' page? Would I need to explain these tasks in my GSoC proposal? >> The reasoning behind this is explained further down. >> The difficulty for the project should be medium and it should take >> somewhat between 175 to 350 hours. >> >> -----Contributions----- >> >> • apply.c: make git apply respect core.fileMode settings >> -> Status: merged into master >> -> link: https://public-inbox.org/git/20231226233218.472054-1-gitster@pobox.com/ > > A link to (or the hash of) the commit that merged your patch into the > master branch would be nice. > Any pointers on how I go about finding this commit/hash? >> -> Description: When applying a patch that adds an executable file, git >> apply ignores the core.fileMode setting (core.fileMode specifies whether >> the executable bit on files in the working tree are to be honored or not) >> resulting in false warnings. Fix this by inferring the correct file mode >> from the existing index entry when core.filemode is set to false. Add a >> test case that verifies the change and prevents future regression. >> >> -> Remarks: This was the first patch I worked on as a contributor to Git. >> Served me as an essential intro lesson to the community’s working flow and >> general practices. >> >> • tests: Move t0009-prio-queue.sh to the unit testing framework >> -> Status: merged into master >> -> link: https://public-inbox.org/git/pull.1642.v4.git.1705865326185.gitgitgadget@gmail.com/ > > I don't think it's a good idea to submit patches related to a project > we propose before the GSoC actually starts. We might have been able to > detect this earlier if you added [GSoC] to the patch titles. > As I mentioned in the 'Closing Remarks' section, this work was done well before I decided to take part in this year's GSoC (or even before this topic was proposed as a project for this year's GSoC), so I don't think I could have made it a [GSoC] patch. > Also a link to (or the hash of) the commit that merged your patch into > the master branch would be nice. > Noted. >> -> Description: t/t0009-prio-queue.sh along with t/helper/test-prio-queue.c >> unit test Git's implementation of a priority queue. Migrate the test >> over to the new unit testing framework to simplify debugging and reduce >> test run-time. >> >> -> Remarks: Perhaps the most relevant patch of all the ones mentioned >> here, this patch helped me understand the expectations and workflow for >> the work to be performed in this project. >> >> • write-or-die: make GIT_FLUSH a Boolean environment variable >> -> Status: merged into master >> -> link: https://public-inbox.org/git/pull.1628.v3.git.1704363617842.gitgitgadget@gmail.com/ > > Also a link to (or the hash of) the commit that merged your patch into > the master branch would be nice. > >> -> Description: Among Git's environment variable, the ones marked as > > s/variable/variables/ > >> "Boolean" accept values like Boolean configuration variables, i.e. >> values like 'yes', 'on', 'true' and positive numbers are taken as "on" >> and values like 'no', 'off','false' are taken as "off". Make GIT_FLUSH >> accept more values besides '0' and '1' by turning it into a Boolean >> environment variable & update the related documentation. >> >> • sideband.c: remoye redundant NEEDSWORK tag > > s/remoye/remove/ > >> -> Status: merged into master >> -> link: https://public-inbox.org/git/pull.1625.v4.git.1703750460527.gitgitgadget@gmail.com/ > > Also a link to (or the hash of) the commit that merged your patch into > the master branch would be nice. > >> -> Description: Replace a misleading NEEDSWORK tag in sideband.c that >> reads ‘replace int with size_t’ with another comment explaining why it >> is fine to use ‘int’ and the replacement isn’t necessary. >> >> • make tig callable from PowerShell/Command Prompt >> -> Status: merged into main >> -> link: https://github.com/git-for-windows/MINGW-packages/pull/104 >> >> -> Description: Tig is a text mode interface for Git that ships with the >> standard Git for Windows package but isn’t callable from PowerShell/ >> Command Prompt by default. Fix this by updating the relevant Makefiles >> and resource scripts. >> >> • fix broken link on Git for Windows’ GitHub wiki >> -> Status: merged >> -> link: https://github.com/git-for-windows/git/wiki/Home/_history >> >> -> Remarks: A simple fix for a broken link that I stumbled upon while >> browsing Git for Windows’ wiki looking for some help with the patch >> mentioned just before this one. >> >> • t4129: prevent loss of exit codes due to the use of pipes >> -> Status: merged into master >> -> link: https://lore.kernel.org/git/20220311132141.1817-1-shaoxuan.yuan02@gmail.com/ > > The link shows a patch from 2022 by Shaoxuan Yuan. Are you sure this > is the right link? That seems to be an error. I will make sure to correct it in the next draft. > >> -> Description: Piping the output of git commands like git-ls-files to >> another command (grep in this case) in t4129 hides the exit code returned >> by these commands. Prevent this by storing the output of git-ls-files to >> a temporary file and then "grep-ping" from that file. Replace grep with >> test_grep as the latter is more verbose when it fails. >> >> • t9146: replace test -d/-f with appropriate test_path_is_* function >> -> Status: merged into master >> -> link: https://public-inbox.org/git/pull.1661.v3.git.1707933048210.gitgitgadget@gmail.com/ > > Also a link to (or the hash of) the commit that merged your patch into > the master branch would be nice. > >> -> Description: The helper functions test_path_is_* provide better debugging >> information than test -d/-e/-f. >> Replace tests -d/-e/-f with their respective ‘test_path_is_foo’ calls. >> >> • regex: update relevant files in compat/regex >> -> Status: WIP >> -> link: https://github.com/gitgitgadget/git/pull/1641 > > This is a GitGitGadget patch. Please mention that. Same thing for some > other contributions above that are not part of Git. > Sure thing. >> -> Description: The RegEx code in compat/regex has been vendored from >> gawk and was last updated in 2010. This may lead to performance issues >> like high CPU usage. Fix this by synchronizing the relevant files in >> compat/regex with the latest version from GNULib and then replaying any >> changes we made to gawk’s version on top of the new files. >> >> -> Remarks: When I started working on this patch, I thought it was an >> easy fix but the work turned out to be more involved than I anticipated. >> I had to seek help from the other community members, and we have made >> some good progress, but there is still a lot of cleaning and changes that >> need to be done. I haven’t found enough time to commit to this again, >> but it’s surely something that I want to get done soon. >> >> • tests: Move t0032-reftable-unittest.sh to the unit testing framework >> -> Status: WIP >> -> link: https://github.com/gitgitgadget/git/pull/1698 >> >> -> Description: t/t0032-reftable-unittest.sh along with t/helper/test-reftable.c >> unit test Git’s reftable framework. Migrate the test over to the new >> unit testing framework to simplify debugging and reduce test run-time. >> >> -> Remarks: An infant version as of now, I tinkered with this after >> seeing the project list on 'Git SoC 2024 Ideas' page to get an idea of >> the kind of work that will be involved in this project. > > It's Ok to tinker and to mention this. I hope it helped you write this proposal. > >> -----Related Work----- >> >> Prior works about the idea have been performed by other community members >> and previous interns which form a good guiding path for my own approach. >> Some previous example work: >> >> i) Port helper/test-ctype.c to unit-tests/t-ctype.c >> -> link: https://lore.kernel.org/git/20240112102743.1440-1-ach.lumap@gmail.com/ >> >> ii) Port test-sha256.c and test-sha1.c to unit-tests/t-hash.c >> -> link: https://lore.kernel.org/git/20240229054004.3807-2-ach.lumap@gmail.com/ >> >> iii) Port helper/test-date.c to unit-tests/t-date.c >> -> link: https://lore.kernel.org/git/20240205162506.1835-2-ach.lumap@gmail.com/ >> >> iv) Port test-strcmp-offset.c to unit-tests/t-strcmp-offset.c >> -> link: https://lore.kernel.org/git/20240310144819.4379-1-ach.lumap@gmail.com/ >> >> v) Integrate a simple strbuf unit test with Git's Makefiles >> -> link: https://lore.kernel.org/git/20230517-unit-tests-v2-v2-4-21b5b60f4b32@google.com/ >> >> vi) t0080: turn t-basic unit test into a helper >> -> link: https://lore.kernel.org/git/a9f67ed703c8314f0f0507ffb83b503717b846b3.1705443632.git.steadmon@google.com/ > > Thanks for mentioning these. > :) >> In GSoC >> ------- >> >> -----Plan----- >> >> -> Reftable tests: >> >> The reftable tests are different from other tests in the test directory >> because they perform unit testing with the help of a custom test framework >> rather than the usual ‘helper file + shell script’ combination. >> Reftable tests do have a helper file and a shell script invoking the >> helper file, but rather than performing the tests, this combination is >> used to invoke tests defined in the reftable directory. >> The reftable directory consists of nine tests: >> >> • basics test >> • record test >> • block test >> • tree test >> • pq test >> • stack test >> • merged test >> • refname test >> • read-write test >> >> Each of these tests is written in C using a custom reftable testing >> framework defined by reftable/test_framework (also written in C). The >> framework has four major features utilized in performing the tests: >> >> • EXPECT_ERR(c): A function-like macro that takes as input an integer >> ‘c’ (generally the return value of a function call), compares it against >> 0 and spits an error message if equality doesn’t hold. The error message >> itself contains information about the file where this macro was used, >> the line in this file where the macro was called and the error code ‘c’ >> causing the error. >> >> • EXPECT_STREQ(a, b): A function-like macro that takes as input two >> strings ‘a’ and ‘b’, compares them for equality via strcmp() and throws an >> error if equality doesn’t hold. The error message thrown contains information >> regarding the file where this macro was invoked, the line in this >> file where the macro was called and the mismatched strings ‘a’ and ‘b’. >> >> • EXPECT(c): A function-like macro that takes as input an integer ‘c’ >> (generally the result of a Boolean expression like a == b) and throws an >> error message if c == 0. The error message is similar to EXPECT_ERR(c). >> >> • RUN_TEST(f): A function-like macro that takes as input the name of a >> function ‘f’ (a test function that exercises a part of reftable’s code), >> prints to stdout the message ‘running f’ and then calls the function with f(). >> >> Other than these, the framework consists of two additional functions, >> set_test_hash() and strbuf_add_void() which are used exclusively in the >> stack tests and refname tests respectively. >> >> Since the reftable test framework is written in C like the unit testing >> framework, we can create a direct translation of the features mentioned >> above using the existing tools in the unit testing framework with the >> following plan: >> >> • EXPECT_ERR(c): Can be replaced by check(!c) or check_int(c, “==”, 0). >> >> • EXPECT_STREQ(a, b): Can be replaced by check_str(a, b). >> >> • EXPECT(c): Can be replaced by check_int(), similar to EXPECT_ERR. >> E.g. expect(a >= b) --> check_int(a, “>=”, b) >> >> • RUN_TEST(f): Can be replaced by TEST(f(), “message explaining the test”). >> >> The information contained in the diagnostic messages of these macros is >> replicated in the unit testing framework by default. Any additional >> information can be displayed using the test_msg() functionality in the >> framework. The additional functions set_test_hash() and strbuf_add_void() >> may be moved to the source files for stack and refname respectively. > > You mean "reftable/stack.c" and "reftable/refname.c"? > Yes, I will re-write this to make it clearer. > >> The plan itself is basic and might need some modifications, but using >> the above plan, I have already created a working albeit primitive copy for >> two out of the nine tests (basics test and tree test) as can be seen here: >> (link: https://github.com/gitgitgadget/git/pull/1698) >> >> -> Other tests: >> >> As is already mentioned, the rest of the tests in the test directory use the >> combination of a helper file (written in C) and a shell script that invokes >> the said helper file. I will use my work from the patch >> ‘tests: Move t0009-prio-queue.sh to the unit testing framework’ >> (link: https://public-inbox.org/git/pull.1642.v4.git.1705865326185.gitgitgadget@gmail.com/) >> to explain the steps involved in the porting of such tests: >> >> • Search for a suitable test to port: As Christian Couder mentioned in this >> mail (link: https://public-inbox.org/git/CAP8UFD22EpdBU8HJqFM+=75EBABOTf5a0q+KsbzLK+XTEGSkPw@mail.gmail.com/), >> there exists a subset of t/helper worth porting and we need some sort of >> classification to discern these. >> >> All helper files contain a cmd__foo() function which acts as the entry >> point for that helper tool. For example, the helper/test-prio-queue.c >> file contained cmd__prio_queue() which served as the entry point for the >> prio-queue tool. This function is then mapped to a different name by >> helper/test-tool.c which is used by the ‘*.sh’ files to perform tests. >> Continuing the prior example, cmd__prio_queue() had been mapped to >> “prio-queue” by test-tool.c and t0009-prio-queue.sh invoked it like >> “prio-queue 1 2 get 3 dump”. >> >> To classify what among t/helper should be targeted first in this project >> we can use something like ‘git grep foo’ (where foo is the alias for cmd__foo()) >> to look at the instances where the helper tool is invoked. The ones appearing >> lesser in different test scripts are the ones most likely to be used solely >> for unit testing and should probably be targeted first. >> Utilising this strategy, I discovered that the ‘prio-queue’ tool was only >> used in t0009-prio-queue.sh and hence, was a good candidate for the unit >> testing framework. >> >> Note that this strategy is not full-proof and further investigation is >> absolutely required on a per-test basis, it is only meant to give an >> initial idea of what’s worth investigating. >> >> • Create a new C test file in t/unit-tests: After finding a test appropriate >> for the migration efforts, we create a new ‘*.c’ file in t/unit-tests. >> The test file must be named appropriately to reflect the nature of the >> tests it is supposed to perform. Most of the times, replacing ‘tXXXX’ >> with ‘t-‘ and ‘*.sh’ with ‘.c’ in the name of the test script suffices. >> E.g. t/t0009-prio-queue.sh turns to t/unit-tests/t-prio-queue.c. The >> new C file must #include “test-lib.h” (to be able to use the unit >> testing framework) and other necessary headers files. >> >> • Move the code from the helper file: Since the helper files are written >> in C, this step is mostly a ‘copy-paste then rename’ job. Changes similar >> to the following also need to be made in the Makefile: >> - TEST_BUILTINS_OBJS += test-prio-queue.o >> + UNIT_TEST_PROGRAMS += t-prio-queue >> >> • Translate the shell script: The trickiest part of the plan, since >> different test scripts perform various functions, and a direct translation >> of the scripts to C is not always optimal. Continuing the prior example, >> t0009-prio-queue.sh used a single pattern for testing, write expected >> output to a temporary file (named ‘expect’) -> feed input to the ‘prio-queue’ >> helper tool -> dump its output to another temporary file (named ‘actual’) >> -> compare the two files (‘actual’ vs ‘expect’). >> >> In the first iteration of my prio-queue patch, I worked out a >> straightforward translation of this pattern in C. I stored the input in >> a string buffer, passed that buffer to the test function and stored its >> output in another buffer, and then called memcmp() on these two buffers. >> While this did prove to be a working copy, this work was found to be inadequate >> on the mailing list. Through the next several iterations, I reworked the >> logic several times, like comparing the input and output on-the-go rather >> than using buffers and replacing strings with macro definitions. >> >> The test scripts similarly perform other functions like checking for >> prerequisites, creating commits, initializing repositories, changing or >> creating directories and so forth, and custom logic is required in most >> of the cases of translating these, as seen above. >> >> • Run the resulting test, correct any errors: It is rare for any migrated >> test to work correctly on the first run. This step involves resolving any >> compile/runtime errors arising from the test and making sure that at the >> very minimum, all the test-cases of the original test are replicated in the >> new work. Improvements upon the original can also be made, for example, the >> original t0009-prio-queue.sh did not exercise the reverse stack >> functionality of prio-queue, which I improved upon in unit-tests/t-prio-queue. >> >> • Send the resulting patch to the mailing list, respond to the feedback: >> This step involves writing a meaningful commit message explaining each patch >> in the series. From my experience contributing to the Git project, I find it >> to be rare for any patch series to be accepted in the very first iteration. >> Feedback from the community is vital for the refinement of any patch and >> must be addressed by performing the suggested changes and sending the work >> back to the mailing list. This must be repeated until the work is merged >> into ‘seen’, ‘next’ and further down, ‘master’. >> >> >> Timeline >> -------- >> >> I’m confident that I can start the project as early as the start of the >> Community Bonding Period (May 1 - 26). This is because I have read >> the related documentation and already have some experience with the idea. >> I believe I’ll be ready to get up to speed to work on the project by then. >> The exact time arrangement for each test is variable and hard to determine, >> but judging from the fact that it took me 3-4 days to complete the first >> iteration of the t-prio-queue work, here is a proposed migration schedule: >> >> • Reftable tests: >> >> If my current work from 'tests: Move t0032-reftable-unittest.sh to the unit >> testing framework' is found satisfactory, there are 7 tests left that need >> to be ported to the unit testing framework. Assuming it takes 2-3 days to >> port one test, I should be done with the first patch series for the reftable >> tests in about 2-3 weeks. From there, it’s a matter of responding to the >> feedback from the mailing list, which can deceptively take longer than expected. >> For instance, I had to continue polishing my t-prio-queue patch for about >> two weeks after the feedback on the first iteration. >> >> • Other tests: >> >> The time required to port these tests is highly variable and depends mostly >> upon the work required in translating the shell script. As mentioned >> previously, it took me 3-4 days to complete the first iteration of the >> test-prio-queue migration patch, and that was a short test with only about >> 50 or so lines of shell scripting and all the test cases following a single >> pattern. Considering all this, I believe it should be possible, on average, >> to migrate a new test in 5-8 days. >> >> Hence, it should be possible for me to migrate >=7 tests along with the >> reftable tests throughout the duration of this project. >> >> Availability >> ------------ >> >> My summertime is reserved for GSoC, so I expect that I can work on a new >> test 5 days per week, 6-8 hours per day, that is 30-40 hours a week. >> On the weekends, I would like to solely work on the feedback from >> the mailing list and advance my [WIP] patches. Certainly, something >> unexpected may arise, but I will try my best to adhere to this time >> commitment and be always available through the community’s mailing list. >> >> Post GSoC & Closing Remarks >> --------------------------- >> >> When I first started contributing to the Git project in October of 2023, >> I had no idea about programmes like GSoC. I was advised by a senior of >> mine to contribute to open-source projects and hence, my aim of contribution >> was to apply what I had learnt in college to solve real-world problems >> and learn from more experienced peers. However, most of what I have >> contributed to Git has been trivial owing to my lack of skills and >> inexperience with the project. >> >> Seeing how I need to do an internship in summer, with GSoC, I hope to be >> able to dedicate this internship time and effort to a cool project like >> Git while simultaneously learning skills to be able to make more useful >> contributions in the future. It’s two birds with one stone. I would also >> like to keep working on this project to see it to completion post-GSoC >> and help mentor other newcomers get started with the Git project. > > >> -- After GSoC -- >> --------------------- >> >> After GSoC I intend to be a part of the community and keep >> contributing to the git’s codebase. I eagerly want to learn a lot from >> the git community and enhance my skills. I also see myself making >> important contributions to git in the future. >> >> When I first joined the community 2 months ago, the ancient way of >> collaborating through a mailing list by sending diff patches was >> really puzzling (GitHub was the only means that I knew about for >> open-source collaboration). After sending a patch I got a little >> comfortable with it and started loving it. >> >> -- Closing remarks -- >> ---------------------------- >> >> I am really enthusiastic to contribute to git. I really want to learn >> a lot from my mentors and the whole git community. Everyone >> contributes using git, why not contribute to git :). >> >> In the end I really want to thank the whole community, especially >> Christian and Junio, for their valuable time in reviewing my patches >> and guiding me with the problems I faced. > > Thanks for your proposal. Please make sure you submit it soon as a pdf > file to the GSoC website. It can then be updated by uploading a new > pdf until the April 2 1800 UTC deadline. Thanks for the feedback, I will. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC][GSoC] Proposal: Move reftable and other tests to the unit testing framework 2024-03-20 13:17 ` Christian Couder 2024-03-20 19:35 ` Chandra Pratap @ 2024-03-21 12:27 ` Patrick Steinhardt 2024-03-29 6:54 ` [RFC][GSoC] Proposal v2: Move more " Chandra Pratap 2024-03-29 7:08 ` [RFC][GSoC] Proposal v2: Move and improve reftable tests in " Chandra Pratap 1 sibling, 2 replies; 8+ messages in thread From: Patrick Steinhardt @ 2024-03-21 12:27 UTC (permalink / raw) To: Christian Couder; +Cc: Chandra Pratap, git, karthik.188, kaartic.sivaraam [-- Attachment #1: Type: text/plain, Size: 2180 bytes --] On Wed, Mar 20, 2024 at 02:17:23PM +0100, Christian Couder wrote: > On Tue, Mar 19, 2024 at 6:11 PM Chandra Pratap > <chandrapratap3519@gmail.com> wrote: [snip] > > A new unit testing framework was introduced to the Git mailing list last > > year with the aim of simplifying testing and improving maintainability. > > The idea was accepted and merged into master on 09/11/2023. This project > > aims to extend that work by moving more tests from the current setup to > > the new unit testing framework. > > > > The SoC 2024 Ideas page (link: https://git.github.io/SoC-2019-Ideas/) > > mentions reftable unit tests migration as a separate project from the > > general unit test migration project, however, I propose migrating other > > tests alongside the reftable unit tests as a part of this proposal. > > It means that if we select your proposal, we cannot select someone > else to work on either the "Move existing tests to a unit testing > framework" project or the "Convert reftable unit tests to use the unit > testing framework" project. > > I am not sure but I think that, after migrating all the reftable unit > tests, I would prefer you working on other reftable related tasks > rather than on more unit test migrations. I agree, I'd also like to keep these projects separate from each other. Also, the reftable tests could certainly use some polishing and a lot more documentation than they currently have. Their coding style does not really match the rest of the project, and if we're busy migrating the code then I think we should also take the chance and touch it up. That will require the student to understand the reftable code a whole lot more deeply than a mere conversion would require though. Thus, I would like to caution any student who wants to pick up this project to not underestimate the effort required for this project. The reftable code itself is certainly non-trivial. As Chris mentioned, if this project would be finished early I'm quite sure that the efforts would uncover bugs, test gaps or similar things in the vicinity of reftables that the student could certainly continue to work on. Patrick [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC][GSoC] Proposal v2: Move more tests to the unit testing framework 2024-03-21 12:27 ` Patrick Steinhardt @ 2024-03-29 6:54 ` Chandra Pratap 2024-03-29 17:51 ` Karthik Nayak 2024-03-29 7:08 ` [RFC][GSoC] Proposal v2: Move and improve reftable tests in " Chandra Pratap 1 sibling, 1 reply; 8+ messages in thread From: Chandra Pratap @ 2024-03-29 6:54 UTC (permalink / raw) To: Patrick Steinhardt, Christian Couder; +Cc: git, karthik.188, kaartic.sivaraam Thanks for the feedback, Christian and Patrick! With your advice, I decided to split my original proposal into two to conform to what was suggested by the SoC 2024 Ideas page. This is the proposal for the unit tests migration project. ---------<8----------<8----------<8----------<8----------<8----------<8 Personal Info ------------- Full name: Chandra Pratap Preferred name: Chand E-mail: chandrapratap3519@gmail.com Phone: (+91)77618-24030 Education: SV National Institute of Technology, Surat, India Year: Sophomore (2nd) Major: Mathematics GitHub: https://github.com/Chand-ra Before GSoC ----------- -----Synopsis----- A new unit testing framework was introduced to the Git mailing list last year with the aim of simplifying testing and improving maintainability by moving the current testing setup from shell scripts and helper files to a framework written wholly in C. The idea was accepted and merged into master on 09/11/2023. The choice of testing framework and the reasoning behind the choice is described in Documentation/technical/unit-tests.txt. This project aims to extend that work by moving more tests from the current setup to the new unit testing framework. The difficulty for the project should be medium and it should take somewhat between 175 to 350 hours. -----Contributions----- • apply.c: make git apply respect core.fileMode settings -> Status: merged into master -> link: https://public-inbox.org/git/20231226233218.472054-1-gitster@pobox.com/ -> Merge commit: cf47fb7ec7e183a1a1e521a540862fba3c2a89eb -> Description: When applying a patch that adds an executable file, git apply ignores the core.fileMode setting (core.fileMode specifies whether the executable bit on files in the working tree are to be honored or not) resulting in false warnings. Fix this by inferring the correct file mode from the existing index entry when core.filemode is set to false. Add a test case that verifies the change and prevents future regression. -> Remarks: This was the first patch I worked on as a contributor to Git. Served me as an essential intro lesson to the community’s working flow and general practices. • tests: Move t0009-prio-queue.sh to the unit testing framework -> Status: merged into master -> link: https://public-inbox.org/git/pull.1642.v4.git.1705865326185.gitgitgadget@gmail.com/ -> Merge commit: 107023e1c9f981476c505e73eab319db6534a536 -> Description: t/t0009-prio-queue.sh along with t/helper/test-prio-queue.c unit test Git's implementation of a priority queue. Migrate the test over to the new unit testing framework to simplify debugging and reduce test run-time. -> Remarks: Probably the most relevant patch of all the ones mentioned here, I decided to work on this patch well before I decided to take part in this year’s GSoC. This patch helped me understand the expectations and workflow for the work to be performed in unit tests migrations. • write-or-die: make GIT_FLUSH a Boolean environment variable -> Status: merged into master -> link: https://public-inbox.org/git/pull.1628.v3.git.1704363617842.gitgitgadget@gmail.com/ -> Merge commit: b3049bbb97c9c0d0292bc9239e976cc661961f39 -> Description: Among Git's environment variables, the ones marked as "Boolean" accept values like Boolean configuration variables, i.e. values like 'yes', 'on', 'true' and positive numbers are taken as "on" and values like 'no', 'off','false' are taken as "off". Make GIT_FLUSH accept more values besides '0' and '1' by turning it into a Boolean environment variable & update the related documentation. • sideband.c: remoye redundant NEEDSWORK tag -> Status: merged into master -> link: https://public-inbox.org/git/pull.1625.v4.git.1703750460527.gitgitgadget@gmail.com/ -> Merge commit: 6484eb9a97fe3cd81a2d5711183b93494e6ddefa -> Description: Replace a misleading NEEDSWORK tag in sideband.c that reads ‘replace int with size_t’ with another comment explaining why it is fine to use ‘int’ and the replacement isn’t necessary. • make tig callable from PowerShell/Command Prompt -> Status: merged into main (Git for Windows) -> link: https://github.com/git-for-windows/MINGW-packages/pull/104 -> Merge commit: 8eb71eb31c6a1b5d41b253c1ab66d981cc448cb0 -> Description: Tig is a text mode interface for Git that ships with the standard Git for Windows package but isn’t callable from PowerShell/ Command Prompt by default. Fix this by updating the relevant Makefiles and resource scripts. • fix broken link on Git for Windows’ GitHub wiki -> Status: merged (Git for Windows) -> link: https://github.com/git-for-windows/git/wiki/Home/_history -> Merge commit: 0df663304bce986da6571cca48b34508d4823a11 -> Remarks: A simple fix for a broken link that I stumbled upon while browsing Git for Windows’ wiki looking for some help with the patch mentioned just before this one. • t4129: prevent loss of exit codes due to the use of pipes -> Status: merged into master -> link: https://public-inbox.org/git/pull.1636.git.1704891257544.gitgitgadget@gmail.com/ -> Merge commit: 1b095626937a858e3a68e9c7d1de11b71117bb42 -> Description: Piping the output of git commands like git-ls-files to another command (grep in this case) in t4129 hides the exit code returned by these commands. Prevent this by storing the output of git-ls-files to a temporary file and then "grep-ping" from that file. Replace grep with test_grep as the latter is more verbose when it fails. • t9146: replace test -d/-f with appropriate test_path_is_* function -> Status: merged into master -> link: https://public-inbox.org/git/pull.1661.v3.git.1707933048210.gitgitgadget@gmail.com/ -> Merge commit: 90c0c15e56fa761ae8c4cf5f5fe09b329c0a5dc5 -> Description: The helper functions test_path_is_* provide better debugging information than test -d/-e/-f. Replace tests -d/-e/-f with their respective ‘test_path_is_foo’ calls. • regex: update relevant files in compat/regex -> Status: WIP (GitGitGadget PR) -> link: https://github.com/gitgitgadget/git/pull/1641 -> Description: The RegEx code in compat/regex has been vendored from gawk and was last updated in 2010. This may lead to performance issues like high CPU usage. Fix this by synchronizing the relevant files in compat/regex with the latest version from GNULib and then replaying any changes we made to gawk’s version on top of the new files. -> Remarks: When I started working on this patch, I thought it was an easy fix but the work turned out to be more involved than I anticipated. I had to seek help from the other community members, and we have made some good progress, but there is still a lot of cleaning and changes that need to be done. I haven’t found enough time to commit to this again, but it’s surely something that I want to get done soon. • tests: Move t0032-reftable-unittest.sh to the unit testing framework -> Status: WIP (GitGitGadget PR) -> link: https://github.com/gitgitgadget/git/pull/1698 -> Description: t/t0032-reftable-unittest.sh along with t/helper/test-reftable.c unit test Git’s reftable framework. Migrate the test over to the new unit testing framework to simplify debugging and reduce test run-time. -> Remarks: An infant version as of now, I tinkered with this after seeing the project list on 'Git SoC 2024 Ideas' page to get an idea of the kind of work that will be involved in the ‘Move reftable tests to the new unit testing framework’ project. • commit.c: ensure find_header_mem() doesn't scan beyond given range -> Status: Dropped -> Remarks: This was a patch addressing a NEEDSWORK comment in commit.c which was dropped because René Scharfe found out a better way to fix the issue at hand than my approach or what the NEEDSWORK comment suggested. -----Related Work----- Prior works about the idea have been performed by other community members and previous interns which form a good guiding path for my own approach. Some previous example work: i) Port helper/test-ctype.c to unit-tests/t-ctype.c -> link: https://lore.kernel.org/git/20240112102743.1440-1-ach.lumap@gmail.com/ ii) Port test-sha256.c and test-sha1.c to unit-tests/t-hash.c -> link: https://lore.kernel.org/git/20240229054004.3807-2-ach.lumap@gmail.com/ iii) Port helper/test-date.c to unit-tests/t-date.c -> link: https://lore.kernel.org/git/20240205162506.1835-2-ach.lumap@gmail.com/ iv) Port test-strcmp-offset.c to unit-tests/t-strcmp-offset.c -> link: https://lore.kernel.org/git/20240310144819.4379-1-ach.lumap@gmail.com/ v) Integrate a simple strbuf unit test with Git's Makefiles -> link: https://lore.kernel.org/git/20230517-unit-tests-v2-v2-4-21b5b60f4b32@google.com/ vi) t0080: turn t-basic unit test into a helper -> link: https://lore.kernel.org/git/a9f67ed703c8314f0f0507ffb83b503717b846b3.1705443632.git.steadmon@google.com/ In GSoC ------- -----Plan----- Tests for Git are defined in the t/ directory and use the combination of a helper file (written in C) and a shell script that invokes the said helper file. I will use my work from the patch ‘tests: Move t0009-prio-queue.sh to the unit testing framework’ to explain the steps involved in the porting of such tests: • Search for a suitable test to port: As Christian Couder mentioned in this mail (link: https://public-inbox.org/git/CAP8UFD22EpdBU8HJqFM+=75EBABOTf5a0q+KsbzLK+XTEGSkPw@mail.gmail.com/), there exists a subset of t/helper worth porting and we need some sort of classification to discern these. All helper files contain a cmd__foo() function which acts as the entry point for that helper tool. For example, the helper/test-prio-queue.c file contained cmd__prio_queue() which served as the entry point for that file. The binary for the helper file is then mapped to a different name by helper/test-tool.c which is used by the ‘*.sh’ files to perform the tests. This name can be discovered by searching for the helper file’s entry point in test-tool.c. Continuing the prior example, “prio-queue” was the name for the helper/test-prio-queue.c binary and t0009-prio-queue.sh invoked it like “prio-queue 1 2 get 3 dump”. To classify what among t/helper should be targeted first in this project, we can use something like ‘git grep foo’ (where foo is the name for the helper’s binary) to look at the instances where the helper tool is invoked. The ones appearing lesser in different test scripts are the ones most likely to be used solely for unit testing and should probably be targeted first. Utilising this strategy, I discovered that the ‘prio-queue’ tool was only used in t0009-prio-queue.sh and hence, was a good candidate for the unit testing framework. Note that this strategy is not full-proof and further investigation is absolutely required on a per-test basis, it is only meant to give an initial idea of what’s worth investigating. • Create a new C test file in t/unit-tests: After finding a test appropriate for the migration efforts, we create a new ‘*.c’ file in t/unit-tests. The test file must be named appropriately to reflect the nature of the tests it is supposed to perform. Most of the times, replacing ‘tXXXX’ with ‘t-‘ and ‘*.sh’ with ‘.c’ in the name of the test script suffices. E.g. t/t0009-prio-queue.sh turns to t/unit-tests/t-prio-queue.c. The new C file must #include “test-lib.h” (to be able to use the unit testing framework) and other necessary headers files. • Move the code from the helper file: Since the helper files are written in C, this step is mostly a ‘copy-paste then rename’ job. Changes similar to the following also need to be made in the Makefile: - TEST_BUILTINS_OBJS += test-prio-queue.o + UNIT_TEST_PROGRAMS += t-prio-queue • Translate the shell script: The trickiest part of the plan, since different test scripts perform various functions and a direct translation of the scripts to C is not always optimal. Continuing the prior example, t0009-prio-queue.sh used a single pattern for testing, write expected output to a temporary file (named ‘expect’) -> feed input to the ‘prio-queue’ helper tool -> dump its output to another temporary file (named ‘actual’) -> compare the two files (‘actual’ vs ‘expect’). In the first iteration of my prio-queue patch, I worked out a straightforward translation of this pattern in C. I stored the input in a string buffer, passed that buffer to the test function, stored its output in another buffer and then called memcmp() on these two buffers. While this did prove to be a working copy, this work was found to be inadequate on the mailing list. Through the next several iterations, I reworked the logic several times, like comparing the input and output on-the-go rather than using buffers and replacing strings with macro definitions. The test scripts similarly perform other functions like checking for prerequisites, creating commits, initializing repositories, changing or creating directories and so forth, and custom logic is required in most of the cases of translating these, as seen above. • Run the resulting test, correct any errors: It is rare for any migrated test to work correctly on the first run. This step involves resolving any compile/runtime errors arising from the test and making sure that at the very minimum, all the test-cases of the original test are replicated in the new work. Improvements upon the original can also be made, for example, the original t0009-prio-queue.sh did not exercise the reverse stack functionality of prio-queue, which I improved upon in unit-tests/t-prio-queue. • Send the resulting patch to the mailing list, respond to the feedback: This step involves writing a meaningful commit message explaining each patch in the series. From my experience contributing to the Git project, I find it to be rare for any patch series to be accepted in the very first iteration. Feedback from the community is vital for the refinement of any patch and must be addressed by performing the suggested changes and sending the work back to the mailing list. This must be repeated until the work is merged into ‘seen’, ‘next’ and further down, ‘master’. Timeline -------- I’m confident that I can start the project as early as the start of the Community Bonding Period (May 1 - 26). This is because I have read the related documentation and already have some experience with the idea. I believe I’ll be ready to get up to speed to work on the project by then. The exact time arrangement for each test is variable and hard to determine, but judging from the fact that it took me 3-4 days to complete the first iteration of the t-prio-queue work, here is a proposed migration schedule: The first few steps of the plan are easy enough to knock out in a day, the time required to port the tests depends mostly upon the work required in translating the shell script. As mentioned previously, it took me 3-4 days to complete the first iteration of the test-prio-queue migration patch and that was a short test with only about 50 or so lines of shell scripting and all the test cases following a single pattern. Considering all this, I believe it should be possible, on average, to migrate a new test in 4-7 days. From there, it’s a matter of polishing the patch until integration with ‘master’ by addressing the feedback on the mailing list which can deceptively take longer than expected. For instance, I had to continue refining my t-prio-queue patch for around 2 weeks after the first iteration to get it merged to ‘next’. Hence, it should be possible for me to migrate >=8 tests throughout the duration of this project. Blogging -------- I plan on writing weekly blogs on the weekends summarizing my work and outlining future plans here (link: https://chand-ra.github.io/) (I have yet to set up the blog). This is because I believe jotting ideas down help you understand them better while simultaneously serving as a guiding path for new contributors to get started with the Git project. I learnt quite a lot from previous intern’s blogs like Shaoxuan Yuan’s GSoC 2022 Blog (link: https://ffyuanda.github.io/), Shuqi Lang’s GSoC 2023 blog (link: https://cheskaqiqi.github.io/) and Achu Luma’s Outreachy 2023 Blog (link: https://gitlab.com/lumap/lumap.gitlab.io) and plan on leaving something similar for other newcomers. Availability ------------ My summertime is reserved for GSoC, so I expect that I can work on a new test 5 days per week, 6-8 hours per day, that is 30-40 hours a week. On the weekends, I would like to solely work on the feedback from the mailing list and advance my [WIP] patches. Certainly, something unexpected may arise, but I will try my best to adhere to this time commitment and be always available through the community’s mailing list. Post GSoC & Closing Remarks --------------------------- When I first started contributing to the Git project in October of 2023, I had no idea about programmes like GSoC. I was advised by a senior of mine to contribute to open-source projects and hence, my aim of contribution was to apply what I had learnt in college to solve real-world problems and learn from more experienced peers. However, most of what I have contributed to Git has been trivial owing to my lack of skills and inexperience with the project. Seeing how I need to do an internship in summer, with GSoC, I hope to be able to dedicate this internship time and effort to a cool project like Git while simultaneously learning skills to be able to make more useful contributions in the future. It’s two birds with one stone. I would also like to keep working on this project to see it to completion post-GSoC and help mentor other newcomers get started with the Git project. Thanks & Regards, Chandra ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC][GSoC] Proposal v2: Move more tests to the unit testing framework 2024-03-29 6:54 ` [RFC][GSoC] Proposal v2: Move more " Chandra Pratap @ 2024-03-29 17:51 ` Karthik Nayak 0 siblings, 0 replies; 8+ messages in thread From: Karthik Nayak @ 2024-03-29 17:51 UTC (permalink / raw) To: Chandra Pratap, Patrick Steinhardt, Christian Couder Cc: git, kaartic.sivaraam [-- Attachment #1: Type: text/plain, Size: 7131 bytes --] Chandra Pratap <chandrapratap3519@gmail.com> writes: Hello, > -----Contributions----- Nit: Not sure how these contributions are sorted, but it would be nice to group the git-for-windows patches together. > In GSoC > ------- > > -----Plan----- > > Tests for Git are defined in the t/ directory and use the combination of > a helper file (written in C) and a shell script that invokes the said > helper file. I will use my work from the patch ‘tests: Move > t0009-prio-queue.sh to the unit testing framework’ to explain the steps > involved in the porting of such tests: > > • Search for a suitable test to port: > > As Christian Couder mentioned in this mail (link: https://public-inbox.org/git/CAP8UFD22EpdBU8HJqFM+=75EBABOTf5a0q+KsbzLK+XTEGSkPw@mail.gmail.com/), > there exists a subset of t/helper worth porting and we need some sort of > classification to discern these. > > All helper files contain a cmd__foo() function which acts as the entry > point for that helper tool. For example, the helper/test-prio-queue.c > file contained cmd__prio_queue() which served as the entry point for > that file. The binary for the helper file is then mapped to a different > name by helper/test-tool.c which is used by the ‘*.sh’ files to perform > the tests. This name can be discovered by searching for the helper > file’s entry point in test-tool.c. Continuing the prior example, > “prio-queue” was the name for the helper/test-prio-queue.c binary and > t0009-prio-queue.sh invoked it like “prio-queue 1 2 get 3 dump”. > It is really nice to provide an example, but you're referring to files which used to exist here, which can be a bit confusing. Would be nice to talk about a new helper file you plan to port and what is the general flow you'd follow there. > • Create a new C test file in t/unit-tests: > > After finding a test appropriate for the migration efforts, we create a > new ‘*.c’ file in t/unit-tests. The test file must be named appropriately > to reflect the nature of the tests it is supposed to perform. > Most of the times, replacing ‘tXXXX’ with ‘t-‘ and ‘*.sh’ with ‘.c’ in > the name of the test script suffices. E.g. t/t0009-prio-queue.sh turns > to t/unit-tests/t-prio-queue.c. > The new C file must #include “test-lib.h” (to be able to use the unit > testing framework) and other necessary headers files. > > • Move the code from the helper file: > > Since the helper files are written in C, this step is mostly a > ‘copy-paste then rename’ job. Changes similar to the following also need > to be made in the Makefile: > - TEST_BUILTINS_OBJS += test-prio-queue.o > + UNIT_TEST_PROGRAMS += t-prio-queue > This has a good potential for refactoring these files as they're added to the new unit testing library. > • Translate the shell script: > > The trickiest part of the plan, since different test scripts perform > various functions and a direct translation of the scripts to C is not > always optimal. Continuing the prior example, t0009-prio-queue.sh used a > single pattern for testing, write expected output to a temporary file > (named ‘expect’) -> feed input to the ‘prio-queue’ helper tool -> dump its > output to another temporary file (named ‘actual’) -> compare the two files > (‘actual’ vs ‘expect’). > > In the first iteration of my prio-queue patch, I worked out a > straightforward translation of this pattern in C. I stored the input in > a string buffer, passed that buffer to the test function, stored its > output in another buffer and then called memcmp() on these two buffers. > While this did prove to be a working copy, this work was found to be inadequate > but why? It is important to state why this is not the best approach. > The test scripts similarly perform other functions like checking for > prerequisites, creating commits, initializing repositories, changing or > creating directories and so forth, and custom logic is required in most > of the cases of translating these, as seen above. > > • Run the resulting test, correct any errors: > > It is rare for any migrated test to work correctly on the first run. > This step involves resolving any compile/runtime errors arising from the > test and making sure that at the very minimum, all the test-cases of the > original test are replicated in the new work. Improvements upon the original > can also be made, for example, the original t0009-prio-queue.sh did not > exercise the reverse stack functionality of prio-queue, which I improved > upon in unit-tests/t-prio-queue. > > • Send the resulting patch to the mailing list, respond to the feedback: > > This step involves writing a meaningful commit message explaining each patch > in the series. From my experience contributing to the Git project, I find it > to be rare for any patch series to be accepted in the very first iteration. > Feedback from the community is vital for the refinement of any patch and > must be addressed by performing the suggested changes and sending the work > back to the mailing list. This must be repeated until the work is merged > into ‘seen’, ‘next’ and further down, ‘master’. > > Timeline > -------- > > I’m confident that I can start the project as early as the start of the > Community Bonding Period (May 1 - 26). This is because I have read > the related documentation and already have some experience with the idea. > I believe I’ll be ready to get up to speed to work on the project by then. > The exact time arrangement for each test is variable and hard to determine, > but judging from the fact that it took me 3-4 days to complete the first > iteration of the t-prio-queue work, here is a proposed migration schedule: > I think the Community Bonding Period could also be utilized to go over the patches submitted to date to move tests to the unit test framework and draw learning's from them. > The first few steps of the plan are easy enough to knock out in a day, > the time required to port the tests depends mostly upon the work > required in translating the shell script. As mentioned previously, it > took me 3-4 days to complete the first iteration of the test-prio-queue > migration patch and that was a short test with only about 50 or so lines > of shell scripting and all the test cases following a single pattern. > Considering all this, I believe it should be possible, on average, to > migrate a new test in 4-7 days. > From there, it’s a matter of polishing the patch until integration with > ‘master’ by addressing the feedback on the mailing list which can > deceptively take longer than expected. For instance, I had to continue > refining my t-prio-queue patch for around 2 weeks after the first > iteration to get it merged to ‘next’. > This is an important point, while you can probably have tests being converted fast, it would be nicer to reduce the turn around time overall and this could mean you spend more time at the start. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 690 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC][GSoC] Proposal v2: Move and improve reftable tests in the unit testing framework 2024-03-21 12:27 ` Patrick Steinhardt 2024-03-29 6:54 ` [RFC][GSoC] Proposal v2: Move more " Chandra Pratap @ 2024-03-29 7:08 ` Chandra Pratap 2024-04-02 10:01 ` Christian Couder 1 sibling, 1 reply; 8+ messages in thread From: Chandra Pratap @ 2024-03-29 7:08 UTC (permalink / raw) To: Patrick Steinhardt, Christian Couder; +Cc: git, karthik.188, kaartic.sivaraam Thanks for the feedback, Christian and Patrick! With your advice, I decided to split my original proposal into two to conform to what was suggested by the SoC 2024 Ideas page. This is the proposal for the reftable tests migration project. However, I am unsure of what would be a good project size for this project. I have quite a long summer vacation and don't really have any other plans other than GSoC as of now so I decided to go with large project size on GSoC's website but please let me know if another size would be more appropriate. ---------<8----------<8----------<8----------<8----------<8----------<8 Personal Info ------------- Full name: Chandra Pratap Preferred name: Chand E-mail: chandrapratap3519@gmail.com Phone: (+91)77618-24030 Education: SV National Institute of Technology, Surat, India Year: Sophomore (2nd) Major: Mathematics GitHub: https://github.com/Chand-ra Before GSoC ----------- -----Synopsis----- A new unit testing framework was introduced to the Git mailing list last year with the aim of simplifying testing and improving maintainability by moving the current testing setup from shell scripts and helper files to a framework written wholly in C. The idea was accepted and merged into master on 09/11/2023. The choice of testing framework and the rationale behind the choice is thoroughly described in Documentation/technical/unit-tests.txt. This project aims to extend that work by moving the reftable tests from the current setup to the new unit testing framework and improving the tests themselves. The difficulty for the project should be medium and it should take somewhat around 175-350 hours. -----Contributions----- • apply.c: make git apply respect core.fileMode settings -> Status: merged into master -> link: https://public-inbox.org/git/20231226233218.472054-1-gitster@pobox.com/ -> Merge commit: cf47fb7ec7e183a1a1e521a540862fba3c2a89eb -> Description: When applying a patch that adds an executable file, git apply ignores the core.fileMode setting (core.fileMode specifies whether the executable bit on files in the working tree are to be honored or not) resulting in false warnings. Fix this by inferring the correct file mode from the existing index entry when core.filemode is set to false. Add a test case that verifies the change and prevents future regression. -> Remarks: This was the first patch I worked on as a contributor to Git. Served me as an essential intro lesson to the community’s working flow and general practices. • tests: Move t0032-reftable-unittest.sh to the unit testing framework -> Status: WIP (GitGitGadget PR) -> link: https://github.com/gitgitgadget/git/pull/1698 -> Description: t/t0032-reftable-unittest.sh along with t/helper/test-reftable.c unit test Git’s reftable framework. Migrate the test over to the new unit testing framework to simplify debugging and reduce test run-time. -> Remarks: An infant version as of now, I tinkered with this after seeing the project list on Git’s SoC 2024 Ideas page to get an idea of the kind of work that will be involved in this project. Probably the most relevant patch out of all the ones mentioned here. • tests: Move t0009-prio-queue.sh to the unit testing framework -> Status: merged into master -> link: https://public-inbox.org/git/pull.1642.v4.git.1705865326185.gitgitgadget@gmail.com/ -> Merge commit: 107023e1c9f981476c505e73eab319db6534a536 -> Description: t/t0009-prio-queue.sh along with t/helper/test-prio-queue.c unit test Git's implementation of a priority queue. Migrate the test over to the new unit testing framework to simplify debugging and reduce test run-time. • write-or-die: make GIT_FLUSH a Boolean environment variable -> Status: merged into master -> link: https://public-inbox.org/git/pull.1628.v3.git.1704363617842.gitgitgadget@gmail.com/ -> Merge commit: b3049bbb97c9c0d0292bc9239e976cc661961f39 -> Description: Among Git's environment variables, the ones marked as "Boolean" accept values like Boolean configuration variables, i.e. values like 'yes', 'on', 'true' and positive numbers are taken as "on" and values like 'no', 'off','false' are taken as "off". Make GIT_FLUSH accept more values besides '0' and '1' by turning it into a Boolean environment variable & update the related documentation. • sideband.c: remove redundant NEEDSWORK tag -> Status: merged into master -> link: https://public-inbox.org/git/pull.1625.v4.git.1703750460527.gitgitgadget@gmail.com/ -> Merge commit: 6484eb9a97fe3cd81a2d5711183b93494e6ddefa -> Description: Replace a misleading NEEDSWORK tag in sideband.c that reads ‘replace int with size_t’ with another comment explaining why it is fine to use ‘int’ and the replacement isn’t necessary. • make tig callable from PowerShell/Command Prompt -> Status: merged into main (Git for Windows) -> link: https://github.com/git-for-windows/MINGW-packages/pull/104 -> Merge commit: 8eb71eb31c6a1b5d41b253c1ab66d981cc448cb0 -> Description: Tig is a text mode interface for Git that ships with the standard Git for Windows package but isn’t callable from PowerShell/ Command Prompt by default. Fix this by updating the relevant Makefiles and resource scripts. • fix broken link on Git for Windows’ GitHub wiki -> Status: merged (Git for Windows) -> link: https://github.com/git-for-windows/git/wiki/Home/_history -> Merge commit: 0df663304bce986da6571cca48b34508d4823a11 -> Remarks: A simple fix for a broken link that I stumbled upon while browsing Git for Windows’ wiki looking for some help with the patch mentioned just before this one. • t4129: prevent loss of exit codes due to the use of pipes -> Status: merged into master -> link: https://public-inbox.org/git/pull.1636.git.1704891257544.gitgitgadget@gmail.com/ -> Merge commit: 1b095626937a858e3a68e9c7d1de11b71117bb42 -> Description: Piping the output of git commands like git-ls-files to another command (grep in this case) in t4129 hides the exit code returned by these commands. Prevent this by storing the output of git-ls-files to a temporary file and then "grep-ping" from that file. Replace grep with test_grep as the latter is more verbose when it fails. • t9146: replace test -d/-f with appropriate test_path_is_* function -> Status: merged into master -> link: https://public-inbox.org/git/pull.1661.v3.git.1707933048210.gitgitgadget@gmail.com/ -> Merge commit: 90c0c15e56fa761ae8c4cf5f5fe09b329c0a5dc5 -> Description: The helper functions test_path_is_* provide better debugging information than test -d/-e/-f. Replace tests -d/-e/-f with their respective ‘test_path_is_foo’ calls. • regex: update relevant files in compat/regex -> Status: WIP (GitGitGadget PR) -> link: https://github.com/gitgitgadget/git/pull/1641 -> Description: The RegEx code in compat/regex has been vendored from gawk and was last updated in 2010. This may lead to performance issues like high CPU usage. Fix this by synchronizing the relevant files in compat/regex with the latest version from GNULib and then replaying any changes we made to gawk’s version on top of the new files. -> Remarks: When I started working on this patch, I thought it was an easy fix but the work turned out to be more involved than I anticipated. I had to seek help from the other community members, and we have made some good progress, but there is still a lot of cleaning and changes that need to be done. I haven’t found enough time to commit to this again, but it’s surely something that I want to get done soon. • commit.c: ensure find_header_mem() doesn't scan beyond given range -> Status: Dropped -> Remarks: This was a patch addressing a NEEDSWORK comment in commit.c. It was dropped because René Scharfe found out a better way to fix the issue at hand than my approach or what the NEEDSWORK comment suggested. -----Related Work----- Prior work regarding tests migrations have been performed by other community members and previous interns which form a good guiding path for my own approach. Some previous example work: i) Port helper/test-ctype.c to unit-tests/t-ctype.c -> link: https://lore.kernel.org/git/20240112102743.1440-1-ach.lumap@gmail.com/ ii) Port test-sha256.c and test-sha1.c to unit-tests/t-hash.c -> link: https://lore.kernel.org/git/20240229054004.3807-2-ach.lumap@gmail.com/ iii) Port helper/test-date.c to unit-tests/t-date.c -> link: https://lore.kernel.org/git/20240205162506.1835-2-ach.lumap@gmail.com/ iv) Port test-strcmp-offset.c to unit-tests/t-strcmp-offset.c -> link: https://lore.kernel.org/git/20240310144819.4379-1-ach.lumap@gmail.com/ v) Integrate a simple strbuf unit test with Git's Makefiles -> link: https://lore.kernel.org/git/20230517-unit-tests-v2-v2-4-21b5b60f4b32@google.com/ vi) t0080: turn t-basic unit test into a helper -> link: https://lore.kernel.org/git/a9f67ed703c8314f0f0507ffb83b503717b846b3.1705443632.git.steadmon@google.com/ In GSoC ------- -----Background for reftable----- Git’s internals consist of mainly three objects: blobs, tree objects and commit objects. The blobs and tree objects are responsible for storing a repository’s content while the commit objects store information about commits in the repo and are responsible for capturing the repo’s history. Every one of these objects can be accessed through a unique key generated by a SHA-256 (previously SHA-1) algorithm. To make life easier, instead of remembering the hash key for commit objects, we can assign a simple name to them, store these names in a file and use that file whenever we need access to the commits. These names are called ‘references’ or ‘refs’. Since a repository can contain a lot of commits and branches and hence, a lot of refs, Git used packed-refs to save space by storing unused refs in a single file. However, this arrangement doesn’t scale well in terms of both space and performance. This is where reftable comes in. A reftable file is a portable binary file format customized for storing references. Some objectives of reftable are: - Sorted references enabling advanced scans like binary search. - Near constant time lookup for any single reference. - Efficient enumeration of an entire namespace like refs/tags/ - Combined reflog storage with ref storage for small transactions and separate reflog storage for base refs and historical logs. - Near constant time verification if an object name is referred to by at least one reference. -----Plan----- The reftable tests are different from other tests in the test directory because they perform unit testing with the help of a custom test framework rather than the usual ‘helper file + shell script’ combination. Reftable tests do have a helper file and a shell script invoking the helper file, but rather than performing the tests, this combination is used to invoke tests defined in the reftable directory. The reftable directory consists of nine tests: • basics test • record test • block test • tree test • pq test • stack test • merged test • refname test • read-write test Each of these tests is written in C using a custom reftable testing framework defined by reftable/test_framework (also written in C). The framework has four major features utilized in performing the tests: • EXPECT_ERR(c): A function-like macro that takes as input an integer ‘c’ (generally the return value of a function call), compares it against 0 and spits an error message if equality doesn’t hold. The error message itself contains information about the file where this macro was used, the line in this file where the macro was called and the error code ‘c’ causing the error. • EXPECT_STREQ(a, b): A function-like macro that takes as input two strings ‘a’ and ‘b’, compares them for equality via strcmp() and throws an error if equality doesn’t hold. The error message thrown contains information regarding the file where this macro was invoked, the line in this file where the macro was called and the mismatched strings ‘a’ and ‘b’. • EXPECT(c): A function-like macro that takes as input an integer ‘c’ (generally the result of a Boolean expression like a == b) and throws an error message if c == 0. The error message is similar to EXPECT_ERR(c). • RUN_TEST(f): A function-like macro that takes as input the name of a function ‘f’ (a test function that exercises a part of reftable’s code), prints to stdout the message ‘running f’ and then calls the function with f(). Other than these, the framework consists of two additional functions, set_test_hash() and strbuf_add_void() which are used exclusively in the stack tests and refname tests respectively. Since the reftable test framework is written in C like the unit testing framework, we can create a direct translation of the features mentioned above using the existing tools in the unit testing framework with the following plan: • EXPECT_ERR(c): Can be replaced by check(!c) or check_int(c, “==”, 0). • EXPECT_STREQ(a, b): Can be replaced by check_str(a, b). • EXPECT(c): Can be replaced by check_int(), similar to EXPECT_ERR. E.g. expect(a >= b) --> check_int(a, “>=”, b) • RUN_TEST(f): Can be replaced by TEST(f(), “message explaining the test”). The information contained in the diagnostic messages of these macros is replicated in the unit testing framework by default. Any additional information can be displayed using the test_msg() functionality in the framework. The additional functions set_test_hash() and strbuf_add_void() may be moved to reftable/stack.c and reftable/refname.c respectively. The plan itself is basic and does need improvements, but using this plan, I have already created a working albeit primitive copy for two out of the nine tests (basics test and tree test) as can be seen here: (link: https://github.com/gitgitgadget/git/pull/1698) -----Improvements----- As Patrick Steinhardt mentioned in this mail (link: https://public-inbox.org/git/ZfwnrL6Zl_lcV09y@tanuki/), apart from the port to the unit testing framework, the reftable tests could use some more refinement. This can be done alongside the test migration efforts and can involve: • Improving documentation for the tests: The reftable tests suffer from a lack of documentation. This might be fine for the simpler tests like basics tests and tree tests but for the more complex tests, more documentation means easier on-boarding to the reftable sub-project for newer people as well as ease of maintenance. This is something that could be worked on during this project. • Match coding style with rest of the project: There are parts of reftable’s code not in accordance with Git’s coding style. As a simple example, this snippet in reftable/basics_test.c (lines 41-48) doesn’t follow the coding conventions: if (res < sz) { foo; if (res > 0) { bar; } } else { baz; } And should be re-written as: if (res < sz) { foo; if (res > 0) bar; } else baz; It should be well worth it to refactor other parts of the tests to align with Git’s coding standards through this project. • Increase test coverage: There may be parts of reftable’s code that are not currently exercised by the existing tests, or edge cases being overlooked by the current setup. This could also be improved upon during this project. As an example, while working on the t-prio-queue migration patch, I discovered that prio-queue’s reverse stack functionality was not tested by the existing setup, which I improved upon by adding a new test case to the prio-queue test in the unit testing framework. • Other than these, working on the tests might uncover bugs or there might exist issues within the vicinity of this project that could be worked upon. Timeline -------- -----Pre-GSoC (Until May 1)----- My end-semester examinations take place during this period, so it would be difficult for me to allocate time to much else. Nevertheless, I wish to keep working on my [WIP] patches and finish reading the reftable documentation (link: https://git-scm.com/docs/reftable) during this period. -----Community Bonding (May 1 - May 26)----- Continue learning in-depth about the reftable sub-project with the help of mentors, contribute a few patches related to reftable to familiarize myself with the code. Finish the first iteration of the test migration project using the plan outlined prior. -----Phase I (May 27 - July 11)----- Polish the new tests in the unit testing framework using the points mentioned above and mentors’ advice. Send the resulting tests to the mailing list and work on the feedback received from the community. At the very least, get the ported tests merged to ‘seen’ before the end of this period. Continue learning and experimenting with the rest of reftable’s codebase. -----Phase II (July 12 - Aug 18)----- Continue working on refining the tests until merge with ‘master’. If that is already within sight, focus on extending the reftable tests and improving documentation. Work on other reftable related tasks in the vicinity of this project. -----Final Week (Aug 19 - Aug 26)----- Finish up the final touches on any of the work done during the project, this involves ensuring the said work is merged with upstream. Write a final report on the work accomplished during the project and outline future goals. Blogging -------- I plan on writing weekly blogs on the weekends summarizing my work and outlining future plans here (link: https://chand-ra.github.io/) (I have yet to set up the blog). This is because I believe jotting ideas down help you understand them better while simultaneously serving as a guiding path for new contributors to get started with the Git project. I learnt quite a lot from previous intern’s blogs like Shaoxuan Yuan’s GSoC 2022 Blog (link: https://ffyuanda.github.io/), Shuqi Lang’s GSoC 2023 blog (link: https://cheskaqiqi.github.io/) and Achu Luma’s Outreachy 2023 Blog (link: https://gitlab.com/lumap/lumap.gitlab.io) and plan on leaving something similar for other newcomers. Availability ------------ My summertime is reserved for GSoC, so I expect that I can work on the project 5 days per week, 6 - 8 hours per day, that is 30 - 40 hours a week. Certainly, something unexpected may arise, but I will try my best to adhere to this time commitment and be always available through the community’s mailing list. Post GSoC & Closing Remarks --------------------------- When I first started contributing to the Git project in October of 2023, I had no idea about programs like GSoC. I was advised by a senior of mine to contribute to open-source projects and hence, my aim of contribution was to apply what I had learnt in college to solve real-world problems and learn from more experienced peers. However, most of what I have contributed to Git has been trivial owing to my lack of skills and inexperience with the project. Seeing how I need to do an internship in summer, with GSoC, I hope to be able to dedicate this internship time and effort to a cool project like Git while simultaneously learning skills to be able to make more useful contributions in the future. It’s two birds with one stone. I would also like to keep contributing to reftable after GSoC and help mentor other newcomers get started with the Git project. Thanks & Regards, Chandra References ---------- - The Pro Git book: https://git-scm.com/book/en/v2 - Git’s reftable documentation: https://git-scm.com/docs/reftable - Shaoxuan Yuan’s 2022 GSoC proposal: https://docs.google.com/document/d/1VZq0XGl-wCxECxJamKN9PGVPXaBhszQWV1jaIIvlFjE/edit ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC][GSoC] Proposal v2: Move and improve reftable tests in the unit testing framework 2024-03-29 7:08 ` [RFC][GSoC] Proposal v2: Move and improve reftable tests in " Chandra Pratap @ 2024-04-02 10:01 ` Christian Couder 0 siblings, 0 replies; 8+ messages in thread From: Christian Couder @ 2024-04-02 10:01 UTC (permalink / raw) To: Chandra Pratap; +Cc: Patrick Steinhardt, git, karthik nayak, Kaartic Sivaraam On Fri, Mar 29, 2024 at 8:08 AM Chandra Pratap <chandrapratap3519@gmail.com> wrote: > > Thanks for the feedback, Christian and Patrick! With your advice, I > decided to split my original proposal into two to conform to what was > suggested by the SoC 2024 Ideas page. > > This is the proposal for the reftable tests migration project. However, > I am unsure of what would be a good project size for this project. > I have quite a long summer vacation and don't really have any other > plans other than GSoC as of now so I decided to go with large project > size on GSoC's website but please let me know if another size would > be more appropriate. On our Idea page, we have "Expected Project Size: 175 hours or 350 hours" for this project. So it's fine for us if you pick one of those (medium or large). > Before GSoC > ----------- > > -----Synopsis----- > > A new unit testing framework was introduced to the Git mailing list last > year with the aim of simplifying testing and improving maintainability > by moving the current testing setup from shell scripts and helper files > to a framework written wholly in C. The idea was accepted and merged > into master on 09/11/2023. The choice of testing framework and the > rationale behind the choice is thoroughly described in > Documentation/technical/unit-tests.txt. > > This project aims to extend that work by moving the reftable tests from > the current setup to the new unit testing framework and improving the > tests themselves. The difficulty for the project should be medium > and it should take somewhat around 175-350 hours. It's fine to give information about the project that is already in the Idea page (or even to just copy from that page), but we are more interested to know how you approach the project, and how you want to work on it. So if you give information that is already in the Idea page, please say it clearly, so we can easily skip reading that if we don't have much time. As we prefer that you give information specific to your approach, I think it would have been more interesting to say that you decided to go with a "large" project size for example. > In GSoC > ------- > > -----Background for reftable----- > > Git’s internals consist of mainly three objects: blobs, tree objects and > commit objects. The blobs and tree objects are responsible for storing a > repository’s content while the commit objects store information about > commits in the repo and are responsible for capturing the repo’s > history. When explaining refs with as much detail, I think tag objects are important too, as refs often point to them too. On the contrary, blobs and tree objects are very rarely pointed to by refs. So it's a bit strange that you don't talk about tag objects, but that you talk about blobs and trees which are less important than tag objects in this context. > Every one of these objects can be accessed through a unique key > generated by a SHA-256 (previously SHA-1) algorithm. Actually both SHA-256 and SHA-1 are supported. > To make life > easier, instead of remembering the hash key for commit objects, we can > assign a simple name to them, store these names in a file and use that > file whenever we need access to the commits. These names are called > ‘references’ or ‘refs’. > > Since a repository can contain a lot of commits and branches and hence, > a lot of refs, Git used packed-refs to save space by storing unused refs > in a single file. However, this arrangement doesn’t scale well in terms > of both space and performance. This is where reftable comes in. A > reftable file is a portable binary file format customized for storing > references. Some objectives of reftable are: > - Sorted references enabling advanced scans like binary search. > - Near constant time lookup for any single reference. > - Efficient enumeration of an entire namespace like refs/tags/ > - Combined reflog storage with ref storage for small transactions and > separate reflog storage for base refs and historical logs. > - Near constant time verification if an object name is referred to by at > least one reference. > > -----Plan----- > > The reftable tests are different from other tests in the test directory Not sure which "test directory" you are talking about. I guess it's "t/". > because they perform unit testing with the help of a custom test framework > rather than the usual ‘helper file + shell script’ combination. There are actually already unit tests in t/unit-tests which don't need a shell script. I think it would have been clearer if this paragraph was started with something like "Compared to unit tests that use both a "t/*.sh" test script in shell and a test-tool helper in "t/helper/", ..." > Reftable tests do have a helper file and a shell script invoking the > helper file, but rather than performing the tests, this combination is > used to invoke tests defined in the reftable directory. It looks like you are talking about the "reftable/*_test.c" files, but I think it would be clearer if you spelled that out. > The reftable directory consists of nine tests: I think using "contains" instead of "consists" would be a bit better, as there are other things than the nine tests in the "reftable/" directory. > > • basics test > • record test > • block test > • tree test > • pq test > • stack test > • merged test > • refname test > • read-write test Yeah, this corresponds to the output of `ls reftable/*_test.c`. > Each of these tests is written in C using a custom reftable testing > framework defined by reftable/test_framework (also written in C). Using "reftable/test_framework.{c,h}" would be a bit clearer as there is no "reftable/test_framework" file. > Since the reftable test framework is written in C like the unit testing > framework, we can create a direct translation of the features mentioned > above using the existing tools in the unit testing framework with the > following plan: > > • EXPECT_ERR(c): Can be replaced by check(!c) or check_int(c, “==”, 0). > > • EXPECT_STREQ(a, b): Can be replaced by check_str(a, b). > > • EXPECT(c): Can be replaced by check_int(), similar to EXPECT_ERR. > E.g. expect(a >= b) --> check_int(a, “>=”, b) > > • RUN_TEST(f): Can be replaced by TEST(f(), “message explaining the test”). > > The information contained in the diagnostic messages of these macros is > replicated in the unit testing framework by default. Any additional > information can be displayed using the test_msg() functionality in the > framework. The additional functions set_test_hash() and strbuf_add_void() > may be moved to reftable/stack.c and reftable/refname.c respectively. > > The plan itself is basic and does need improvements, but using this plan, > I have already created a working albeit primitive copy for two out of the > nine tests (basics test and tree test) as can be seen here: > (link: https://github.com/gitgitgadget/git/pull/1698) Nice, but it looks like "unit-tests/bin/t-reftable-tree" sometimes fails, perhaps because it's missing a call to test_done(). The rest of your proposal LGTM. Thanks! ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-04-02 10:02 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-03-19 17:11 [RFC][GSoC] Proposal: Move reftable and other tests to the unit testing framework Chandra Pratap 2024-03-20 13:17 ` Christian Couder 2024-03-20 19:35 ` Chandra Pratap 2024-03-21 12:27 ` Patrick Steinhardt 2024-03-29 6:54 ` [RFC][GSoC] Proposal v2: Move more " Chandra Pratap 2024-03-29 17:51 ` Karthik Nayak 2024-03-29 7:08 ` [RFC][GSoC] Proposal v2: Move and improve reftable tests in " Chandra Pratap 2024-04-02 10:01 ` Christian Couder
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).