From: "Darrick J. Wong" <djwong@kernel.org>
To: Luca Di Maio <luca.dimaio1@gmail.com>
Cc: linux-xfs@vger.kernel.org, dimitri.ledkov@chainguard.dev,
smoser@chainguard.dev, hch@infradead.org
Subject: Re: [PATCH v6 2/4] populate: add ability to populate a filesystem from a directory
Date: Thu, 24 Apr 2025 15:00:41 -0700
Message-ID: <20250424220041.GK25675@frogsfrogsfrogs>
In-Reply-To: <vmiujkqli3d4c7ohgegpxvwacowl2tdaps6m4wyvwh6dcfado7@csca7fs5y7ss>
On Thu, Apr 24, 2025 at 06:09:45PM +0200, Luca Di Maio wrote:
> On Wed, Apr 23, 2025 at 01:23:58PM -0700, Darrick J. Wong wrote:
> > On Wed, Apr 23, 2025 at 06:03:17PM +0200, Luca Di Maio wrote:
> > > +static void fail(char *msg, int i)
> > > +{
> > > + fprintf(stderr, _("%s: %s [%d - %s]\n"), progname, msg, i, strerror(i));
> > > + exit(1);
> > > +}
> > > +
> > > +static int newregfile(char *fname)
> > > +{
> > > + int fd;
> > > + off_t size;
> > > +
> > > + if ((fd = open(fname, O_RDONLY)) < 0 || (size = filesize(fd)) < 0) {
> > > + fprintf(stderr, _("%s: cannot open %s: %s\n"), progname, fname,
> > > + strerror(errno));
> > > + exit(1);
> > > + }
> > > +
> > > + return fd;
> > > +}
> >
> > Why is this copy-pasting code from proto.c? Put the new functions
> > there, and then you don't need all this externing.
> >
>
> Right, I did this because, with a separate flag, I thought it would be
> better to keep the code in a separate file.
Common functionality goes together in a C module, regardless of how it
gets called.
> With the new behaviour you proposed in the previous mail (a single -p flag
> that checks whether the argument is a file or a directory), I can merge
> this back into proto.c and drop all the changes that export functions.
<nod>
> > > +
> > > +static void writetimestamps(struct xfs_inode *ip, struct stat statbuf)
> > > +{
> > > + struct timespec64 ts;
> > > +
> > > + /*
> > > + * Copy timestamps from source file to destination inode.
> > > + * In order to not be influenced by our own access timestamp,
> > > + * we set atime and ctime to mtime of the source file.
> > > + * Usually reproducible archives will delete or not register
> > > + * atime and ctime, for example:
> > > + * https://www.gnu.org/software/tar/manual/html_section/Reproducibility.html
> > > + */
> > > + ts.tv_sec = statbuf.st_mtime;
> > > + ts.tv_nsec = statbuf.st_mtim.tv_nsec;
> > > + inode_set_atime_to_ts(VFS_I(ip), ts);
> > > + inode_set_ctime_to_ts(VFS_I(ip), ts);
> > > + inode_set_mtime_to_ts(VFS_I(ip), ts);
> >
> > It seems weird to me that you'd set [ac]time to mtime. Why not open
> > the source file with O_NOATIME and copy atime? And why would copying
> > ctime not result in a reproducible build?
> >
> > Not sure what you do about crtime.
> >
>
> The problem stems from extracting the artifact. Reproducible archives
> usually drop [ac]time and keep only mtime, but as soon as a file is
> extracted, the filesystem sets its [ac]time to the time of extraction.
> This randomness does not show up when mkfs is run repeatedly on the same
> extracted tree, which stays reproducible, but it does show up when mkfs
> is run on a fresh extraction of the same archive.
>
> Another approach is what mkfs.ext4's populate functionality does: it
> preserves mtime, but sets [cr,a,c]time to the time at which the mkfs
> command runs.
>
> This would preserve the important timestamp (mtime) and move the
> "problem" of reproducible vs. changing timestamps to the environment,
> while keeping the behaviour of mkfs.xfs sensible.
>
> What do you think?

The thing is, if you rely on atime/mtime to detect "file data changed
since last read", then /not/ copying atime into the filesystem breaks
that property in the image.
How about copying [acm]time from the source file by default, and adding
a new -p noatime option to skip the atime?
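Copying atime faithfully also means not disturbing it while mkfs reads the source tree, which is what O_NOATIME is for. A minimal sketch, assuming a Linux host; `open_source_noatime()` is a hypothetical helper name. O_NOATIME is only permitted for the file's owner or a caller with CAP_FOWNER, so the sketch falls back to a plain open on EPERM.

```c
#define _GNU_SOURCE /* O_NOATIME is Linux-specific */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a source file for copying without bumping its
 * atime, and capture its timestamps before any data is read.
 * Returns an fd, or -1 with errno set.
 */
static int open_source_noatime(const char *path, struct stat *st)
{
	int fd;

	assert(path != NULL && st != NULL);
	fd = open(path, O_RDONLY | O_NOATIME);
	if (fd < 0 && errno == EPERM)      /* not owner and no CAP_FOWNER */
		fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, st) < 0) {           /* atime here is the pre-read value */
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

Calling fstat() before any read() means the recorded atime is the pre-copy value either way; O_NOATIME additionally leaves the source tree untouched, so a second mkfs run over the same tree sees the same atime rather than one bumped by the first run's reads.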
ctime/crtime should be the current time at which the mkfs command runs.
I assume that you have a gettimeofday-type wrapper that makes it always
return the same value?
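The policy proposed above can be sketched as follows. Two assumptions are labeled in the code: the fixed clock is taken from the SOURCE_DATE_EPOCH environment variable (the reproducible-builds.org convention, which this thread does not mention), and with noatime set, atime falls back to the mkfs time. `struct inode_times`, `mkfs_now()` and `pick_times()` are hypothetical names, not xfsprogs code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical container for the four timestamps mkfs would write. */
struct inode_times {
	struct timespec atime, mtime, ctime, crtime;
};

/*
 * "Current time" for mkfs. Assumption: a fixed clock may be supplied via
 * SOURCE_DATE_EPOCH (whole seconds); otherwise use the real clock.
 */
static struct timespec mkfs_now(void)
{
	struct timespec now = { 0, 0 };
	const char *env = getenv("SOURCE_DATE_EPOCH");

	if (env && *env) {
		char *end;

		errno = 0;
		long long secs = strtoll(env, &end, 10);
		if (errno == 0 && *end == '\0' && secs >= 0) {
			now.tv_sec = (time_t)secs;
			return now;
		}
	}
	clock_gettime(CLOCK_REALTIME, &now);
	return now;
}

/*
 * Copy atime and mtime from the source unless noatime is set;
 * ctime and crtime are always the mkfs time.
 * Assumption: with noatime, atime also becomes the mkfs time.
 */
static void pick_times(const struct stat *st, bool noatime,
		       struct timespec now, struct inode_times *out)
{
	assert(st != NULL && out != NULL);
	out->mtime = st->st_mtim;
	out->atime = noatime ? now : st->st_atim;
	out->ctime = now;
	out->crtime = now;
}
```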
> > > + /*
> > > + * copy over file content, attributes and
> > > + * timestamps
> > > + */
> > > + if (fd != 0) {
> > > + writefile(ip, fname, fd);
> > > + writeattrs(ip, fname, fd);
> >
> > Since we're adding features, should this read the fsxattr info from the
> > source file, override it with the set fields in *fsxp, and set that on
> > the file? If you're going to slurp up a directory, you might as well
> > get all the non-xattr file attributes.
> >
>
> Right, I thought creatproto() did that, but I now see it is only done
> for the root inode. I'll add it for the other inodes too, thanks.
Right.
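A minimal sketch of the fsxattr merge suggested above, assuming the Linux uapi `struct fsxattr` and the `FS_IOC_FSGETXATTR` ioctl from `<linux/fs.h>`. The helper names are hypothetical, and treating a non-zero command-line field as "set" (with xflags OR'd in) is an assumption; a real implementation would track which fields were explicitly specified.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* struct fsxattr, FS_IOC_FSGETXATTR */

/* Hypothetical: read the source file's fsxattr; returns 0 or -1. */
static int get_source_fsxattr(int fd, struct fsxattr *out)
{
	memset(out, 0, sizeof(*out));
	return ioctl(fd, FS_IOC_FSGETXATTR, out);
}

/*
 * Hypothetical: let command-line fields win over the source's values.
 * Assumption: a non-zero field counts as "set"; flags are added.
 */
static void merge_fsxattr(struct fsxattr *src, const struct fsxattr *cli)
{
	assert(src != NULL && cli != NULL);
	src->fsx_xflags |= cli->fsx_xflags;
	if (cli->fsx_extsize)
		src->fsx_extsize = cli->fsx_extsize;
	if (cli->fsx_projid)
		src->fsx_projid = cli->fsx_projid;
	if (cli->fsx_cowextsize)
		src->fsx_cowextsize = cli->fsx_cowextsize;
}
```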
> > > + libxfs_parent_finish(mp, ppargs);
> > > + tp = NULL;
> >
> > Shouldn't this copy xattrs and fsxattrs to directories and symlinks too?
> >
>
> Right, will add, thanks.
<nod>
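For directories and symlinks, the l-prefixed xattr calls (`llistxattr()`, `lgetxattr()`) read attributes from the link itself rather than its target, so one helper can serve every inode type. A sketch with hypothetical names `for_each_xattr_nofollow()` and `count_xattr()`; a copy would replace the counting callback with one that sets the attribute on the new inode. The attribute list can change between the size query and the read (ERANGE), which this sketch reports as an error rather than retrying.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical callback type: receives one xattr name/value pair. */
typedef int (*xattr_cb)(const char *name, const void *value, size_t size,
			void *priv);

/* Visit every xattr on path without following a trailing symlink.
 * Returns 0 on success, -1 on failure. */
static int for_each_xattr_nofollow(const char *path, xattr_cb cb, void *priv)
{
	char *names = NULL, *value = NULL;
	ssize_t len, vlen;
	int ret = -1;

	assert(path != NULL && cb != NULL);
	len = llistxattr(path, NULL, 0);        /* size of the name list */
	if (len <= 0)
		return len == 0 ? 0 : -1;       /* no xattrs, or error */
	names = malloc(len);
	if (!names)
		goto out;
	len = llistxattr(path, names, len);
	if (len < 0)
		goto out;
	/* names is a sequence of NUL-terminated strings */
	for (char *p = names; p < names + len; p += strlen(p) + 1) {
		vlen = lgetxattr(path, p, NULL, 0);
		if (vlen < 0)
			goto out;
		free(value);
		value = malloc(vlen > 0 ? (size_t)vlen : 1);
		if (!value)
			goto out;
		vlen = lgetxattr(path, p, value, vlen);
		if (vlen < 0 || cb(p, value, vlen, priv) != 0)
			goto out;
	}
	ret = 0;
out:
	free(value);
	free(names);
	return ret;
}

/* Example callback: just count the attributes. */
static int count_xattr(const char *name, const void *value, size_t size,
		       void *priv)
{
	(void)name;
	(void)value;
	(void)size;
	++*(int *)priv;
	return 0;
}
```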
> > > +/*
> > > + * walk_dir will recursively list files and directories
> > > + * and populate the mountpoint *mp with them using handle_direntry().
> > > + */
> > > +static void walk_dir(struct xfs_mount *mp, struct xfs_inode *pip,
> > > + struct fsxattr *fsxp, char *cur_path)
> > > +{
> > > + DIR *dir;
> > > + struct dirent *entry;
> > > +
> > > + /*
> > > + * open input directory and iterate over all entries in it.
> > > + * when another directory is found, we will recursively call
> > > + * populatefromdir.
> > > + */
> > > + if ((dir = opendir(cur_path)) == NULL)
> > > + fail(_("cannot open input dir"), 1);
> > > + while ((entry = readdir(dir)) != NULL) {
> > > + handle_direntry(mp, pip, fsxp, cur_path, entry);
> > > + }
> > > + closedir(dir);
> > > +}
> >
> > nftw() ? Which has the nice feature of constraining the number of open
> > dirs at any given time.
> >
> > --D
> >
>
> The problem with nftw() is that it works through callback functions, so
> we would need to switch to static variables for state, for example to
> keep track of each ip's pip (parent inode). With the recursive approach
> the state stays local and walk_dir() behaves like parseproto(), which
> keeps the changes to the rest of the file minimal.
> nftw() seems to involve a lot more changes than the current approach,
> where we're basically just adding a limited number of functions to
> proto.c.
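To illustrate the trade-off described above, here is a sketch of the nftw() alternative. The callback receives no user pointer, so the per-depth parent table has to be file-scope state. `visit()`, `walk_with_nftw()`, `MAX_DEPTH` and the long ids standing in for parent inodes are hypothetical; the `nopenfd` argument of 32 is an arbitrary example value capping how many directories nftw() holds open at once.

```c
#define _XOPEN_SOURCE 700       /* nftw() and FTW_PHYS */
#include <assert.h>
#include <ftw.h>
#include <sys/stat.h>

#define MAX_DEPTH 256           /* hypothetical cap on tree depth */

/*
 * nftw() callbacks take no user pointer, so the walk state lives in
 * file-scope statics. The long ids stand in for the parent xfs_inode.
 */
static long parent_at_level[MAX_DEPTH];
static long next_id;
static long entries_seen;

static int visit(const char *path, const struct stat *st, int type,
		 struct FTW *ftw)
{
	long id = ++next_id;
	long parent;

	(void)path;
	(void)st;
	if (ftw->level >= MAX_DEPTH)
		return -1;              /* non-zero stops the walk */
	/* an entry at level L lives in the directory recorded at L - 1 */
	parent = ftw->level > 0 ? parent_at_level[ftw->level - 1] : 0;
	(void)parent;           /* would be passed to the inode-creation code */
	if (type == FTW_D)      /* pre-order: dir is seen before its entries */
		parent_at_level[ftw->level] = id;
	entries_seen++;
	return 0;
}

static int walk_with_nftw(const char *root)
{
	next_id = 0;
	entries_seen = 0;
	/* FTW_PHYS: don't follow symlinks; hold at most 32 dirs open */
	return nftw(root, visit, 32, FTW_PHYS);
}
```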
Eck, ok. Never mind then. I guess we could try to bump RLIMIT_NOFILE
in that case to avoid EMFILE.
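A minimal sketch of the RLIMIT_NOFILE bump, using only getrlimit()/setrlimit(); `raise_nofile_limit()` is a hypothetical name. Since walk_dir() keeps one DIR * open per recursion level, the number of descriptors needed grows with directory depth, and raising the soft limit up to the hard limit is permitted without privileges.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: raise the soft open-file limit to the hard limit
 * before a recursive opendir() walk, to make EMFILE less likely.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int raise_nofile_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	if (rl.rlim_cur == rl.rlim_max)
		return 0;               /* already as high as allowed */
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_NOFILE, &rl);
}
```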
--D
> Thanks again for the review, Darrick.
> I'll wait for your feedback on walk_dir() vs. nftw() and on the [ac]time
> approach.
> Thanks,
>
> L.
>