public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Luca Di Maio <luca.dimaio1@gmail.com>
Cc: linux-xfs@vger.kernel.org, dimitri.ledkov@chainguard.dev,
	smoser@chainguard.dev, hch@infradead.org
Subject: Re: [PATCH v6 2/4] populate: add ability to populate a filesystem from a directory
Date: Thu, 24 Apr 2025 15:00:41 -0700	[thread overview]
Message-ID: <20250424220041.GK25675@frogsfrogsfrogs> (raw)
In-Reply-To: <vmiujkqli3d4c7ohgegpxvwacowl2tdaps6m4wyvwh6dcfado7@csca7fs5y7ss>

On Thu, Apr 24, 2025 at 06:09:45PM +0200, Luca Di Maio wrote:
> On Wed, Apr 23, 2025 at 01:23:58PM -0700, Darrick J. Wong wrote:
> > On Wed, Apr 23, 2025 at 06:03:17PM +0200, Luca Di Maio wrote:
> > > +static void fail(char *msg, int i)
> > > +{
> > > +   fprintf(stderr, _("%s: %s [%d - %s]\n"), progname, msg, i, strerror(i));
> > > +   exit(1);
> > > +}
> > > +
> > > +static int newregfile(char *fname)
> > > +{
> > > +   int fd;
> > > +   off_t size;
> > > +
> > > +   if ((fd = open(fname, O_RDONLY)) < 0 || (size = filesize(fd)) < 0) {
> > > +           fprintf(stderr, _("%s: cannot open %s: %s\n"), progname, fname,
> > > +                   strerror(errno));
> > > +           exit(1);
> > > +   }
> > > +
> > > +   return fd;
> > > +}
> >
> > Why is this copy-pasting code from proto.c?  Put the new functions
> > there, and then you don't need all this externing.
> >
> 
> Right, this is because with a separate flag I thought it would have been
> better to keep it in a separate file.

Common functionality goes together in a C module, regardless of how it
gets called.

> With the new behaviour you proposed in the previous mail (one -p flag,
> check if file/directory) then I can unify back into proto.c, thus
> removing all the exported functions changes.

<nod>

> > > +
> > > +static void writetimestamps(struct xfs_inode *ip, struct stat statbuf)
> > > +{
> > > +   struct timespec64 ts;
> > > +
> > > +   /*
> > > +    * Copy timestamps from source file to destination inode.
> > > +    *  In order to not be influenced by our own access timestamp,
> > > +    *  we set atime and ctime to mtime of the source file.
> > > +    *  Usually reproducible archives will delete or not register
> > > +    *  atime and ctime, for example:
> > > +    *     https://www.gnu.org/software/tar/manual/html_section/Reproducibility.html
> > > +    */
> > > +   ts.tv_sec = statbuf.st_mtime;
> > > +   ts.tv_nsec = statbuf.st_mtim.tv_nsec;
> > > +   inode_set_atime_to_ts(VFS_I(ip), ts);
> > > +   inode_set_ctime_to_ts(VFS_I(ip), ts);
> > > +   inode_set_mtime_to_ts(VFS_I(ip), ts);
> >
> > This seems weird to me that you'd set [ac]time to mtime.  Why not open
> > the source file O_ATIME and copy atime?  And why would copying ctime not
> > result in a reproducible build?
> >
> > Not sure what you do about crtime.
> >
> 
> The problem stems from the extraction of the artifact. Usually
> reproducible archives will remove [ac]time and only keep mtime, but in
> the moment that a file is extracted, any filesystem will assign [ac]time
> to the moment of extraction.
> This will add randomness not to the filesystem itself, because it will
> be reproducible if acting on the same extracted archive, but it will not
> be reproducible if acting on a new extraction of the same archive.
> 
> Another approach we can do is what mkfs.ext4's populate functionality is
> doing: while it preserves mtime, [cr,a,c]time is set to whatever time the
> mkfs command is running.
> 
> This would make it preserve the important timestamp (mtime) and move the
> "problem" of the reproducible/changing timestamp to the environment,
> while keeping the behaviour of mkfs.xfs sensible
> 
> What do you think?

The thing is, if you were relying on atime/mtime for detection of "file
data changed since last read" then /not/ copying atime into the
filesystem breaks that property in the image.

How about copying [acm]time from the source file by default, but then
add a new -p noatime option to skip the atime?

ctime/crtime should be the current time when mkfs command is running.
I assume that you have a gettimeofday type wrapper that makes it always
return the same value?

> > > +   /*
> > > +    * copy over file content, attributes and
> > > +    * timestamps
> > > +    */
> > > +   if (fd != 0) {
> > > +           writefile(ip, fname, fd);
> > > +           writeattrs(ip, fname, fd);
> >
> > Since we're adding features, should this read the fsxattr info from the
> > source file, override it with the set fields in *fsxp, and set that on
> > the file?  If you're going to slurp up a directory, you might as well
> > get all the non-xattr file attributes.
> >
> 
> Right, I thought creatproto() did that, but now I see that this is done
> only for the root inode, I'll add this for others too, thanks.

Right.

> > > +           libxfs_parent_finish(mp, ppargs);
> > > +           tp = NULL;
> >
> > Shouldn't this copy xattrs and fsxattrs to directories and symlinks too?
> >
> 
> Right, will add, thanks.

<nod>

> > > +/*
> > > + * walk_dir will recursively list files and directories
> > > + * and populate the mountpoint *mp with them using handle_direntry().
> > > + */
> > > +static void walk_dir(struct xfs_mount *mp, struct xfs_inode *pip,
> > > +                       struct fsxattr *fsxp, char *cur_path)
> > > +{
> > > +   DIR *dir;
> > > +   struct dirent *entry;
> > > +
> > > +   /*
> > > +    * open input directory and iterate over all entries in it.
> > > +    * when another directory is found, we will recursively call
> > > +    * populatefromdir.
> > > +    */
> > > +   if ((dir = opendir(cur_path)) == NULL)
> > > +           fail(_("cannot open input dir"), 1);
> > > +   while ((entry = readdir(dir)) != NULL) {
> > > +           handle_direntry(mp, pip, fsxp, cur_path, entry);
> > > +   }
> > > +   closedir(dir);
> > > +}
> >
> > nftw() ?  Which has the nice feature of constraining the number of open
> > dirs at any given time.
> >
> > --D
> >
> 
> The problem with nftw() is that working with callback functions, we will
> need to switch to static variables for state, for example to keep track
> of each ip's pip, while with the recursive approach we can have some
> state and basically walk_dir() behaves similar to parseproto(), making
> changes to the rest of the file minimal.
> This seems to involve a lot more changes than now where we're basically
> just adding a limited number of functions to proto.c.

Eck, ok.  Never mind then.  I guess we could try to bump RLIMIT_NOFILE
in that case to avoid EMFILE.

--D

> Thanks again for the review Darrick,
> I'll wait for your feedback on the walk_dir() vs nftw() and the [ac]time
> approach,
> thanks
> 
> L.
> 

  reply	other threads:[~2025-04-24 22:00 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-23 16:03 [PATCH v6 0/4] mkfs: add ability to populate filesystem from directory Luca Di Maio
2025-04-23 16:03 ` [PATCH v6 1/4] proto: expose more functions from proto Luca Di Maio
2025-04-23 16:03 ` [PATCH v6 2/4] populate: add ability to populate a filesystem from a directory Luca Di Maio
2025-04-23 20:23   ` Darrick J. Wong
2025-04-24 16:09     ` Luca Di Maio
2025-04-24 22:00       ` Darrick J. Wong [this message]
2025-04-25 13:10         ` Christoph Hellwig
2025-04-25 15:00           ` Darrick J. Wong
2025-04-25 17:58             ` Luca Di Maio
2025-04-23 16:03 ` [PATCH v6 3/4] mkfs: add -P flag " Luca Di Maio
2025-04-23 20:09   ` Darrick J. Wong
2025-04-24 12:01     ` Luca Di Maio
2025-04-24 21:55       ` Darrick J. Wong
2025-04-23 16:03 ` [PATCH v6 4/4] man: document " Luca Di Maio
2025-04-23 20:03 ` [PATCH v6 0/4] mkfs: add ability to populate filesystem from directory Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250424220041.GK25675@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=dimitri.ledkov@chainguard.dev \
    --cc=hch@infradead.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=luca.dimaio1@gmail.com \
    --cc=smoser@chainguard.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox