From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>,
sandeen@sandeen.net, linux-xfs@vger.kernel.org, jack@suse.com,
jeffm@suse.com, okurz@suse.com, lpechacek@suse.com,
jtulak@redhat.com
Subject: Re: [PATCH v2 5/5] mkfs.xfs: add configuration file parsing support using our own parser
Date: Mon, 21 May 2018 08:30:17 -0700 [thread overview]
Message-ID: <20180521153017.GG23858@magnolia> (raw)
In-Reply-To: <20180521001434.GO23861@dastard>
On Mon, May 21, 2018 at 10:14:34AM +1000, Dave Chinner wrote:
> On Sat, May 19, 2018 at 03:32:24AM +0200, Luis R. Rodriguez wrote:
> > On Fri, May 18, 2018 at 10:44:04AM +1000, Dave Chinner wrote:
> > > On Thu, May 17, 2018 at 12:27:00PM -0700, Luis R. Rodriguez wrote:
> > > > --- a/mkfs/xfs_mkfs.c
> > > > +++ b/mkfs/xfs_mkfs.c
> > > > @@ -3077,11 +3078,13 @@ print_mkfs_cfg(
> > > > struct mkfs_params *cfg,
> > > > char *dfile,
> > > > char *logfile,
> > > > - char *rtfile)
> > > > + char *rtfile,
> > > > + struct mkfs_default_params *dft)
> > > > {
> > > > struct sb_feat_args *fp = &cfg->sb_feat;
> > > >
> > > > printf(_(
> > > > +"config-file=%-22s\n"
> > > > "meta-data=%-22s isize=%-6d agcount=%lld, agsize=%lld blks\n"
> > > > " =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
> > > > " =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u, reflink=%u\n"
> > > > @@ -3091,6 +3094,7 @@ print_mkfs_cfg(
> > > > "log =%-22s bsize=%-6d blocks=%lld, version=%d\n"
> > > > " =%-22s sectsz=%-5u sunit=%d blks, lazy-count=%d\n"
> > > > "realtime =%-22s extsz=%-6d blocks=%lld, rtextents=%lld\n"),
> > > > + dft->type != DEFAULTS_BUILTIN ? dft->config_file : "empty",
> > > > dfile, cfg->inodesize, (long long)cfg->agcount,
> > > > (long long)cfg->agsize,
> > > > "", cfg->sectorsize, fp->attr_version, fp->projid32bit,
> > >
> > > Haven't we already printed where the config was sourced from? Why do
> > > we need to print it again here?
> >
> > The other print describes the enum, this print prints out the *used*
> > config file path if a config file was actually used. Without the other
> > print this would just print either the config file or empty.
>
> If you look at the changes I proposed, I suggested we change the
> initial print out the path of the source file, not how the user
> specified the source file. So this becomes redundant. And given this
> code is now shared with xfs_info, it has no idea about mkfs config
> files, so it's not ideal to be adding mkfs-specific stuff to this
> output.
>
> Further....
>
> > Since we are going to remove the environment variable option, then
> > I think the other print is now not needed anymore and my preference
> > is to keep it here.
> >
> > Thoughts?
>
> .... this printout only happens after al input has been validated
> and we're about to make the filesystem. If there's a validation
> problem, nobody knows what file the defaults were sourced from.
> IOWs, the file the default are sourced from needs to be printed even
> before the file is read...
>
> > > > .sectorsize = XFS_MIN_SECTORSIZE,
> > > > .blocksize = 1 << XFS_DFL_BLOCKSIZE_LOG,
> > > > .sb_feat = {
> > > > @@ -3714,6 +3719,13 @@ static void process_defaults(
> > > > struct mkfs_default_params *dft,
> > > > struct cli_params *cli)
> > > > {
> > > > + int ret;
> > > > +
> > > > + ret = parse_config_init(dft);
> > > > +
> > > > + if (ret && dft->type != DEFAULTS_BUILTIN)
> > > > + usage();
> > >
> > > That doesn't make any sense in isolation.
> >
> > Not sure what you meant at first, but I see that you prefer this
> > to be hidden from the caller. Will change.
>
> I meant I have no idea what this is checking and why there's a usage
> call there because usage() is for CLI input, not setting up
> defaults before CLI processing. Explanation in comments are needed.
>
> > > > install_defaults(dft, cli);
> > > > }
> > > >
> > > > @@ -3750,6 +3762,8 @@ main(
> > > > };
> > > > struct mkfs_params cfg = {};
> > > > struct mkfs_default_params dft;
> > > > + char *tmp_config;
> > > > + char custom_config[PATH_MAX];
> > > >
> > > > reset_defaults_and_cli(&dft, &cli);
> > > >
> > > > @@ -3760,21 +3774,36 @@ main(
> > > > textdomain(PACKAGE);
> > > >
> > > > /*
> > > > - * TODO: Sourcing defaults from a config file
> > > > - *
> > > > * Before anything else, see if there's a config file with different
> > > > - * defaults. If a file exists in <package location>, read in the new
> > > > + * defaults. If a file exists in MKFS_XFS_CONF_DIR/default, read the new
> > > > * default values and overwrite them in the &dft structure. This way the
> > > > * new defaults will apply before we parse the CLI, and the CLI will
> > > > * still be able to override them. When more than one source is
> > > > * implemented, emit a message to indicate where the defaults being
> > > > * used came from.
> > > > */
> > >
> > > This comment makes no mention of environment variables, or how they
> > > interact with other sources of config files.
> >
> > As per Eric's request, We're getting rid of the environment variable option, so
> > mentioning it is no longer needed.
>
> I thought Darrick wanted it kept?
Eric and I argued about this for a while on irc, I think we settled on
allowing a single '-c $foo' arg where we parse openat($foo)...
> > > > + tmp_config = getenv("MKFS_XFS_CONFIG");
> > > > + if (tmp_config != NULL) {
> > > > + dft.config_file = tmp_config;
> > > > + dft.type = DEFAULTS_ENVIRONMENT_CONFIG;
> > > > + }
> > > >
> > > > process_defaults(&dft, &cli);
> > >
> > > So why are we processing the defaults before we've checked if a
> > > default file is specified on the command line? All this should do is
> > > set up the config file to be read, and set the dft.source =
> > > _("Environment Variable");
> >
> > This was being done *here* since I previously has setup to process the
> > arguments *early* and check if a -c was specified, and if so use the config
> > file for the defaults, otherwise the environment variable if set. But since
> > the way I had implemented processing the arguments early relied on not well
> > guaranteed mechanisms I resorted to avoid that approach.
> >
> > This was the alternative I came up with.
> >
> > The environment variable stuff is going away so this should be much simpler
> > now, however the question of argument processing early still remains.
> >
> > > >
> > > > - while ((c = getopt(argc, argv, "b:d:i:l:L:m:n:KNp:qr:s:CfV")) != EOF) {
> > > > + while ((c = getopt(argc, argv, "b:c:d:i:l:L:m:n:KNp:qr:s:CfV")) != EOF) {
> > > > switch (c) {
> > > > + case 'c':
> > > > + reset_defaults_and_cli(&dft, &cli);
> > >
> > > This will trash the existing CLI inputs that have already been
> > > parsed.
> >
> > Exactly.
> >
> > > All this code here should do is set up the config file
> > > to be read.
> >
> > Consider the case of /etc/mkfs.d/default existing and it having
> >
> > [metadata]
> > crc = 1
> > [naming]
> > ftype=1
> >
> > And suppose the user passed -c foo which had:
> >
> > [metadata]
> > crc = 0
> >
> > Without the reset we'd make /etc/mkfs.d/default the system wide defaults onto
> > which -c parameters overlay on top of. I figured that was not desirable.
>
> It's not desirable. If the user specified a config file, then
> /etc/mkfs.d/default *is never read*. The user config file is used
> instead. If the user supplied file does not exist (even after
> searching) then we fail, not use some other set of defaults.
Yeah. Less confusing to trace where a particular knobturn came from if
we only ever parse one config file.
--D
> > > Again, what happens if the user specifies multiple config files?
> >
> > The reset strategy lets the user set -c as many times as they wish and also
> > starts from a clean base always.
>
> multiple "-c" options is a user error and should error out.
>
> > > The second config file will trash everything from the first, as well
> > > as any other options between them.
> >
> > Yes! By design as we couldn't easily parse arguments early first and then
> > choose just one strategy.
> >
> > I liked the simplicity that this brings.
> >
> > Would you really want multiple -c options to work as overlays, one on top of
> > the other?
>
> I don't want multiple -c option to be supported at all. We source
> defaults from a single file only - either the user specified file,
> or the default if it exists. Not both, not multiple user specified
> files. One file.
>
> > > I think that we need to pull the "-c <file>" option from the command
> > > line and process all the defaults /before/ we enter the main command
> > > line processing loop.
> >
> > That is what I had done originally but ran into that snag of processing
> > arguments twice and the undocumented functionality I found that worked.
>
> Multiple pass option parsing is documented as supported and has been
> for a long, long time. And to make that clear, we even have a
> wrapper in xfsprogs for it:
>
> $ git grep platform_getoptreset
> db/command.c: platform_getoptreset();
> include/darwin.h:static __inline__ void platform_getoptreset(void)
> include/freebsd.h:static __inline__ void platform_getoptreset(void)
> include/gnukfreebsd.h:static __inline__ void platform_getoptreset(void)
> include/linux.h:static __inline__ void platform_getoptreset(void)
> libxcmd/command.c: platform_getoptreset();
>
> And we use it in libxcmd so that each function in xfs_io, xfs_db,
> xfs_quota, etc can run their own sub-command getopt calls correctly.
> It was in the initial commits for xfs_db back in 2001, so we've been
> using multiple pass CLI option parsing in xfsprogs since 2001....
>
> > > IOWs:
> > > config_file = getenv("MKFS_XFS_CONFIG");
> > > if (config_file)
> > > dft.source = _("Environment Variable");
> > >
> > > /*
> > > * Pull config line options from command line
> > > */
> > > while ((c = getopt(argc, argv, "c:")) != EOF) {
> > > switch (c) {
> > > case 'c':
> > > /* XXX: fail if already seen a CLI option */
> >
> > BTW isn't my enum here sufficient to check for this easily in a clean
> > short and easy way?
>
> A local boolean variable is all that is ncessary for this....
>
> [....]
>
> > > memcpy(/* sb_feats */);
> > > memcpy(/* fsxattr */);
> > > optind = 1;
> >
> > My getopt(3) *does* not make mention of any of this. This man page however does:
>
> It's been there as long as I can remember. 2nd paragraph of the
> description:
>
> $man 3 getopt
> [...]
> DESCRIPTION
> [...]
> The variable optind is the index of the next element to be
> processed in argv. The system initializes this value to 1.
> The caller can reset it to 1 to restart scanning of the
> same argv, or when scanning a new argument vector.
>
>
> > optind = 1;
> >
> > However perhaps a big fat NOTE is worthy?
>
> Why? It's documented, well known behaviour....
>
> > > [...]
> > >
> > > > +const char *default_type_str(enum default_params_type type)
> > > > +{
> > > > + switch (type) {
> > > > + case DEFAULTS_BUILTIN:
> > > > + return _("package built-in definitions");
> > > > + case DEFAULTS_CONFIG:
> > > > + return _("default configuration system file");
> > > > + case DEFAULTS_ENVIRONMENT_CONFIG:
> > > > + return _("custom configuration file set by environment");
> > > > + case DEFAULTS_TYPE_CONFIG:
> > > > + return _("custom configuration type file");
> > > > + }
> > > > + return _("Unkown\n");
> > > > +}
> > >
> > > This function can go away.
> >
> > Well depends, if we keep the enum the no ?
>
> That enum needs to die. It's unnecssary abstraction.
>
> > > > + if (strlen(line) < 2)
> > > > + return PARSE_INVALID;
> > >
> > > So empty lines return PARSE_INVALID?
> >
> > Yes, but that's not fatal, we just discard it.
>
> That's "ignore", like comments, not "invalid".
>
> >
> > > > +
> > > > + if (line[0] == '#')
> > > > + return PARSE_COMMENT;
> > >
> > > What about lines starting with whitespace? e.g. " # comment"
> >
> > parse_line_section() below would not return 1 and its ignored
> > in the end as its also not a proper tag/value pair and the
> > section is not set.
>
> So it silently ignores a valid sections/{tag/value pair} if there's
> trailing stuff on the line? Shouldn't that throw an error?
>
> > > > +{
> > > > + int ret;
> > > > + FILE *fp;
> > > > + struct stat sp;
> > > > + unsigned int num_subopts_sections = sizeof(config_subopt_tab) /
> > > > + sizeof(struct config_subopts) - 1;
> > > > + char *p;
> > > > + char line[80], tag[80];
> > > > + bool value;
> > > > + enum parse_line_type parse_type;
> > > > + enum mkfs_config_section section = MKFS_CONFIG_SECTION_INVALID;
> > > > +
> > > > + fp = fopen(dft->config_file, "r");
> > > > + if (!fp) {
> > > > + if (dft->type != DEFAULTS_BUILTIN) {
> > > > + fprintf(stderr, _("could not open config file: %s\n"),
> > > > + dft->config_file);
> > > > + ret = -ENOENT;
> > > > + } else
> > > > + ret = 0;
> > > > + return ret;
> > > > + }
> > >
> > > This should be split out into a separate function that takes the
> > > config file and tries to open it. If it's a relative path that was
> > > supplied, then this function can also try all the configured search
> > > paths for config files.
> >
> > Eric asked to support 3 cases:
> >
> > a) ./ -- current working directory
> > b) / -- full path
> > c) no prefix - we use the sysconfigdir/mkfs.d/
>
> That's pretty much what I just suggested. :)
>
> > > > + return ret;
> > > > +
> > > > + while (!feof(fp)) {
> > > > + p = fgets(line, sizeof(line), fp);
> > >
> > > What if the line is longer than 80 characters? e.g. a long comment
> >
> > If its a comment or spaces with a comment, we still only care for
> > the first 80 characters, no?
>
> What does fgets() do with the remaining characters on the line? They
> are still parked in the input stream, so won't the next fgets() call
> read them? And then we parse those bytes as if they are a new config
> line?
>
> IOWs, shouldn't we be using a line-based input function here? Say,
> getline(3)?
>
> > > > + if (!p)
> > > > + continue;
> > > > +
> > > > + parse_type = parse_get_line_type(line, tag, &value);
> > > > +
> > > > + switch (parse_type) {
> > > > + case PARSE_COMMENT:
> > > > + case PARSE_INVALID:
> > > > + case PARSE_EOF:
> > > > + break;
> > > > + case PARSE_SECTION:
> > > > + section = get_config_section(tag);
> > > > + if (!section) {
> > > > + fprintf(stderr, _("Invalid section: %s\n"),
> > > > + tag);
> > > > + return -EINVAL;
> > > > + }
> > > > + break;
> > > > + case PARSE_TAG_VALUE:
> > > > + /* Trims white spaces */
> > > > + snprintf(line, sizeof(line), "%s=%u", tag, value);
> > > > + ret = parse_config_subopts(section, value, line, dft);
> > >
> > > We've already got the tag and value as discrete variables - why put
> > > them back into an ascii string to pass to another function for it to
> > > split them back into discrete variables?
> >
> > The comment above it explains, "Trims white spaces"
>
> Why do we need to do that? Doesn't the token parser trim away excess
> whitespace around each token it returns?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2018-05-21 15:30 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-17 19:26 [PATCH v2 0/5] xfsprogs: add mkfs.xfs configuration file parsing support Luis R. Rodriguez
2018-05-17 19:26 ` [PATCH v2 1/5] mkfs: distinguish between struct sb_feat_args and struct cli_params Luis R. Rodriguez
2018-05-17 22:02 ` Dave Chinner
2018-05-17 19:26 ` [PATCH v2 2/5] mkfs: move shared structs and cli params into their own headers Luis R. Rodriguez
2018-05-17 22:40 ` Dave Chinner
2018-05-17 23:54 ` Luis R. Rodriguez
2018-05-18 0:49 ` Dave Chinner
2018-05-19 1:33 ` Luis R. Rodriguez
2018-05-17 19:26 ` [PATCH v2 3/5] mkfs: replace defaults source with an enum Luis R. Rodriguez
2018-05-17 22:48 ` Dave Chinner
2018-05-17 23:09 ` Luis R. Rodriguez
2018-05-18 0:53 ` Dave Chinner
2018-05-17 19:26 ` [PATCH v2 4/5] mkfs: add helpers to process defaults Luis R. Rodriguez
2018-05-17 22:53 ` Dave Chinner
2018-05-18 0:06 ` Luis R. Rodriguez
2018-05-17 19:27 ` [PATCH v2 5/5] mkfs.xfs: add configuration file parsing support using our own parser Luis R. Rodriguez
2018-05-17 21:31 ` Darrick J. Wong
2018-05-18 0:29 ` Luis R. Rodriguez
2018-05-21 18:32 ` Luis R. Rodriguez
2018-05-18 0:44 ` Dave Chinner
2018-05-19 1:32 ` Luis R. Rodriguez
2018-05-21 0:14 ` Dave Chinner
2018-05-21 15:30 ` Darrick J. Wong [this message]
2018-05-21 16:58 ` Luis R. Rodriguez
2018-05-22 19:37 ` Luis R. Rodriguez
2018-05-18 3:24 ` Eric Sandeen
2018-05-18 3:46 ` Darrick J. Wong
2018-05-18 15:38 ` Luis R. Rodriguez
2018-05-18 17:09 ` Eric Sandeen
2018-05-18 23:56 ` Luis R. Rodriguez
2018-05-21 9:40 ` Jan Tulak
2018-05-25 0:50 ` Luis R. Rodriguez
2018-05-20 0:16 ` Dave Chinner
2018-05-21 15:33 ` Darrick J. Wong
2018-05-21 17:05 ` Luis R. Rodriguez
2018-05-21 22:10 ` Dave Chinner
2018-05-21 22:24 ` Eric Sandeen
2018-05-22 0:38 ` Dave Chinner
2018-05-25 0:51 ` Luis R. Rodriguez
2018-05-25 0:54 ` Luis R. Rodriguez
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180521153017.GG23858@magnolia \
--to=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=jack@suse.com \
--cc=jeffm@suse.com \
--cc=jtulak@redhat.com \
--cc=linux-xfs@vger.kernel.org \
--cc=lpechacek@suse.com \
--cc=mcgrof@kernel.org \
--cc=okurz@suse.com \
--cc=sandeen@sandeen.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).