All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Jens Axboe <axboe@kernel.dk>, Tejun Heo <tj@kernel.org>,
	linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 13/13] xfs: convert xfs_sysfs attrs to use ->seq_show
Date: Tue, 14 Sep 2021 07:12:43 +0200	[thread overview]
Message-ID: <YUAvSx42abg5S2ym@kroah.com> (raw)
In-Reply-To: <20210914012029.GF2361455@dread.disaster.area>

On Tue, Sep 14, 2021 at 11:20:29AM +1000, Dave Chinner wrote:
> On Mon, Sep 13, 2021 at 08:27:50AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Sep 13, 2021 at 07:41:21AM +0200, Christoph Hellwig wrote:
> > > Trivial conversion to the seq_file based sysfs attributes.
> > > 
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/xfs/xfs_stats.c | 24 +++++-------
> > >  fs/xfs/xfs_stats.h |  2 +-
> > >  fs/xfs/xfs_sysfs.c | 96 +++++++++++++++++++++++-----------------------
> > >  3 files changed, 58 insertions(+), 64 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c
> > > index 20e0534a772c9..71e7a84ba0403 100644
> > > --- a/fs/xfs/xfs_stats.c
> > > +++ b/fs/xfs/xfs_stats.c
> > > @@ -16,10 +16,9 @@ static int counter_val(struct xfsstats __percpu *stats, int idx)
> > >  	return val;
> > >  }
> > >  
> > > -int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > > +void xfs_stats_format(struct xfsstats __percpu *stats, struct seq_file *sf)
> > >  {
> > >  	int		i, j;
> > > -	int		len = 0;
> > >  	uint64_t	xs_xstrat_bytes = 0;
> > >  	uint64_t	xs_write_bytes = 0;
> > >  	uint64_t	xs_read_bytes = 0;
> > > @@ -58,13 +57,12 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > >  	/* Loop over all stats groups */
> > >  
> > >  	for (i = j = 0; i < ARRAY_SIZE(xstats); i++) {
> > > -		len += scnprintf(buf + len, PATH_MAX - len, "%s",
> > > -				xstats[i].desc);
> > > +		seq_printf(sf, "%s", xstats[i].desc);
> > > +
> > >  		/* inner loop does each group */
> > >  		for (; j < xstats[i].endpoint; j++)
> > > -			len += scnprintf(buf + len, PATH_MAX - len, " %u",
> > > -					counter_val(stats, j));
> > > -		len += scnprintf(buf + len, PATH_MAX - len, "\n");
> > > +			seq_printf(sf, " %u", counter_val(stats, j));
> > > +		seq_printf(sf, "\n");
> > >  	}
> > >  	/* extra precision counters */
> > >  	for_each_possible_cpu(i) {
> > > @@ -74,18 +72,14 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
> > >  		defer_relog += per_cpu_ptr(stats, i)->s.defer_relog;
> > >  	}
> > >  
> > > -	len += scnprintf(buf + len, PATH_MAX-len, "xpc %Lu %Lu %Lu\n",
> > > +	seq_printf(sf, "xpc %Lu %Lu %Lu\n",
> > >  			xs_xstrat_bytes, xs_write_bytes, xs_read_bytes);
> > > -	len += scnprintf(buf + len, PATH_MAX-len, "defer_relog %llu\n",
> > > -			defer_relog);
> > > -	len += scnprintf(buf + len, PATH_MAX-len, "debug %u\n",
> > > +	seq_printf(sf, "defer_relog %llu\n", defer_relog);
> > >  #if defined(DEBUG)
> > > -		1);
> > > +	seq_printf(sf, "debug 1\n");
> > >  #else
> > > -		0);
> > > +	seq_printf(sf, "debug 0\n");
> > >  #endif
> > > -
> > > -	return len;
> > >  }
> > 
> > That is a sysfs file?  What happened to the "one value per file" rule
> > here?
> 
> 
> There is no "rule" that says syfs files must contain one value per
> file; the documentation says that one value per file is the
> "preferred" format.  Documentation/filesystems/sysfs.rst:
> 
> [...]
> Attributes
> ...
> Attributes should be ASCII text files, preferably with only one value
> per file. It is noted that it may not be efficient to contain only one
> value per file, so it is socially acceptable to express an array of
> values of the same type.
> [...]
> 

An array of values is one thing like "what is the power states for this
device".  A list of different key/value pairs is a totally different
thing entirely.

> We are exposing a large array of integer values here, so multiple
> values per file are explicitly considered an acceptible format.

Not really, that was not the goal of sysfs at all.

> Further, as there are roughly 200 individual stats in this file and
> calculating each stat requires per-cpu aggregation, the the cost of
> calculating and reading each stat individually is prohibitive, not
> just inefficient.

Have you measured it?  How often does the file get read and by what
tools?

We have learned from our past mistakes in /proc where we did this in the
past and required keeping obsolete values and constantly tweaking
userspace parsers.  That is why we made sysfs one-value-per-file.  If
the file is not there, the value is not there, much easier to handle
future changes.

> So, yes, we might have multiple lines in the file that you can frown
> about, but OTOH the file format has been exposed as a kernel ABI for
> a couple of decades via /proc/fs/xfs/stat.

proc had no such rules, but we have learned :)

> Hence exposing it in
> sysfs to provide a more fine-grained breakdown of the stats (per
> mount instead of global) is a no-brainer. We don't have to rewrite
> the parsing engines in multiple userspace monitoring programs to
> extract this information from the kernel - they just create a new
> instance and read a different file and it all just works.

But then you run into the max size restriction on sysfs files
(PAGE_SIZE) and things break down.

Please don't do this.

> Indeed, there's precedence for such /proc file formats in more
> fine-grained sysfs files. e.g.  /sys/bus/node/devices/node<n>/vmstat
> and /sys/bus/node/devices/node<n>/meminfo retain the same format
> (and hence userspace parsers) for the per-node stats as /proc/vmstat
> and /proc/meminfo use for the global stats...

And I have complained about those files in the past many times.  And
they are running into problems in places dealing with them too.

> tl;dr: the file contains arrays of values, it's inefficient to read
> values one at a time, it's a pre-existing ABI-constrainted file
> format, there's precedence in core kernel statistics
> implementations and the documented guidelines allow this sort of
> usage in these cases.

I would prefer not to do this, and I will not take core sysfs changes to
make this any easier.

Which is one big reason why I don't like just making sysfs use the seq
file api, it would allow stuff like this to propagate to other places in
the kernel.

Maybe I should cut the file size of a sysfs file down to PAGE_SIZE/4 or
less, that might be better :)

thanks,

greg k-h

  reply	other threads:[~2021-09-14  5:13 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-13  5:41 start switching sysfs attributes to expose the seq_file Christoph Hellwig
2021-09-13  5:41 ` [PATCH 01/13] seq_file: mark seq_get_buf as deprecated Christoph Hellwig
2021-09-13 13:19   ` Christian Brauner
2021-09-13 16:22   ` Daniel Wagner
2021-09-13 16:29   ` Tejun Heo
2021-09-13  5:41 ` [PATCH 02/13] kernfs: remove kernfs_create_file and kernfs_create_file_ns Christoph Hellwig
2021-09-13 13:20   ` Christian Brauner
2021-09-13  5:41 ` [PATCH 03/13] kernfs: remove the unused lockdep_key field in struct kernfs_ops Christoph Hellwig
2021-09-13 13:21   ` Christian Brauner
2021-09-13 16:30   ` Tejun Heo
2021-09-13  5:41 ` [PATCH 04/13] sysfs: split out binary attribute handling from sysfs_add_file_mode_ns Christoph Hellwig
2021-09-13 13:26   ` Christian Brauner
2021-09-13  5:41 ` [PATCH 05/13] sysfs: refactor sysfs_add_file_mode_ns Christoph Hellwig
2021-09-13 13:27   ` Christian Brauner
2021-09-13  5:41 ` [PATCH 06/13] sysfs: simplify sysfs_kf_seq_show Christoph Hellwig
2021-09-13  5:41 ` [PATCH 07/13] sysfs: add ->seq_show support to sysfs_ops Christoph Hellwig
2021-09-13 13:30   ` Christian Brauner
2021-09-13  5:41 ` [PATCH 08/13] block: convert the blk_mq_hw_ctx attrs to use ->seq_show Christoph Hellwig
2021-09-13  5:41 ` [PATCH 09/13] block: convert the blk_integrity " Christoph Hellwig
2021-09-13  5:41 ` [PATCH 10/13] block: convert the request_queue " Christoph Hellwig
2021-09-13  5:41 ` [PATCH 11/13] block: convert the elevator_queue " Christoph Hellwig
2021-09-13  5:41 ` [PATCH 12/13] xfs: convert xfs_errortag " Christoph Hellwig
2021-09-13  5:41 ` [PATCH 13/13] xfs: convert xfs_sysfs " Christoph Hellwig
2021-09-13  6:27   ` Greg Kroah-Hartman
2021-09-14  1:20     ` Dave Chinner
2021-09-14  5:12       ` Greg Kroah-Hartman [this message]
2021-09-14 10:56         ` Dave Chinner
2021-09-14  7:30     ` Christoph Hellwig
2021-09-14 15:28       ` Greg Kroah-Hartman
2021-09-14 15:30         ` Christoph Hellwig
2021-09-14 15:41           ` Greg Kroah-Hartman
2021-09-15  7:04             ` Christoph Hellwig
2021-09-15  7:07               ` Greg Kroah-Hartman
2021-09-13 16:39 ` start switching sysfs attributes to expose the seq_file Bart Van Assche
2021-09-13 16:46   ` Greg Kroah-Hartman
2021-09-14  2:53     ` Bart Van Assche
2021-09-13 16:46 ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YUAvSx42abg5S2ym@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.