Re: [PATCH] powerpc/powernv: Read opal error log and export it through sysfs interface.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>
To: Stewart Smith <stewart@linux.vnet.ibm.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] powerpc/powernv: Read opal error log and export it through sysfs interface.
Date: Tue, 25 Feb 2014 10:32:11 +0530	[thread overview]
Message-ID: <530C23D3.6020203@linux.vnet.ibm.com> (raw)
In-Reply-To: <87y515mij4.fsf@river.au.ibm.com>

On 02/21/2014 05:41 AM, Stewart Smith wrote:
>  Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
>  > This patch adds support to read error logs from OPAL and export them
>  > to userspace through sysfs interface /sys/firmware/opa/opal-elog.

Hi Stewart,

Thanks for the review. This code definitely needs improvement.

> 
>  I think we could provide a better interface with instead having a file
>  per log message appear in sysfs. We're never going to have more than 128
>  of these at any one time on the Linux side, so it's not going to bee too
>  many files.

It is not just about 128 files, we may be adding/removing sysfs node for
every new log id that gets informed to kernel and ack-ed. In worst case,
when we have flood of elog errors with user daemon consuming it and
ack-ing back to get ready for next log in a tight poll, we may
continuously add/remove the sysfs node for each new <id>.

> 
>  e.g. /sys/firmware/opal/elog/<id>
> 
>  that way, any new file in /sys/firmware/opal/elog/ means there's a new
>  log entry available. I believe there's 
> 
>  To ack a log, you could just echo 'ack' to the file.
> 
>  The other option woudl be to more closely follow what sysfs is meant to
>  be - ascii text. This would mean having more (any) of the parser in
>  kernel for the error logs - which may/may not be a bad idea.
> 
>  However, it would make the end user code for consuming them much much
>  simpler, and that may be a good thing.
> 
>  Having some way of getting some information out without a userspace
>  parser is probably good though, I'm pretty sure having only a binary
>  interface in /sys is at least partially frowned upon.
> 
>  > This is what user space tool would do:
>  > - Read error log from /sys/firmware/opa/opal-elog.
>  > - Save it to the disk.
>  > - Send an acknowledgement on successful consumption by writing error log
>  >   id to /sys/firmware/opa/opal-elog-ack.
> 
>  A userspace tool may want to explicitly *not* ack the log too, or only
>  ack some entries, so the interface sohuld be sane for this use case too.
> 
>  e.g. we could display them in petitboot.
> 
>  > diff --git a/arch/powerpc/platforms/powernv/opal-elog.c
>  > b/arch/powerpc/platforms/powernv/opal-elog.c
>  [ 2 more citation lines. Click/Enter to show. ]
>  > new file mode 100644
>  > index 0000000..fc891ae
>  > --- /dev/null
>  > +++ b/arch/powerpc/platforms/powernv/opal-elog.c
>  > @@ -0,0 +1,309 @@
>  <snip>
>  > +/* Maximum size of a single log on FSP is 16KB */
>  > +#define OPAL_MAX_ERRLOG_SIZE	16384
> 
>  I've seen some conflicting things on this - is it 2kb or 16kb?

We choose 16kb because we want to pull all the log data and not partial.

> 
>  > +
>  > +struct opal_err_log {
>  > +	struct list_head link;
>  > +	uint64_t opal_log_id;
> 
>  why is this uint64_t and not uint32_t? It appears that the log id is 32bits.

Agree, This needs to be changed to uint_32.

> 
>  > +	size_t opal_log_size;
>  > +	uint8_t data[OPAL_MAX_ERRLOG_SIZE];
>  > +};
>  > +
>  > +/* Pre-allocated temp buffer to pull error log from opal. */
>  > +static uint8_t err_log_data[OPAL_MAX_ERRLOG_SIZE];
> 
>  Why do we need temporary space? Why not just store directly into struct
>  opal_err_log?
> 
>  > +/* Protect err_log_data buf */
>  > +static DEFINE_MUTEX(err_log_data_mutex);
>  [ 15 more citation lines. Click/Enter to show. ]
>  > +
>  > +static uint64_t total_log_size;
>  > +static bool opal_log_available;
>  > +static LIST_HEAD(elog_list);
>  > +static LIST_HEAD(elog_ack_list);
>  > +
>  > +/* lock to protect elog_list and elog-ack_list. */
>  > +static DEFINE_SPINLOCK(opal_elog_lock);
>  > +
>  > +static DECLARE_WAIT_QUEUE_HEAD(opal_log_wait);
>  > +
>  > +/*
>  > + * Interface for user to acknowledge the error log.
>  > + *
>  > + * Once user acknowledge the log, we delete that record entry from the
>  > + * list and move it ack list.
>  > + */
>  > +void opal_elog_ack(uint64_t ack_id)
> 
>  s/ack_id/log_id/

Yup. makes sense.

> 
>  > +
>  > +static ssize_t elog_ack_store(struct kobject *kobj,
>  [ 7 more citation lines. Click/Enter to show. ]
>  > +					struct kobj_attribute *attr,
>  > +					const char *buf, size_t count)
>  > +{
>  > +	uint32_t log_ack_id;
>  > +	log_ack_id = *(uint32_t *) buf;
>  > +
>  > +	/* send acknowledgment to FSP */
>  > +	opal_elog_ack(log_ack_id);
>  > +	return 0;
>  > +}
> 
>  This function has a few problems:
> 
>  Consider the following actions:
>  $ echo 1 > /sys/firmware/opal/opal-elog-ack
>  $ echo 'abcde' > /sys/firmware/opal/opal-elog-ack
> 
>  The former will read undefined memory and the latter will make a kernel
>  thread, rsyslogd and systemd-journal all each a CPU each.
> 
>  Basically, the problems are:
>  1) not endian safe
>  2) not following store API of returning nr bytes read
>  3) binary interface. Use sscanf to read numbers instead.
> 
>  > +/*
>  > + * Show error log records to user.
>  [ 9 more citation lines. Click/Enter to show. ]
>  > + */
>  > +static ssize_t opal_elog_show(struct file *filp, struct kobject *kobj,
>  > +				struct bin_attribute *bin_attr, char *buf,
>  > +				loff_t pos, size_t count)
>  > +{
>  > +	unsigned long flags;
>  > +	struct opal_err_log *record, *next;
>  > +	size_t size = 0;
>  > +	size_t data_to_copy = 0;
>  > +	int error = 0;
>  > +
>  > +	/* Display one log@a time. */
> 
>  use words, not @.
> 
>  > +	if (count > OPAL_MAX_ERRLOG_SIZE)
>  > +		count = OPAL_MAX_ERRLOG_SIZE;
>  [ 23 more citation lines. Click/Enter to show. ]
>  > +	spin_lock_irqsave(&opal_elog_lock, flags);
>  > +	/* Align the pos to point within total errlog size. */
>  > +	if (total_log_size && pos > total_log_size)
>  > +		pos = pos % total_log_size;
>  > +
>  > +	/*
>  > +	 * if pos goes beyond total_log_size then we know we don't have any
>  > +	 * new record to show.
>  > +	 */
>  > +	if (total_log_size == 0 || pos >= total_log_size) {
>  > +		opal_log_available = 0;
>  > +		if (filp->f_flags & O_NONBLOCK) {
>  > +			spin_unlock_irqrestore(&opal_elog_lock, flags);
>  > +			error = -EAGAIN;
>  > +			goto out;
>  > +		}
>  > +		spin_unlock_irqrestore(&opal_elog_lock, flags);
>  > +		pos = 0;
>  > +
>  > +		/* Wait until we get log from sapphire */
>  > +		error = wait_event_interruptible(opal_log_wait,
>  > +						 opal_log_available);
>  > +		if (error)
>  > +			goto out;
>  > +		spin_lock_irqsave(&opal_elog_lock, flags);
>  > +	}
> 
>  Why should we wait for there to be a log message? If there's not one
>  then there's not one and that's fine.
> 
>  I also wonder if we really need total_log_size and opal_log_available,
>  this information seems readily available from the list of events.
> 
>  Instead, for notification (as i understand it)  we should be using
>  sysfs_notify() from kernel and then in userspace we can just call
>  select() to wait for something to happen.
> 
>  > +/*
>  > + * Pre-allocate a buffer to hold handful of error logs until user space
>  [ 5 more citation lines. Click/Enter to show. ]
>  > + * consumes it.
>  > + */
>  > +static int init_err_log_buffer(void)
>  > +{
>  > +	int i = 0;
>  > +	struct opal_err_log *buf_ptr;
>  > +
>  > +	buf_ptr = vmalloc(sizeof(struct opal_err_log) * MAX_NUM_RECORD);
> 
>  This means we constantly use 128 * sizeof(struct opal_err_log) which
>  equates to somewhere north of 2MB of memory (due to list overhead).
> 
>  I don't think we need to statically allocate this, we can probably just
>  allocate on-demand as in a typical system you're probably quite
>  unlikely to have too many of these sitting around (besides, if for
>  whatever reason we cannot allocate memory at some point, that's okay
>  because we can read it again later).

The reason we choose to go for static allocation is, we can not afford
to drop or delay a critical error log due to memory allocation failure.
OR we can keep static allocations for critical errors and follow dynamic
allocation for informative error logs.  What do you say?

Thanks,
-Mahesh.

next prev parent reply	other threads:[~2014-02-25  5:02 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-16  9:58 [PATCH] powerpc/powernv: Read opal error log and export it through sysfs interface Mahesh J Salgaonkar
2014-02-21  0:11 ` Stewart Smith
2014-02-25  5:02   ` Mahesh Jagannath Salgaonkar [this message]
2014-02-25 23:19     ` Stewart Smith
2014-02-28  0:58       ` [PATCH] powerpc/powernv: Read OPAL error log and export it through sysfs Stewart Smith
2014-03-04 12:31         ` Vasant Hegde
2014-03-05  1:56           ` Stewart Smith
2014-03-05  3:26             ` Vasant Hegde
2014-03-05  3:26             ` Vasant Hegde
2014-03-05  4:00               ` Benjamin Herrenschmidt
2014-03-05  4:05                 ` Stewart Smith

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=530C23D3.6020203@linux.vnet.ibm.com \
    --to=mahesh@linux.vnet.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=stewart@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.