From: Olaf Weber <olaf@sgi.com>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] Channel Bonding Debug Information
Date: Thu, 1 Oct 2015 16:31:56 +0200
Message-ID: <560D43DC.2080600@sgi.com>
In-Reply-To: <CAAqp6i2jWQheq7aeT+VkQ5+=JsuAuMTJSPewGAfvaC8r8kSf9Q@mail.gmail.com>

On 28-09-15 21:30, Amir Shehata wrote:
> Hello,
>
> As a followup on the discussion in the LAD developer summit, regarding
> ensuring that there is enough debug information provided as part of the
> Channel Bonding solution, I'm sending this email to ask for ideas on what
> type of debug information you would like to see.
>
> thanks
> amir

Hi Amir,

My random and disorganized thoughts.

Significant state changes and anything unexpected should of course be logged.

In addition, I'd like interfaces that allow me to efficiently get the
status/stats of a specific network interface or a specific peer, as opposed
to only being able to get the information for all interfaces or peers and
then having to filter. That may imply an ioctl-type interface instead of, or
in addition to, debugfs or sysfs (or procfs).
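To make the per-interface query concrete, here is a minimal userspace sketch of an ioctl-style request that names one interface by NID and gets back only that interface's stats. All names (`struct ni_stats_req`, `ni_stats_query()`, the stats fields) are hypothetical and are not the LNet ioctl ABI; the lookup is a linear scan only to keep the sketch short.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interface stats; field names are illustrative only. */
struct ni_stats {
	uint64_t tx_msgs;
	uint64_t rx_msgs;
	int32_t  credits;
	int32_t  state;
};

/* Hypothetical ioctl-style request: the caller fills in the NID (input),
 * the handler fills in the stats (output). */
struct ni_stats_req {
	uint64_t        nid;    /* in: which interface */
	struct ni_stats stats;  /* out: its current stats */
};

/* Hypothetical table entry for one local interface. */
struct local_ni {
	uint64_t        nid;
	struct ni_stats stats;
};

/* Find exactly one interface and copy out its stats, so the caller never
 * has to dump the whole table and filter it. A linear scan keeps the
 * sketch short; a real table would more likely be hashed by NID. */
int ni_stats_query(const struct local_ni *tbl, size_t n,
		   struct ni_stats_req *req)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].nid == req->nid) {
			req->stats = tbl[i].stats;
			return 0;
		}
	}
	return -ENOENT;  /* no interface with that NID */
}
```

The same request shape would work for a single peer or peer interface: the filtering happens on the kernel side, so userspace no longer has to copy out and filter every entry.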

For the local interfaces, stats include TX/RX counters, credits, interface
state, and some measure of how busy the interface is. The latter can be
derived by watching the TX/RX counters over time, but it would be nice to
have it calculated. A variant of the "File Heat" idea presented at LAD might
work for this. (Think decaying sum over recent activity.) When interfaces
are associated with CPTs, the stats should also include the CPT number; this
is especially important if the kernel associates an interface with a CPT
automatically.
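One way to turn the TX/RX counters into a "heat" value is an exponentially decaying sum updated on a fixed tick, sketched below. The struct layout, the tick, and the 3/4 decay factor are assumptions for illustration; the File Heat work mentioned at LAD may define heat differently.

```c
#include <stdint.h>

/* Hypothetical local-interface stats; names are illustrative, not LNet's. */
struct local_ni_stats {
	uint64_t tx_msgs;     /* messages sent */
	uint64_t rx_msgs;     /* messages received */
	int32_t  credits;     /* currently available credits */
	int32_t  state;       /* e.g. up/down */
	int32_t  cpt;         /* CPT the interface is bound to, -1 if none */
	uint64_t heat;        /* decaying sum of recent activity */
	uint64_t last_total;  /* tx+rx seen at the previous tick */
};

/* Called once per (assumed) fixed tick: decay the old heat by 3/4, then
 * add the number of messages sent or received since the previous tick. */
void local_ni_heat_tick(struct local_ni_stats *s)
{
	uint64_t total = s->tx_msgs + s->rx_msgs;
	uint64_t delta = total - s->last_total;

	s->heat = s->heat - s->heat / 4 + delta;
	s->last_total = total;
}
```

With 100 messages in one tick and none in the next, heat goes 100, then 75: recent activity dominates, and an idle interface cools off geometrically.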

For the peers, I'd like a way to obtain the list of peers, and then the
interfaces of each peer. Stats per peer interface include TX/RX counters
and credits, perceived health, and maybe "heat". For the peer as a whole,
the stats could include totals across its interfaces, and the peer's health
as perceived by the current node.
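A sketch of the per-peer view described above, assuming hypothetical `struct peer_stats` and `struct peer_ni_stats` types: each peer holds its list of interfaces, and the per-peer totals are summed on demand from those interfaces rather than kept as separate counters. How per-interface health would be combined into peer health is left open here, as it is in the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stats for one peer interface, as seen from this node. */
struct peer_ni_stats {
	uint64_t nid;
	uint64_t tx_msgs;
	uint64_t rx_msgs;
	int32_t  credits;
	int32_t  health;   /* perceived health; the scale is an assumption */
};

/* Hypothetical peer: a primary NID plus the interfaces known for it. */
struct peer_stats {
	uint64_t              primary_nid;
	size_t                n_ni;
	struct peer_ni_stats *ni;
};

/* Per-peer totals across all of the peer's interfaces. */
struct peer_totals {
	uint64_t tx_msgs;
	uint64_t rx_msgs;
	int64_t  credits;
};

/* Compute the totals on demand by walking only this peer's interfaces. */
struct peer_totals peer_sum(const struct peer_stats *p)
{
	struct peer_totals t = { 0, 0, 0 };

	for (size_t i = 0; i < p->n_ni; i++) {
		t.tx_msgs += p->ni[i].tx_msgs;
		t.rx_msgs += p->ni[i].rx_msgs;
		t.credits += p->ni[i].credits;
	}
	return t;
}
```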

A note on calculating heat: the full list of peer interfaces can become large
(on the servers of a large cluster), and you don't want to walk it without
needing to. If you store a timestamp for the last use, then heat can be
calculated when the TX/RX counters are updated or read, which is when the
relevant data structure is being accessed anyway.
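The timestamp approach can be sketched as follows, assuming a monotonic clock in arbitrary units and a half-life of 16 units (both hypothetical, as are all names here). Heat is halved once per elapsed half-life, and only when the entry is touched anyway: in `heat_record()` where the counters are bumped, or in `heat_read()` where stats are reported. No periodic walk of the peer interface list is needed.

```c
#include <stdint.h>

#define HEAT_HALF_LIFE 16  /* assumed half-life, in arbitrary clock units */

/* Hypothetical per-peer-interface heat, decayed lazily. */
struct peer_ni_heat {
	uint64_t heat;     /* decaying activity sum */
	uint64_t last_ts;  /* time up to which heat has been decayed */
};

/* Bring heat up to date for time 'now' (assumed monotonic, >= last_ts)
 * by halving it once per whole elapsed half-life. last_ts advances by
 * whole half-lives only, so the partial period carries over. */
static void heat_decay(struct peer_ni_heat *h, uint64_t now)
{
	uint64_t periods = (now - h->last_ts) / HEAT_HALF_LIFE;

	if (periods == 0)
		return;
	h->heat = periods >= 64 ? 0 : h->heat >> periods;
	h->last_ts += periods * HEAT_HALF_LIFE;
}

/* Called where the TX/RX counters are updated: the entry is already
 * being touched, so decaying it here costs almost nothing. */
void heat_record(struct peer_ni_heat *h, uint64_t now, uint64_t msgs)
{
	heat_decay(h, now);
	h->heat += msgs;
}

/* Called where the stats are read out, e.g. for a per-peer query. */
uint64_t heat_read(struct peer_ni_heat *h, uint64_t now)
{
	heat_decay(h, now);
	return h->heat;
}
```

Advancing `last_ts` by whole half-lives rather than setting it to `now` matters: otherwise an entry updated more often than once per half-life would never decay at all.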

For local interfaces the list is likely small enough that this kind of
approach isn't worth it. Moreover, the list of local interfaces may be
walked regularly anyway, to check on health etc.

Olaf

-- 
Olaf Weber                 SGI               Phone:  +31(0)30-6696796
                            Veldzigt 2b       Fax:    +31(0)30-6696799
Sr Software Engineer       3454 PW de Meern  Vnet:   955-6796
Storage Software           The Netherlands   Email:  olaf at sgi.com
