From: Alexander Aring <aahringo@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH dlm-tool 14/14] python: add posix lockdb plot tool
Date: Thu, 2 Mar 2023 12:14:41 -0500 [thread overview]
Message-ID: <20230302171441.1509914-14-aahringo@redhat.com> (raw)
In-Reply-To: <20230302171441.1509914-1-aahringo@redhat.com>
This patch adds a python tool to plot with use of python plotly [0] the
dlm_controld cluster-wide lockdb. It scans dlm_controld plock log files
for the required information about possible contention (waiters) and the
acquired lock modes over time.
Example:
./lockdb_plot /var/log/dlm_controld/plock.log
and plotly will start a webserver and you can see a generated, interactive
plot about the lockranges in y axis and time in x axis. Depending on
usage it's limited and does not scale. Ther exists arguments like --info
to get plock.log timing information and with --ts_start/ts_end you can
filter a specific time range to only show a small part of the plock log
to reduce the amount of rectangles need to be drawn by the browser.
It shows only lock modes and contention, no lock requests. Means A trylock
is only visible when it was successful. A non trylock will be drawn and
has possible contention.
With this plot it should be easy to figure out on which node a specific
lock range was acquired and resulted in contention.
[0] https://plotly.com/python/
---
python/tools/lockdb_plot | 429 +++++++++++++++++++++++++++++++++++++++
1 file changed, 429 insertions(+)
create mode 100755 python/tools/lockdb_plot
diff --git a/python/tools/lockdb_plot b/python/tools/lockdb_plot
new file mode 100755
index 00000000..761bdea2
--- /dev/null
+++ b/python/tools/lockdb_plot
@@ -0,0 +1,429 @@
+#!/usr/bin/env python3
+
+import plotly.graph_objects as go
+from enum import Enum
+import argparse
+import sys
+
+NSECS_PER_SECS = 1000000000
+
+def nsecs_to_secs(nsecs):
+ return nsecs/NSECS_PER_SECS
+
+def secs_to_nsecs(secs):
+ return secs * NSECS_PER_SECS
+
+# entry point of collecting data
+nodes = dict()
+# will parse log to get our_nodeid
+our_nodeid = None
+# legend groups, tmp set to show legend entry only once
+legend_groups = set()
+
+class EType(Enum):
+ LOCK = 1
+ WAITER = 2
+
+def lookup_lockdb(nodeid, ls, number):
+ if nodeid not in nodes:
+ nodes[nodeid] = dict()
+
+ lockspaces = nodes[nodeid]
+ if ls not in lockspaces:
+ lockspaces[ls] = dict()
+
+ if number not in lockspaces[ls]:
+ lockspaces[ls][number] = LockDb(nodeid, ls, number)
+
+ return lockspaces[ls][number]
+
+def plotly_rect_shape(x0, y0, x1, y1, color, opacity, text, group):
+ # we show it in secs
+ x0 = nsecs_to_secs(x0)
+ x1 = nsecs_to_secs(x1)
+
+ rv = go.Scatter(x=[x0,x0,x1,x1,x0],
+ y=[y0,y1,y1,y0,y0],
+ fill="toself",
+ fillcolor=color,
+ mode='lines',
+ name=group,
+ legendgroup=group,
+ showlegend=group not in legend_groups,
+ visible=True,
+ text=text,
+ line=dict(color=color,
+ width=1),
+ opacity=opacity)
+
+ legend_groups.add(group)
+ return rv
+
+
+class LockDbRect():
+ def __init__(self, **kwargs):
+ self.ls = kwargs["ls"]
+ self.number = kwargs["number"]
+ self.pid = kwargs["pid"]
+ self.nodeid = kwargs["nodeid"]
+ self.owner = kwargs["owner"]
+ self.ts_start = kwargs["ts_start"]
+ self.ts_end = kwargs["ts_end"]
+ # +1 because lock range 1000-1000 will then be 999-1000
+ # exclusive vs inclusive we need to decide how to display
+ # it but we don't set a zero height
+ self.lk_start = kwargs["lk_start"] - 1
+ self.lk_end = kwargs["lk_end"]
+
+ def create_shape(self):
+ return plotly_rect_shape(self.ts_start,
+ self.lk_start,
+ self.ts_end,
+ self.lk_end,
+ self.get_color(),
+ self.get_opacity(),
+ self.get_text(),
+ self.get_group())
+
+ def get_duration(self):
+ return self.ts_end - self.ts_start
+
+ def get_range(self):
+ return self.lk_end - self.lk_start
+
+ def __str__(self):
+ return str(self.__dict__)
+
+ def __repr__(self):
+ return str(self)
+
+class LockDbRectWaiter(LockDbRect):
+ def __init__(self, **kwargs):
+ super(LockDbRectWaiter, self).__init__(**kwargs)
+
+ def get_color(self):
+ return "green"
+
+ def get_opacity(self):
+ return 0.2
+
+ def get_text(self):
+ # TODO make it human readable for high numbers?
+ ts_start = nsecs_to_secs(self.ts_start);
+ ts_end = nsecs_to_secs(self.ts_end);
+ dt = nsecs_to_secs(self.get_duration());
+
+ text = "nodeid: {}, pid: {}, owner: {}<br>".format(self.nodeid, self.pid, hex(self.owner)) + \
+ "ts_start: {0:.3f},".format(ts_start) + \
+ "ts_end: {0:.3f},".format(ts_end) + \
+ "dt: {0:.3f}<br>".format(dt) + \
+ "lk_start: {}, lk_end: {}, dy: {}<br>".format(hex(self.lk_start), hex(self.lk_end), hex(self.get_range()))
+
+ return text
+
+ def get_text_pl(self):
+ text = "contention:<br>" + \
+ "start: {0:.3f},".format(nsecs_to_secs(self.ts_start)) + \
+ "end: {0:.3f},".format(nsecs_to_secs(self.ts_end)) + \
+ "dt: {0:.3f}".format(nsecs_to_secs(self.get_duration()))
+
+ return text
+
+ def get_group(self):
+ return "contention"
+
+class LockDbRectPLock(LockDbRect):
+ def __init__(self, **kwargs):
+ super(LockDbRectPLock, self).__init__(**kwargs)
+ self.mode = kwargs["mode"]
+ if self.mode == "WR":
+ self.color = "red"
+ self.group = "write lock"
+ elif self.mode == "RD":
+ self.color = "blue"
+ self.group = "read lock"
+ else:
+ print("{}: received unknown mode".format(self.ts_start))
+ return
+
+ self.waiter = None
+
+ def __str__(self):
+ return str(self.__dict__)
+
+ def __repr__(self):
+ return str(self)
+
+ def has_waiter(self):
+ return self.waiter
+
+ def set_waiter(self, wa):
+ if self.has_waiter():
+ print("{}: has already a waiter set".format(self.ts_start))
+ return
+
+ self.waiter = wa
+
+ def get_waiter(self):
+ return self.waiter
+
+ def get_color(self):
+ return self.color
+
+ def get_opacity(self):
+ return 0.4
+
+ def get_text(self):
+ # TODO make it human readable for high numbers?
+ ts_start = nsecs_to_secs(self.ts_start);
+ ts_end = nsecs_to_secs(self.ts_end);
+ dt = nsecs_to_secs(self.get_duration());
+
+ text = "nodeid: {}, pid: {}, owner: {}<br>".format(self.nodeid, self.pid, hex(self.owner)) + \
+ "ts_start: {0:.3f},".format(ts_start) + \
+ "ts_end: {0:.3f},".format(ts_end) + \
+ "dt: {0:.3f}<br>".format(dt) + \
+ "lk_start: {}, lk_end: {}, dy: {}<br>".format(hex(self.lk_start), hex(self.lk_end), hex(self.get_range()))
+
+ if self.has_waiter():
+ text = text + "<br>{}".format(self.get_waiter().get_text_pl())
+
+ return text;
+
+ def get_group(self):
+ return self.group
+
+class LockDb:
+ def __init__(self, nodeid, ls, number):
+ self.nodeid = nodeid
+ self.ls = ls
+ self.number = number
+ self.__ts = dict()
+ self.plocks = list()
+ self.__last_plock = dict()
+ self.waiters = list()
+ self.__last_waiter = dict()
+ self.__ts_start_min = None
+ self.__ts_end_max = 0
+
+ def event_start(self, vals):
+ self.__ts[vals["ptr"]] = vals
+
+ def __plock(self, vals_start, vals_end):
+ pl = LockDbRectPLock(ls=vals_start["ls"],
+ number=vals_start["number"],
+ pid=vals_start["pid"],
+ nodeid=vals_start["nodeid"],
+ owner=vals_start["owner"],
+ ts_start=vals_start["ts"],
+ ts_end=vals_end["ts"],
+ lk_start=vals_start["start"],
+ lk_end=vals_start["end"],
+ mode=vals_start["mode"])
+ self.plocks.append(pl)
+ self.__last_plock[vals_end["ptr"]] = pl
+
+ try:
+ pl.set_waiter(self.__last_waiter[vals_start["owner"]])
+ del(self.__last_waiter[vals_start["owner"]])
+ except KeyError:
+ pass
+
+ def __waiter(self, vals_start, vals_end):
+ wa = LockDbRectWaiter(ls=vals_start["ls"],
+ number=vals_start["number"],
+ pid=vals_start["pid"],
+ nodeid=vals_start["nodeid"],
+ owner=vals_start["owner"],
+ ts_start=vals_start["ts"],
+ ts_end=vals_end["ts"],
+ lk_start=vals_start["start"],
+ lk_end=vals_start["end"])
+ self.waiters.append(wa)
+ self.__last_waiter[vals_start["owner"]] = wa
+
+ def event_end(self, vals):
+ try:
+ vals_start = self.__ts[vals["ptr"]]
+ del(self.__ts[vals["ptr"]])
+ except KeyError:
+ print("{}: cannot find previous start, ignoring".format(vals["ts"]))
+ return
+
+ if not self.__ts_start_min:
+ self.__ts_start_min = vals_start["ts"]
+ else:
+ self.__ts_start_min = min(self.__ts_start_min, vals_start["ts"])
+
+ if not self.__ts_start_min:
+ self.__ts_end_max = vals["ts"]
+ else:
+ self.__ts_end_max = max(self.__ts_end_max, vals["ts"])
+
+ if vals["event"] == EType.LOCK:
+ self.__plock(vals_start, vals)
+ elif vals["event"] == EType.WAITER:
+ self.__waiter(vals_start, vals)
+ else:
+ print("{}: unknown event type".format(vals["ts"]))
+
+ def add_shapes(self, fig):
+ global legend_groups
+ legend_groups = set()
+
+ # first waiters to be in background
+ for wa in self.waiters:
+ fig.add_trace(wa.create_shape())
+
+ # next layer to draw lock states
+ for pl in self.plocks:
+ fig.add_trace(pl.create_shape())
+
+ def get_ts_min(self):
+ return self.__ts_start_min
+
+ def get_ts_max(self):
+ return self.__ts_end_max
+
+ def __str__(self):
+ return "ls {} inode {}".format(self.ls, self.number)
+
+ def __repr__(self):
+ return str(self)
+
+parser = argparse.ArgumentParser(description="Plots dlm_controld lock databases")
+parser.add_argument('-i', "--info", action="store_true")
+parser.add_argument('-s', "--ts_start", type=float)
+parser.add_argument('-e', "--ts_end", type=float)
+
+parser.add_argument('plock_logs', type=argparse.FileType('r'), nargs='+')
+args = parser.parse_args()
+
+for f in args.plock_logs:
+ lines = f.readlines()
+
+ # Current way to figure out our nodeid
+ for line in lines:
+ if "read plock" in line:
+ lsplit = line.split()
+ npo = lsplit[11].split('/')
+ nodeid = int(npo[0])
+ our_nodeid = nodeid
+ break
+
+ if not our_nodeid:
+ print("{}: could not determine our nodeid".format(f))
+ continue
+
+ for line in lines:
+ if "state plock" in line:
+ pass
+ elif "state waiter" in line:
+ pass
+ else:
+ continue
+
+ vals = dict()
+ lsplit = line.split()
+ vals["ts"] = int(lsplit[3])
+
+ if args.ts_start and secs_to_nsecs(args.ts_start) > int(vals["ts"]):
+ continue
+
+ if args.ts_end and secs_to_nsecs(args.ts_end) < int(vals["ts"]):
+ break;
+
+ vals["ls"] = lsplit[4]
+ vals["op"] = lsplit[7]
+ vals["number"] = int(lsplit[8], 16)
+ vals["ptr"] = lsplit[9]
+
+ db = lookup_lockdb(our_nodeid, vals["ls"], vals["number"])
+ if "state plock" in line:
+ vals["event"] = EType.LOCK
+ if vals["op"] == "start":
+ vals["mode"] = lsplit[10]
+ se = lsplit[11].split('-')
+ vals["start"] = int(se[0], 16)
+ vals["end"] = int(se[1], 16)
+ npo = lsplit[12].split('/')
+ vals["nodeid"] = int(npo[0])
+ vals["pid"] = int(npo[1])
+ vals["owner"] = int(npo[2], 16)
+ db.event_start(vals)
+ elif vals["op"] =="end":
+ db.event_end(vals)
+ else:
+ print("{}: unknown state plock operation: {}".format(vals["ts"], vals["op"]))
+ elif "state waiter" in line:
+ vals["event"] = EType.WAITER
+ if vals["op"] == "start":
+ se = lsplit[10].split('-')
+ vals["start"] = int(se[0], 16)
+ vals["end"] = int(se[1], 16)
+ npo = lsplit[11].split('/')
+ vals["nodeid"] = int(npo[0])
+ vals["pid"] = int(npo[1])
+ vals["owner"] = int(npo[2], 16)
+ db.event_start(vals)
+ elif vals["op"] == "end":
+ db.event_end(vals)
+ else:
+ print("{}: unknown state waiter operation: {}".format(vals["ts"], vals["op"]))
+ else:
+ print("received unknown line event: {}".format(line))
+
+if args.info:
+ ts_max = None
+ ts_min = None
+
+ for nodeid in nodes:
+ for ls in nodes[nodeid]:
+ for number in nodes[nodeid][ls]:
+ db = lookup_lockdb(nodeid, ls, number)
+ if not ts_min:
+ ts_min = db.get_ts_min()
+ else:
+ ts_min = min(ts_min, db.get_ts_min())
+
+ if not ts_max:
+ ts_max = db.get_ts_max()
+ else:
+ ts_max = min(ts_max, db.get_ts_max())
+
+ print("global info:")
+ print("\tts_start: {0:.3f} secs".format(nsecs_to_secs(ts_min)))
+ print("\tts_end: {0:.3f} secs".format(nsecs_to_secs(ts_max)))
+ print("\tduration: {0:.3f} secs".format(nsecs_to_secs(ts_max) - nsecs_to_secs(ts_min)))
+
+ print()
+ print("per inode lock info:")
+ for nodeid in nodes:
+ print("nodeid: {}".format(nodeid))
+ for ls in nodes[nodeid]:
+ print("\tls: {}".format(ls))
+ for number in nodes[nodeid][ls]:
+ print("\t\tinode: {}".format(hex(number)))
+ db = lookup_lockdb(nodeid, ls, number)
+ print("\t\t\tts_start: {0:.3f} secs".format(nsecs_to_secs(db.get_ts_min())))
+ print("\t\t\tts_end: {0:.3f} secs".format(nsecs_to_secs(db.get_ts_max())))
+ print("\t\t\tduration: {0:.3f} secs".format(nsecs_to_secs(db.get_ts_max()) - nsecs_to_secs(db.get_ts_min())))
+
+ exit(0)
+
+for nodeid in nodes:
+ for ls in nodes[nodeid]:
+ for number in nodes[nodeid][ls]:
+ print("create plot: {} {} {}".format(nodeid, ls, number))
+
+ fig = go.Figure()
+ fig.update_layout(title_text="dlm_controld cluster-wide lock database<br>nodeid: {}, lockspace: {}, inode: {}".format(nodeid, ls, hex(number)))
+
+ # fill plot with shapes
+ lookup_lockdb(nodeid, ls, number).add_shapes(fig)
+
+ fig.update_xaxes(title_text='Time (secs)')
+ fig.update_yaxes(title_text='Lock Range (bytes decimal)')
+
+ config = dict({'scrollZoom': True, 'displaylogo': False})
+ fig.show(config=config)
--
2.31.1
next prev parent reply other threads:[~2023-03-02 17:14 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-02 17:14 [Cluster-devel] [PATCH dlm-tool 01/14] dlm_tool: add fail functionality if dump failed Alexander Aring
2023-03-02 17:14 ` [Cluster-devel] [PATCH dlm-tool 02/14] dlm_controld: always create logdir Alexander Aring
2023-03-02 17:14 ` [Cluster-devel] [PATCH dlm-tool 03/14] dlm_controld: add plock logfile Alexander Aring
2023-03-02 17:14 ` [Cluster-devel] [PATCH dlm-tool 04/14] dlm_controld: move processing of saved messages to plock level Alexander Aring
2023-03-02 17:14 ` [Cluster-devel] [PATCH dlm-tool 05/14] dlm_controld: remove ls parameter Alexander Aring
2023-03-02 17:14 ` [Cluster-devel] [PATCH dlm-tool 06/14] dlm_controld: constify timeval of dt_usec() Alexander Aring
2023-03-02 17:14 ` [Cluster-devel] [PATCH dlm-tool 07/14] dlm_controld: add gcc format printf attribute to log_level Alexander Aring
2023-03-02 17:14 ` [Cluster-devel] [PATCH dlm-tool 08/14] dlm_controld: enable nanosec logging Alexander Aring
2023-03-02 17:14 ` [Cluster-devel] [PATCH dlm-tool 09/14] dlm_controld: use write_result() Alexander Aring
2023-03-02 17:14 ` [Cluster-devel] [PATCH dlm-tool 10/14] dlm_controld: be sure we stop lockspaces before shutdown Alexander Aring
2023-03-02 17:14 ` [Cluster-devel] [PATCH dlm-tool 11/14] dlm_controld: constify name_in in log_level() Alexander Aring
2023-03-02 17:14 ` [Cluster-devel] [PATCH dlm-tool 12/14] dlm_controld: plock log waiters state Alexander Aring
2023-03-02 17:14 ` [Cluster-devel] [PATCH dlm-tool 13/14] dlm_controld: plock log lock state Alexander Aring
2023-03-03 13:38 ` Andreas Gruenbacher
2023-03-03 14:35 ` Alexander Aring
2023-03-03 15:52 ` Andreas Gruenbacher
2023-03-03 16:02 ` Andreas Gruenbacher
2023-03-03 22:20 ` Alexander Aring
2023-03-03 22:28 ` Alexander Aring
2023-03-03 22:43 ` Alexander Aring
2023-03-03 22:31 ` Alexander Aring
2023-03-02 17:14 ` Alexander Aring [this message]
2023-03-03 14:40 ` [Cluster-devel] [PATCH dlm-tool 14/14] python: add posix lockdb plot tool Alexander Aring
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230302171441.1509914-14-aahringo@redhat.com \
--to=aahringo@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).