* [PATCH 0/5] Add trace events for filesystem freeze/thaw events
@ 2016-03-02 19:01 Frank Sorenson
2016-03-02 19:01 ` [PATCH 1/5] fs: simplify freeze_super()/thaw_super() exit handling Frank Sorenson
` (6 more replies)
0 siblings, 7 replies; 14+ messages in thread
From: Frank Sorenson @ 2016-03-02 19:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: dwysocha, lvaz
Currently, the only visibility into filesystem freeze or
thaw activity is when an error is returned from the actual
call to freeze or thaw. If the action itself hangs, there
is no indication that the freeze or thaw was in-progress,
short of collecting a vmcore.
There is also no record of what process froze a filesystem
or when it happened, so if the process does not thaw it
later, debugging the issue is difficult.
These patches add tracepoints to the generic filesystem
freeze and thaw functions. When enabled, trace events
will create a record of these activities.
Sample trace events:
(freeze /dev/vdd)
fsfreeze-23072 [003] .... 1018126.904878: freeze_super_enter: comm=fsfreeze pid=23072 for xfs filesystem 'vdd' (253:48) frozen=0
fsfreeze-23072 [003] .... 1018126.926173: freeze_super_exit: comm=fsfreeze pid=23072 for xfs filesystem 'vdd' (253:48) frozen=4 ret=0
(thaw /dev/vdd)
<...>-23074 [000] .... 1018134.025833: thaw_super_enter: comm=fsfreeze pid=23074 for xfs filesystem 'vdd' (253:48) frozen=4
<...>-23074 [000] .... 1018134.025873: thaw_super_exit: comm=fsfreeze pid=23074 for xfs filesystem 'vdd' (253:48) frozen=0 ret=0
(freeze /dev/vdb)
fsfreeze-23077 [003] .... 1018162.211518: freeze_super_enter: comm=fsfreeze pid=23077 for ext4 filesystem 'vdb' (253:16) frozen=0
fsfreeze-23077 [003] .... 1018162.228586: freeze_super_exit: comm=fsfreeze pid=23077 for ext4 filesystem 'vdb' (253:16) frozen=4 ret=0
(attempt to freeze already-frozen /dev/vdb)
fsfreeze-23078 [003] .... 1018169.864362: freeze_super_enter: comm=fsfreeze pid=23078 for ext4 filesystem 'vdb' (253:16) frozen=4
fsfreeze-23078 [003] .... 1018169.864385: freeze_super_exit: comm=fsfreeze pid=23078 for ext4 filesystem 'vdb' (253:16) frozen=4 ret=-16
(unfreeze /dev/vdb)
fsfreeze-23079 [003] .... 1018173.074331: thaw_super_enter: comm=fsfreeze pid=23079 for ext4 filesystem 'vdb' (253:16) frozen=4
fsfreeze-23079 [003] .... 1018173.075928: thaw_super_exit: comm=fsfreeze pid=23079 for ext4 filesystem 'vdb' (253:16) frozen=0 ret=0
(attempt to unfreeze already unfrozen /dev/vdb)
fsfreeze-23080 [000] .... 1018175.105613: thaw_super_enter: comm=fsfreeze pid=23080 for ext4 filesystem 'vdb' (253:16) frozen=0
fsfreeze-23080 [000] .... 1018175.105639: thaw_super_exit: comm=fsfreeze pid=23080 for ext4 filesystem 'vdb' (253:16) frozen=0 ret=22
Frank Sorenson (5):
fs: simplify freeze_super()/thaw_super() exit handling
fs/block_dev.c: simplify freeze_bdev() and thaw_bdev() exit handling
fs: add trace events for freeze_super() and thaw_super()
fs/block_dev.c: add trace events for freeze_bdev() and
thaw_bdev()
fs: enable filesystem freeze/thaw events
fs/block_dev.c | 30 +++++---
fs/super.c | 54 ++++++++------
include/trace/events/fs.h | 109 +++++++++++++++++++++++++++++
include/trace/events/fs_bdev.h | 155 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 316 insertions(+), 32 deletions(-)
create mode 100644 include/trace/events/fs.h
create mode 100644 include/trace/events/fs_bdev.h
--
1.8.3.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/5] fs: simplify freeze_super()/thaw_super() exit handling
2016-03-02 19:01 [PATCH 0/5] Add trace events for filesystem freeze/thaw events Frank Sorenson
@ 2016-03-02 19:01 ` Frank Sorenson
2016-03-02 20:06 ` Al Viro
2016-03-02 19:01 ` [PATCH 2/5] fs/block_dev.c: simplify freeze_bdev() and thaw_bdev() " Frank Sorenson
` (5 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Frank Sorenson @ 2016-03-02 19:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: dwysocha, lvaz
Change freeze_super() and thaw_super() to have a common exit
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
---
fs/super.c | 44 ++++++++++++++++++++++----------------------
1 file changed, 22 insertions(+), 22 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index 1182af8..5e9a974 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1272,26 +1272,21 @@ static void sb_freeze_unlock(struct super_block *sb)
*/
int freeze_super(struct super_block *sb)
{
- int ret;
+ int ret = 0;
atomic_inc(&sb->s_active);
down_write(&sb->s_umount);
if (sb->s_writers.frozen != SB_UNFROZEN) {
deactivate_locked_super(sb);
- return -EBUSY;
+ ret = -EBUSY;
+ goto out;
}
- if (!(sb->s_flags & MS_BORN)) {
- up_write(&sb->s_umount);
- return 0; /* sic - it's "nothing to do" */
- }
+ if (!(sb->s_flags & MS_BORN))
+ goto out_unlock; /* nothing to do */
- if (sb->s_flags & MS_RDONLY) {
- /* Nothing to do really... */
- sb->s_writers.frozen = SB_FREEZE_COMPLETE;
- up_write(&sb->s_umount);
- return 0;
- }
+ if (sb->s_flags & MS_RDONLY)
+ goto out_complete; /* nothing to do */
sb->s_writers.frozen = SB_FREEZE_WRITE;
/* Release s_umount to preserve sb_start_write -> s_umount ordering */
@@ -1319,16 +1314,19 @@ int freeze_super(struct super_block *sb)
sb_freeze_unlock(sb);
wake_up(&sb->s_writers.wait_unfrozen);
deactivate_locked_super(sb);
- return ret;
+ goto out;
}
}
/*
* This is just for debugging purposes so that fs can warn if it
* sees write activity when frozen is set to SB_FREEZE_COMPLETE.
*/
+out_complete:
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+out_unlock:
up_write(&sb->s_umount);
- return 0;
+out:
+ return ret;
}
EXPORT_SYMBOL(freeze_super);
@@ -1340,34 +1338,36 @@ EXPORT_SYMBOL(freeze_super);
*/
int thaw_super(struct super_block *sb)
{
- int error;
+ int ret = 0;
down_write(&sb->s_umount);
if (sb->s_writers.frozen == SB_UNFROZEN) {
up_write(&sb->s_umount);
- return -EINVAL;
+ ret = -EINVAL;
+ goto out;
}
if (sb->s_flags & MS_RDONLY) {
sb->s_writers.frozen = SB_UNFROZEN;
- goto out;
+ goto out_wake;
}
if (sb->s_op->unfreeze_fs) {
- error = sb->s_op->unfreeze_fs(sb);
- if (error) {
+ ret = sb->s_op->unfreeze_fs(sb);
+ if (ret) {
printk(KERN_ERR
"VFS:Filesystem thaw failed\n");
up_write(&sb->s_umount);
- return error;
+ goto out;
}
}
sb->s_writers.frozen = SB_UNFROZEN;
sb_freeze_unlock(sb);
-out:
+out_wake:
wake_up(&sb->s_writers.wait_unfrozen);
deactivate_locked_super(sb);
- return 0;
+out:
+ return ret;
}
EXPORT_SYMBOL(thaw_super);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/5] fs/block_dev.c: simplify freeze_bdev() and thaw_bdev() exit handling
2016-03-02 19:01 [PATCH 0/5] Add trace events for filesystem freeze/thaw events Frank Sorenson
2016-03-02 19:01 ` [PATCH 1/5] fs: simplify freeze_super()/thaw_super() exit handling Frank Sorenson
@ 2016-03-02 19:01 ` Frank Sorenson
2016-03-02 19:01 ` [PATCH 3/5] fs: add trace events for freeze_super() and thaw_super() Frank Sorenson
` (4 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Frank Sorenson @ 2016-03-02 19:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: dwysocha, lvaz
Change freeze_bdev() and thaw_bdev() to have a common exit
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
---
fs/block_dev.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 826b164..1207f40 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -237,8 +237,7 @@ struct super_block *freeze_bdev(struct block_device *bdev)
*/
sb = get_super(bdev);
drop_super(sb);
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return sb;
+ goto out_unlock;
}
sb = get_active_super(bdev);
@@ -251,12 +250,13 @@ struct super_block *freeze_bdev(struct block_device *bdev)
if (error) {
deactivate_super(sb);
bdev->bd_fsfreeze_count--;
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return ERR_PTR(error);
+ sb = ERR_PTR(error);
+ goto out_unlock;
}
deactivate_super(sb);
- out:
+out:
sync_blockdev(bdev);
+out_unlock:
mutex_unlock(&bdev->bd_fsfreeze_mutex);
return sb; /* thaw_bdev releases s->s_umount */
}
@@ -288,14 +288,11 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
error = sb->s_op->thaw_super(sb);
else
error = thaw_super(sb);
- if (error) {
+ if (error)
bdev->bd_fsfreeze_count++;
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return error;
- }
out:
mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return 0;
+ return error;
}
EXPORT_SYMBOL(thaw_bdev);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/5] fs: add trace events for freeze_super() and thaw_super()
2016-03-02 19:01 [PATCH 0/5] Add trace events for filesystem freeze/thaw events Frank Sorenson
2016-03-02 19:01 ` [PATCH 1/5] fs: simplify freeze_super()/thaw_super() exit handling Frank Sorenson
2016-03-02 19:01 ` [PATCH 2/5] fs/block_dev.c: simplify freeze_bdev() and thaw_bdev() " Frank Sorenson
@ 2016-03-02 19:01 ` Frank Sorenson
2016-03-02 20:12 ` Al Viro
2016-03-02 19:01 ` [PATCH 4/5] fs/block_dev.c: add trace events for freeze_bdev() and thaw_bdev() Frank Sorenson
` (3 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Frank Sorenson @ 2016-03-02 19:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: dwysocha, lvaz
There is currently no visibility into when or how
a filesystem became frozen, and no record of freeze/
thaw activity. Add tracepoints for these events.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
---
include/trace/events/fs.h | 109 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 109 insertions(+)
create mode 100644 include/trace/events/fs.h
diff --git a/include/trace/events/fs.h b/include/trace/events/fs.h
new file mode 100644
index 0000000..d7d933e
--- /dev/null
+++ b/include/trace/events/fs.h
@@ -0,0 +1,109 @@
+/*
+ * Copyright (c) 2016 Red Hat, Inc., All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fs
+#define TRACE_INCLUDE_FILE fs
+
+#if !defined(_TRACE_FS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FS_H_
+#include <linux/tracepoint.h>
+
+struct block_device;
+struct super_block;
+#include <linux/fs.h>
+
+#define SUPER_ID_MAX_LEN FIELD_SIZEOF(struct super_block, s_id)
+#define FSTYPE_MAX_LEN 32 /* choose an arbitrary limit */
+
+DECLARE_EVENT_CLASS(super_freezethaw_class,
+ TP_PROTO(struct super_block *sb),
+ TP_ARGS(sb),
+ TP_STRUCT__entry(
+ __array(char, comm, TASK_COMM_LEN )
+ __array(char, super_id, SUPER_ID_MAX_LEN )
+ __array(char, fstype, FSTYPE_MAX_LEN )
+ __field(dev_t, dev )
+ __field(int, frozen )
+ __field(pid_t, pid )
+ ),
+ TP_fast_assign(
+ memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+ memcpy(__entry->super_id, sb->s_id, SUPER_ID_MAX_LEN);
+ memcpy(__entry->fstype, sb->s_type->name, FSTYPE_MAX_LEN);
+ __entry->dev = sb->s_dev;
+ __entry->frozen = sb->s_writers.frozen;
+ __entry->pid = current->pid;
+ ),
+ TP_printk("comm=%s pid=%d for %s filesystem '%s' (%d:%d) frozen=%d",
+ __entry->comm, __entry->pid, __entry->fstype,
+ __entry->super_id,
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->frozen
+ )
+);
+
+DECLARE_EVENT_CLASS(super_freezethaw_exit_class,
+ TP_PROTO(struct super_block *sb, int ret),
+ TP_ARGS(sb, ret),
+ TP_STRUCT__entry(
+ __array(char, comm, TASK_COMM_LEN )
+ __array(char, super_id, SUPER_ID_MAX_LEN )
+ __array(char, fstype, FSTYPE_MAX_LEN )
+ __field(dev_t, dev )
+ __field(int, frozen )
+ __field(pid_t, pid )
+ __field(int, ret )
+ ),
+ TP_fast_assign(
+ memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+ memcpy(__entry->super_id, sb->s_id, SUPER_ID_MAX_LEN);
+ memcpy(__entry->fstype, sb->s_type->name, FSTYPE_MAX_LEN);
+ __entry->dev = sb->s_dev;
+ __entry->frozen = sb->s_writers.frozen;
+ __entry->pid = current->pid;
+ __entry->ret = ret;
+ ),
+ TP_printk("comm=%s pid=%d for %s filesystem '%s' (%d:%d) frozen=%d ret=%d",
+ __entry->comm, __entry->pid, __entry->fstype,
+ __entry->super_id,
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->frozen, __entry->ret
+ )
+);
+
+DEFINE_EVENT(super_freezethaw_class, freeze_super_enter,
+ TP_PROTO(struct super_block *sb),
+ TP_ARGS(sb)
+);
+DEFINE_EVENT(super_freezethaw_exit_class, freeze_super_exit,
+ TP_PROTO(struct super_block *sb, int ret),
+ TP_ARGS(sb, ret)
+);
+
+DEFINE_EVENT(super_freezethaw_class, thaw_super_enter,
+ TP_PROTO(struct super_block *sb),
+ TP_ARGS(sb)
+);
+DEFINE_EVENT(super_freezethaw_exit_class, thaw_super_exit,
+ TP_PROTO(struct super_block *sb, int ret),
+ TP_ARGS(sb, ret)
+);
+
+#endif /* _TRACE_FS_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
1.8.3.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 4/5] fs/block_dev.c: add trace events for freeze_bdev() and thaw_bdev()
2016-03-02 19:01 [PATCH 0/5] Add trace events for filesystem freeze/thaw events Frank Sorenson
` (2 preceding siblings ...)
2016-03-02 19:01 ` [PATCH 3/5] fs: add trace events for freeze_super() and thaw_super() Frank Sorenson
@ 2016-03-02 19:01 ` Frank Sorenson
2016-03-02 19:01 ` [PATCH 5/5] fs: enable filesystem freeze/thaw events Frank Sorenson
` (2 subsequent siblings)
6 siblings, 0 replies; 14+ messages in thread
From: Frank Sorenson @ 2016-03-02 19:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: dwysocha, lvaz
There is currently no visibility into when or how the
filesystem on a block device became frozen, and no record of
freeze/thaw activity. Add tracepoints for these events.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
---
include/trace/events/fs_bdev.h | 155 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 155 insertions(+)
create mode 100644 include/trace/events/fs_bdev.h
diff --git a/include/trace/events/fs_bdev.h b/include/trace/events/fs_bdev.h
new file mode 100644
index 0000000..8ba01c8
--- /dev/null
+++ b/include/trace/events/fs_bdev.h
@@ -0,0 +1,155 @@
+/*
+ * Copyright (c) 2016 Red Hat, Inc., All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fs
+#define TRACE_INCLUDE_FILE fs_bdev
+
+#if !defined(_TRACE_FS_BDEV_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FS_BDEV_H_
+#include <linux/tracepoint.h>
+
+struct block_device;
+struct super_block;
+#include <linux/fs.h>
+
+#define SUPER_ID_MAX_LEN FIELD_SIZEOF(struct super_block, s_id)
+#define FSTYPE_MAX_LEN 32 /* choose an arbitrary limit */
+
+TRACE_EVENT(freeze_bdev_enter,
+ TP_PROTO(struct block_device *bdev),
+ TP_ARGS(bdev),
+ TP_STRUCT__entry(
+ __array(char, comm, TASK_COMM_LEN )
+ __array(char, bdevname, BDEVNAME_SIZE )
+ __array(char, super_id, SUPER_ID_MAX_LEN )
+ __array(char, fstype, FSTYPE_MAX_LEN )
+ __field(dev_t, dev )
+ __field(int, frozen )
+ __field(pid_t, pid )
+ ),
+ TP_fast_assign(
+ memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+ bdevname(bdev, __entry->bdevname);
+ memcpy(__entry->super_id, get_super(bdev)->s_id, SUPER_ID_MAX_LEN);
+ memcpy(__entry->fstype, get_super(bdev)->s_type->name, FSTYPE_MAX_LEN);
+ __entry->dev = get_super(bdev)->s_dev;
+ __entry->frozen = get_super(bdev)->s_writers.frozen;
+ __entry->pid = current->pid;
+ ),
+ TP_printk("comm=%s pid=%d for %s filesystem '%s' (%d:%d) bdev=%s frozen=%d",
+ __entry->comm, __entry->pid,
+ __entry->fstype, __entry->super_id,
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->bdevname, __entry->frozen
+ )
+);
+
+TRACE_EVENT(freeze_bdev_exit,
+ TP_PROTO(struct block_device *bdev, void *ret),
+ TP_ARGS(bdev, ret),
+ TP_STRUCT__entry(
+ __array(char, comm, TASK_COMM_LEN )
+ __array(char, bdevname, BDEVNAME_SIZE )
+ __array(char, super_id, SUPER_ID_MAX_LEN )
+ __array(char, fstype, FSTYPE_MAX_LEN )
+ __field(dev_t, dev )
+ __field(int, frozen )
+ __field(pid_t, pid )
+ __field(int, ret )
+ ),
+ TP_fast_assign(
+ memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+ bdevname(bdev, __entry->bdevname);
+ memcpy(__entry->super_id, get_super(bdev)->s_id, SUPER_ID_MAX_LEN);
+ memcpy(__entry->fstype, get_super(bdev)->s_type->name, FSTYPE_MAX_LEN);
+ __entry->dev = get_super(bdev)->s_dev;
+ __entry->frozen = get_super(bdev)->s_writers.frozen;
+ __entry->pid = current->pid;
+ __entry->ret = IS_ERR(ret) ? PTR_ERR(ret) : 0;
+ ),
+ TP_printk("comm=%s pid=%d for %s filesystem '%s' (%d:%d) bdev=%s frozen=%d ret=%d",
+ __entry->comm, __entry->pid,
+ __entry->fstype, __entry->super_id,
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->bdevname, __entry->frozen, __entry->ret
+ )
+);
+
+TRACE_EVENT(thaw_bdev_enter,
+ TP_PROTO(struct block_device *bdev, struct super_block *sb),
+ TP_ARGS(bdev, sb),
+ TP_STRUCT__entry(
+ __array(char, comm, TASK_COMM_LEN )
+ __array(char, bdevname, BDEVNAME_SIZE )
+ __array(char, super_id, SUPER_ID_MAX_LEN )
+ __array(char, fstype, FSTYPE_MAX_LEN )
+ __field(dev_t, dev )
+ __field(int, frozen )
+ __field(pid_t, pid )
+ ),
+ TP_fast_assign(
+ memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+ bdevname(bdev, __entry->bdevname);
+ memcpy(__entry->super_id, sb->s_id, SUPER_ID_MAX_LEN);
+ memcpy(__entry->fstype, sb->s_type->name, FSTYPE_MAX_LEN);
+ __entry->dev = sb->s_dev;
+ __entry->frozen = sb->s_writers.frozen;
+ __entry->pid = current->pid;
+ ),
+ TP_printk("comm=%s pid=%d for %s filesystem '%s' (%d:%d) bdev=%s frozen=%d",
+ __entry->comm, __entry->pid,
+ __entry->fstype, __entry->super_id,
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->bdevname, __entry->frozen
+ )
+);
+
+TRACE_EVENT(thaw_bdev_exit,
+ TP_PROTO(struct block_device *bdev, struct super_block *sb, int ret),
+ TP_ARGS(bdev, sb, ret),
+ TP_STRUCT__entry(
+ __array(char, comm, TASK_COMM_LEN )
+ __array(char, bdevname, BDEVNAME_SIZE )
+ __array(char, super_id, SUPER_ID_MAX_LEN )
+ __array(char, fstype, FSTYPE_MAX_LEN )
+ __field(dev_t, dev )
+ __field(int, frozen )
+ __field(pid_t, pid )
+ __field(int, ret )
+ ),
+ TP_fast_assign(
+ memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+ bdevname(bdev, __entry->bdevname);
+ memcpy(__entry->super_id, sb->s_id, SUPER_ID_MAX_LEN);
+ memcpy(__entry->fstype, sb->s_type->name, FSTYPE_MAX_LEN);
+ __entry->dev = sb->s_dev;
+ __entry->frozen = sb->s_writers.frozen;
+ __entry->pid = current->pid;
+ __entry->ret = ret;
+ ),
+ TP_printk("comm=%s pid=%d for %s filesystem '%s' (%d:%d) bdev=%s frozen=%d ret=%d",
+ __entry->comm, __entry->pid,
+ __entry->fstype, __entry->super_id,
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->bdevname, __entry->frozen, __entry->ret
+ )
+);
+
+#endif /* _TRACE_FS_BDEV_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
1.8.3.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 5/5] fs: enable filesystem freeze/thaw events
2016-03-02 19:01 [PATCH 0/5] Add trace events for filesystem freeze/thaw events Frank Sorenson
` (3 preceding siblings ...)
2016-03-02 19:01 ` [PATCH 4/5] fs/block_dev.c: add trace events for freeze_bdev() and thaw_bdev() Frank Sorenson
@ 2016-03-02 19:01 ` Frank Sorenson
2016-03-02 20:15 ` Al Viro
2016-03-02 21:47 ` [PATCH 0/5] Add trace events for " Al Viro
2016-03-03 11:18 ` Dave Wysochanski
6 siblings, 1 reply; 14+ messages in thread
From: Frank Sorenson @ 2016-03-02 19:01 UTC (permalink / raw)
To: linux-fsdevel; +Cc: dwysocha, lvaz
Add tracepoints to enable filesystem freeze/thaw
trace events.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
---
fs/block_dev.c | 13 +++++++++++++
fs/super.c | 10 ++++++++++
2 files changed, 23 insertions(+)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 1207f40..095ef3e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -32,6 +32,9 @@
#include <asm/uaccess.h>
#include "internal.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/fs_bdev.h>
+
struct bdev_inode {
struct block_device bdev;
struct inode vfs_inode;
@@ -228,6 +231,8 @@ struct super_block *freeze_bdev(struct block_device *bdev)
struct super_block *sb;
int error = 0;
+ trace_freeze_bdev_enter(bdev);
+
mutex_lock(&bdev->bd_fsfreeze_mutex);
if (++bdev->bd_fsfreeze_count > 1) {
/*
@@ -258,6 +263,9 @@ out:
sync_blockdev(bdev);
out_unlock:
mutex_unlock(&bdev->bd_fsfreeze_mutex);
+
+ trace_freeze_bdev_exit(bdev, sb);
+
return sb; /* thaw_bdev releases s->s_umount */
}
EXPORT_SYMBOL(freeze_bdev);
@@ -273,6 +281,8 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
{
int error = -EINVAL;
+ trace_thaw_bdev_enter(bdev, sb);
+
mutex_lock(&bdev->bd_fsfreeze_mutex);
if (!bdev->bd_fsfreeze_count)
goto out;
@@ -292,6 +302,9 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
bdev->bd_fsfreeze_count++;
out:
mutex_unlock(&bdev->bd_fsfreeze_mutex);
+
+ trace_thaw_bdev_exit(bdev, sb, error);
+
return error;
}
EXPORT_SYMBOL(thaw_bdev);
diff --git a/fs/super.c b/fs/super.c
index 5e9a974..70218ed 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -35,6 +35,8 @@
#include <linux/lockdep.h>
#include "internal.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/fs.h>
static LIST_HEAD(super_blocks);
static DEFINE_SPINLOCK(sb_lock);
@@ -1274,6 +1276,8 @@ int freeze_super(struct super_block *sb)
{
int ret = 0;
+ trace_freeze_super_enter(sb);
+
atomic_inc(&sb->s_active);
down_write(&sb->s_umount);
if (sb->s_writers.frozen != SB_UNFROZEN) {
@@ -1326,6 +1330,8 @@ out_complete:
out_unlock:
up_write(&sb->s_umount);
out:
+ trace_freeze_super_exit(sb, ret);
+
return ret;
}
EXPORT_SYMBOL(freeze_super);
@@ -1340,6 +1346,8 @@ int thaw_super(struct super_block *sb)
{
int ret = 0;
+ trace_thaw_super_enter(sb);
+
down_write(&sb->s_umount);
if (sb->s_writers.frozen == SB_UNFROZEN) {
up_write(&sb->s_umount);
@@ -1368,6 +1376,8 @@ out_wake:
wake_up(&sb->s_writers.wait_unfrozen);
deactivate_locked_super(sb);
out:
+ trace_thaw_super_exit(sb, ret);
+
return ret;
}
EXPORT_SYMBOL(thaw_super);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/5] fs: simplify freeze_super()/thaw_super() exit handling
2016-03-02 19:01 ` [PATCH 1/5] fs: simplify freeze_super()/thaw_super() exit handling Frank Sorenson
@ 2016-03-02 20:06 ` Al Viro
0 siblings, 0 replies; 14+ messages in thread
From: Al Viro @ 2016-03-02 20:06 UTC (permalink / raw)
To: Frank Sorenson; +Cc: linux-fsdevel, dwysocha, lvaz
On Wed, Mar 02, 2016 at 01:01:55PM -0600, Frank Sorenson wrote:
> Change freeze_super() and thaw_super() to have a common exit
What makes it a simplification in any kind of sense?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/5] fs: add trace events for freeze_super() and thaw_super()
2016-03-02 19:01 ` [PATCH 3/5] fs: add trace events for freeze_super() and thaw_super() Frank Sorenson
@ 2016-03-02 20:12 ` Al Viro
0 siblings, 0 replies; 14+ messages in thread
From: Al Viro @ 2016-03-02 20:12 UTC (permalink / raw)
To: Frank Sorenson; +Cc: linux-fsdevel, dwysocha, lvaz
On Wed, Mar 02, 2016 at 01:01:57PM -0600, Frank Sorenson wrote:
> There is currently no visibility into when or how
> a filesystem became frozen, and no record of freeze/
> thaw activity. Add tracepoints for these events.
NAK.
> + TP_fast_assign(
> + memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
> + memcpy(__entry->super_id, sb->s_id, SUPER_ID_MAX_LEN);
> + memcpy(__entry->fstype, sb->s_type->name, FSTYPE_MAX_LEN);
> + __entry->dev = sb->s_dev;
> + __entry->frozen = sb->s_writers.frozen;
This turns the value of ->s_writers.frozen into a fucking userland ABI.
Committing us to maintain it forever. No, thanks.
BTW, how is your "record the pid" supposed to interact with containers and
pidns?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 5/5] fs: enable filesystem freeze/thaw events
2016-03-02 19:01 ` [PATCH 5/5] fs: enable filesystem freeze/thaw events Frank Sorenson
@ 2016-03-02 20:15 ` Al Viro
0 siblings, 0 replies; 14+ messages in thread
From: Al Viro @ 2016-03-02 20:15 UTC (permalink / raw)
To: Frank Sorenson; +Cc: linux-fsdevel, dwysocha, lvaz
On Wed, Mar 02, 2016 at 01:01:59PM -0600, Frank Sorenson wrote:
> Add tracepoints to enable filesystem freeze/thaw
> trace events.
... and they are defined in highly meaningful terms of entering and leaving
a specific function. Which implementation detail, again, is turned into
a userland ABI, with usual stability warranties.
Again, NAK.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] Add trace events for filesystem freeze/thaw events
2016-03-02 19:01 [PATCH 0/5] Add trace events for filesystem freeze/thaw events Frank Sorenson
` (4 preceding siblings ...)
2016-03-02 19:01 ` [PATCH 5/5] fs: enable filesystem freeze/thaw events Frank Sorenson
@ 2016-03-02 21:47 ` Al Viro
2016-03-02 22:47 ` Dave Chinner
2016-03-03 11:18 ` Dave Wysochanski
6 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2016-03-02 21:47 UTC (permalink / raw)
To: Frank Sorenson; +Cc: linux-fsdevel, dwysocha, lvaz
On Wed, Mar 02, 2016 at 01:01:54PM -0600, Frank Sorenson wrote:
> Currently, the only visibility into filesystem freeze or
> thaw activity is when an error is returned from the actual
> call to freeze or thaw. If the action itself hangs, there
> is no indication that the freeze or thaw was in-progress,
> short of collecting a vmcore.
>
> There is also no record of what process froze a filesystem
> or when it happened, so if the process does not thaw it
> later, debugging the issue is difficult.
>
> These patches add tracepoints to the generic filesystem
> freeze and thaw functions. When enabled, trace events
> will create a record of these activities.
Let's provide a sane stable ABI for the things it's going to be
used for. Because _this_ is very likely to end up with
"something needs to keep track of those events boot-to-shutdown" ->
"something in systemd guts will be keeping track of those and
broadcasting the collected information on state over dbus for the
rest of dbus-infested system to see" -> "touch any details and you
break real userland code" -> "it's cast in stone forever". With
systemd folks not particularly happy about the details, but even
less happy about the need to make the already grotty code parsing
those depend on the kernel version.
And as much as I don't like Lennart et.al., in this case they would
be perfectly justified. Information in question is potentially interesting,
due to the form it is presented in one really needs to keep track of all
messages since boot to make any use of it and it's clearly a job for
a long-running daemon to collect those - nothing else would be able to.
And once such a daemon starts using that, its authors would have very good
reasons to demand the fucking ABI to be fucking stable. After all, they
weren't the ones who came up with the nasty details we might want to change.
So let's get it right. Preferably - without need for boot-to-shutdown
tracking just to mirror the state. What do we really want?
* an ioctl to query the state (frozen/freezing/not frozen) for something in
util-linux to use?
* /proc/fs/freezing and /proc/fs/frozen, with ->s_id of affected filesystems
or, pehaps, one file with (frozen|freezing) + ->s_id?
* ability to audit on state changes? That'd need some thought re what to
do when some joker freezes the fs syslogd is logging to...
* something else entirely?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] Add trace events for filesystem freeze/thaw events
2016-03-02 21:47 ` [PATCH 0/5] Add trace events for " Al Viro
@ 2016-03-02 22:47 ` Dave Chinner
2016-03-02 23:22 ` Al Viro
0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2016-03-02 22:47 UTC (permalink / raw)
To: Al Viro; +Cc: Frank Sorenson, linux-fsdevel, dwysocha, lvaz
On Wed, Mar 02, 2016 at 09:47:44PM +0000, Al Viro wrote:
> On Wed, Mar 02, 2016 at 01:01:54PM -0600, Frank Sorenson wrote:
> Let's provide a sane stable ABI for the things it's going to be
> used for. Because _this_ is very likely to end up with
> "something needs to keep track of those events boot-to-shutdown" ->
> "something in systemd guts will be keeping track of those and
> broadcasting the collected information on state over dbus for the
> rest of dbus-infested system to see" -> "touch any details and you
> break real userland code" -> "it's cast in stone forever". With
> systemd folks not particularly happy about the details, but even
> less happy about the need to make the already grotty code parsing
> those depend on the kernel version.
>
> And as much as I don't like Lennart et.al., in this case they would
> be perfectly justified. Information in question is potentially interesting,
> due to the form it is presented in one really needs to keep track of all
> messages since boot to make any use of it and it's clearly a job for
> a long-running daemon to collect those - nothing else would be able to.
> And once such a daemon starts using that, its authors would have very good
> reasons to demand the fucking ABI to be fucking stable. After all, they
> weren't the ones who came up with the nasty details we might want to change.
>
> So let's get it right. Preferably - without need for boot-to-shutdown
> tracking just to mirror the state. What do we really want?
>
> * an ioctl to query the state (frozen/freezing/not frozen) for something in
> util-linux to use?
>
> * /proc/fs/freezing and /proc/fs/frozen, with ->s_id of affected filesystems
> or, pehaps, one file with (frozen|freezing) + ->s_id?
>
> * ability to audit on state changes? That'd need some thought re what to
> do when some joker freezes the fs syslogd is logging to...
>
> * something else entirely?
Such as:
https://lkml.org/lkml/2015/6/16/456
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] Add trace events for filesystem freeze/thaw events
2016-03-02 22:47 ` Dave Chinner
@ 2016-03-02 23:22 ` Al Viro
2016-03-02 23:52 ` Dave Chinner
0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2016-03-02 23:22 UTC (permalink / raw)
To: Dave Chinner; +Cc: Frank Sorenson, linux-fsdevel, dwysocha, lvaz
On Thu, Mar 03, 2016 at 09:47:30AM +1100, Dave Chinner wrote:
> > So let's get it right. Preferably - without need for boot-to-shutdown
> > tracking just to mirror the state. What do we really want?
> >
> > * an ioctl to query the state (frozen/freezing/not frozen) for something in
> > util-linux to use?
> >
> > * /proc/fs/freezing and /proc/fs/frozen, with ->s_id of affected filesystems
> > or, pehaps, one file with (frozen|freezing) + ->s_id?
> >
> > * ability to audit on state changes? That'd need some thought re what to
> > do when some joker freezes the fs syslogd is logging to...
> >
> > * something else entirely?
>
> Such as:
>
> https://lkml.org/lkml/2015/6/16/456
IIRC, there had been a weird use of vfsmounts as proxies for superblocks and
even more strange requirements along the lines "you should just pick one and
use only it in case if there's more than one mountpoint for this fs". It's
been a while, though, so I might've forgotten if that stuff got resolved in
later thread...
Lifetime rules are really odd there; the object is killed as we are about
to gut vfsmount for good, even though the code in there appears to assume
that it would stay connected to mount tree all the way until that point.
Again, that might have been resolved later; I really can't recall right now...
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] Add trace events for filesystem freeze/thaw events
2016-03-02 23:22 ` Al Viro
@ 2016-03-02 23:52 ` Dave Chinner
0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2016-03-02 23:52 UTC (permalink / raw)
To: Al Viro; +Cc: Frank Sorenson, linux-fsdevel, dwysocha, lvaz
On Wed, Mar 02, 2016 at 11:22:54PM +0000, Al Viro wrote:
> On Thu, Mar 03, 2016 at 09:47:30AM +1100, Dave Chinner wrote:
>
> > > So let's get it right. Preferably - without need for boot-to-shutdown
> > > tracking just to mirror the state. What do we really want?
> > >
> > > * an ioctl to query the state (frozen/freezing/not frozen) for something in
> > > util-linux to use?
> > >
> > > * /proc/fs/freezing and /proc/fs/frozen, with ->s_id of affected filesystems
> > > or, pehaps, one file with (frozen|freezing) + ->s_id?
> > >
> > > * ability to audit on state changes? That'd need some thought re what to
> > > do when some joker freezes the fs syslogd is logging to...
> > >
> > > * something else entirely?
> >
> > Such as:
> >
> > https://lkml.org/lkml/2015/6/16/456
>
> IIRC, there had been a weird use of vfsmounts as proxies for superblocks and
> even more strange requirements along the lines "you should just pick one and
> use only it in case if there's more than one mountpoint for this fs". It's
> been a while, though, so I might've forgotten if that stuff got resolved in
> later thread...
>
> Lifetime rules are really odd there; the object is killed as we are about
> to gut vfsmount for good, even though the code in there appears to assume
> that it would stay connected to mount tree all the way until that point.
> Again, that might have been resolved later; I really can't recall right now...
Agreed, there were lots of unresolved problems with that patch set
(e.g. the duplicate space accounting infrastructure, rather than
just calling vfs_statfs() to grab the free/used space from the
filesystems).
I was just making the point that we really need a generic fs event
mechanism, not just something specific to freeze/unfreeze. I've kind
of been waiting for the kdbus stuff to work itself out, because if
there's going to be a generic kernel-wide event mechanism added to
the kernel, we should just be using that...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] Add trace events for filesystem freeze/thaw events
2016-03-02 19:01 [PATCH 0/5] Add trace events for filesystem freeze/thaw events Frank Sorenson
` (5 preceding siblings ...)
2016-03-02 21:47 ` [PATCH 0/5] Add trace events for " Al Viro
@ 2016-03-03 11:18 ` Dave Wysochanski
6 siblings, 0 replies; 14+ messages in thread
From: Dave Wysochanski @ 2016-03-03 11:18 UTC (permalink / raw)
To: Frank Sorenson; +Cc: linux-fsdevel, lvaz
On Wed, 2016-03-02 at 13:01 -0600, Frank Sorenson wrote:
> Currently, the only visibility into filesystem freeze or
> thaw activity is when an error is returned from the actual
> call to freeze or thaw. If the action itself hangs, there
> is no indication that the freeze or thaw was in-progress,
> short of collecting a vmcore.
>
> There is also no record of what process froze a filesystem
> or when it happened, so if the process does not thaw it
> later, debugging the issue is difficult.
>
Thanks for getting the ball rolling Frank!
For some background, Frank can correct me if I'm wrong, but I think the
main "use case" which spurred these patches is as follows. Frank,
myself and others supporting enterprise customers on Linux very
frequently see vmcores come in with customers reporting "hung systems".
We analyze the vmcore and see that the root filesystem, and often every
other local filesystem is frozen. In the vmcore we see many processes
blocked waiting on a filesystem to become unfrozen. In short, as a
result of the filesystems being frozen, eventually the system grinds to
a halt and users wonder what happened.
So the discussion then turns to "Who froze the filesystem, and why
didn't they unfreeze it"? One approach we have is to give the customer
a systemtap script which basically does what these patches do and shows
which process issued a 'freeze' call and then 'unfreeze'. As "simple"
as systemtap is, many customers run into problems with installing the
requirements and/or they don't want to do it. It also requires they
reproduce the issue and give us another vmcore so we can see the output.
The discussion almost always ends with "Contact those responsible for
backup / snapshotting of your hung system, as this is most likely an
issue with snapshots / backup", but unfortunately we can't give many
more details than that since we don't have any log of what happened.
I thought about this some more and I realized the fact that if the
customer gives us a vmcore and it contains frozen filesystem, all
parties involved have "already lost". This indicates usually a couple
things:
1. They didn't know what happened
2. There's a bug in freeze / thaw, or backup/snapshot agent, which needs
fixed
On problem #1, I wonder now if a 'freeze' operation should be treated
more like a "shutdown" where all users are notified this is about to
happen. One downside of this is I can imagine this defeats the intent
of such "VM snapshots" or "transparent backups" where they probably
don't want people to notice a backup / snapshot is being taken. One the
diagnostic side though, we definitely want to know what happened when
something goes wrong, so there's probably a conflict there. The other
downside is such notification may not be possible since many of these
problems occur with 3rd party add-ons. Then again, the leading
virtualization vendor has recently open sourced their agent so it may be
possible.
On problem #2, this patchset or something similar will help only to hone
in on who is to blame faster, but it won't fix anything of course.
In closing, I want to point out that based on the above use case, a
simple one line printk in just the freeze path would help. That is, if
we just knew who the last process was which issued a freeze on a given
filesystem was, and that was always present in any system log / vmcore,
we'd be able to help customers in a more definitive way.
All this said, I don't think Frank, myself or anyone else working on
such problems for customers wants to add slightly better diagnostics,
but create a nightmare of maintenance of tracepoints or unknown side
effects that outweigh any benefit.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2016-03-03 11:18 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-02 19:01 [PATCH 0/5] Add trace events for filesystem freeze/thaw events Frank Sorenson
2016-03-02 19:01 ` [PATCH 1/5] fs: simplify freeze_super()/thaw_super() exit handling Frank Sorenson
2016-03-02 20:06 ` Al Viro
2016-03-02 19:01 ` [PATCH 2/5] fs/block_dev.c: simplify freeze_bdev() and thaw_bdev() " Frank Sorenson
2016-03-02 19:01 ` [PATCH 3/5] fs: add trace events for freeze_super() and thaw_super() Frank Sorenson
2016-03-02 20:12 ` Al Viro
2016-03-02 19:01 ` [PATCH 4/5] fs/block_dev.c: add trace events for freeze_bdev() and thaw_bdev() Frank Sorenson
2016-03-02 19:01 ` [PATCH 5/5] fs: enable filesystem freeze/thaw events Frank Sorenson
2016-03-02 20:15 ` Al Viro
2016-03-02 21:47 ` [PATCH 0/5] Add trace events for " Al Viro
2016-03-02 22:47 ` Dave Chinner
2016-03-02 23:22 ` Al Viro
2016-03-02 23:52 ` Dave Chinner
2016-03-03 11:18 ` Dave Wysochanski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).