* [ZOMG RFCRAP PATCH 0/2] xfs: horrifying eBPF hacks
@ 2017-12-13 6:18 Darrick J. Wong
2017-12-13 6:21 ` [PATCH 1/2] xfs: eBPF user hacks insanity Darrick J. Wong
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Darrick J. Wong @ 2017-12-13 6:18 UTC (permalink / raw)
To: linux-xfs; +Cc: Richard Wareing, david, hch
Heh.
So here's a kernel patch that builds on Josef's eBPF return value
override patch series to provide an eBPF-kprobe-overridable hook
function. With it, administrators can program XFS to redirect a file to
the rt device or the data device, depending on their own funny
constraints, whenever the system does the first write into an empty
file.
The second patch is against Brendan Gregg's bcc repository; it adds a
python script to compile and inject a sample eBPF program that behaves
(roughly) the same as Richard's earlier patches.
Soooo... rather than dumping a bunch of static code into XFS to support
his particular usecase, we're building him an eBPF Bazooka and telling
him to have fun. :P
(More generally, it's a science fair project for letting people
customize XFS behavior with eBPF in a controlled manner.)
--D
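A rough userspace model of the idea in this cover letter, for readers who want to see the shape of the hook: a filter that returns the flags unchanged by default, and a policy that replaces its return value on the first write into an empty file. All names here (`default_filter`, `large_file_policy`, `first_write`) are hypothetical and exist only for this sketch; the real hook is kernel C and the override happens through a kprobe, not a Python callback.

```python
# Illustrative model only; none of these function names exist in the kernel.
FS_XFLAG_REALTIME = 0x00000001  # value from include/uapi/linux/fs.h


def default_filter(length, xflags):
    """Stand-in for the in-kernel hook: hands back xflags unchanged."""
    return xflags


def large_file_policy(length, xflags):
    """Hypothetical admin policy: first writes of >= 64 KiB go to the rt device."""
    if length >= 65536:
        return xflags | FS_XFLAG_REALTIME
    return xflags & ~FS_XFLAG_REALTIME


def first_write(length, xflags, hook=default_filter):
    """Models the single decision point: the first write into an empty file."""
    return hook(length, xflags)
```

Passing `large_file_policy` as `hook` plays the role of attaching an eBPF program; leaving the default in place models a system with nothing attached.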
* [PATCH 1/2] xfs: eBPF user hacks insanity
2017-12-13 6:18 [ZOMG RFCRAP PATCH 0/2] xfs: horrifying eBPF hacks Darrick J. Wong
@ 2017-12-13 6:21 ` Darrick J. Wong
2017-12-13 6:22 ` [PATCH 2/2] tools/xfs: use XFS hacks to override data block device placement Darrick J. Wong
2017-12-21 13:33 ` [ZOMG RFCRAP PATCH 0/2] xfs: horrifying eBPF hacks Christoph Hellwig
2 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2017-12-13 6:21 UTC (permalink / raw)
To: linux-xfs; +Cc: Richard Wareing, david, hch
From: Darrick J. Wong <darrick.wong@oracle.com>
Create some special filter functions to which userspace can attach eBPF
programs which override the return value and thereby allow userspace to
assist XFS in making contextualized decisions about where to put
files.
In other words, users can upload their own custom algorithms into XFS to
override the default rtdev/datadev placement code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/Kconfig | 9 ++
fs/xfs/Makefile | 2 +
fs/xfs/xfs_bmap_util.c | 5 +
fs/xfs/xfs_hacks.c | 159 +++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_hacks.h | 29 ++++++++
fs/xfs/xfs_iomap.c | 7 ++
kernel/trace/trace_kprobe.c | 2 +
7 files changed, 213 insertions(+)
create mode 100644 fs/xfs/xfs_hacks.c
create mode 100644 fs/xfs/xfs_hacks.h
diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index 06be67d..1594822 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -143,3 +143,12 @@ config XFS_ASSERT_FATAL
result in warnings.
This behavior can be modified at runtime via sysfs.
+
+config XFS_HACKS
+ bool "XFS Userspace eBPF Hacks"
+ default n
+ depends on XFS_FS && BPF_KPROBE_OVERRIDE
+ help
+ Allow userspace to attach eBPF programs to various parts of XFS
+ in order to customize its decisions. This is insane; you get
+ to keep the pieces!
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 9f4de14..03e2a37 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -176,3 +176,5 @@ xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \
)
xfs-$(CONFIG_XFS_QUOTA) += scrub/quota.o
endif
+
+xfs-$(CONFIG_XFS_HACKS) += xfs_hacks.o
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 6d37ab4..39ce418 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -45,6 +45,7 @@
#include "xfs_iomap.h"
#include "xfs_reflink.h"
#include "xfs_refcount.h"
+#include "xfs_hacks.h"
/* Kernel only BMAP related definitions and functions */
@@ -918,6 +919,10 @@ xfs_alloc_file_space(
if (XFS_FORCED_SHUTDOWN(mp))
return -EIO;
+ error = xfs_hacks_retarget_iflags(ip, offset, len);
+ if (error)
+ return error;
+
error = xfs_qm_dqattach(ip, 0);
if (error)
return error;
diff --git a/fs/xfs/xfs_hacks.c b/fs/xfs/xfs_hacks.c
new file mode 100644
index 0000000..28d2852
--- /dev/null
+++ b/fs/xfs/xfs_hacks.c
@@ -0,0 +1,159 @@
+/*
+ * Copyright (C) 2017 Oracle. All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_itable.h"
+#include "xfs_fsops.h"
+#include <linux/bpf.h>
+
+static void
+xfs_hacks_warn(
+ struct xfs_mount *mp)
+{
+ static struct ratelimit_state hack_warning = RATELIMIT_STATE_INIT(
+ __func__, 86400 * HZ, 1);
+
+ ratelimit_set_flags(&hack_warning, RATELIMIT_MSG_ON_RELEASE);
+ if (__ratelimit(&hack_warning))
+ xfs_alert(mp,
+"WARNING userspace eBPF hack feature in use. Use at your own risk!");
+}
+
+/*
+ * Return current xflags unless someone attaches an eBPF program to
+ * override the default return value to feed the inode different xflags.
+ * This is the mechanism through which userspace can make more
+ * contextual decisions about where to put a file.
+ *
+ * ftrace cannot attach to this function if it is too short, so we have
+ * three throwaway calls to trace_printk to ensure that we have enough
+ * bytes... or something.
+ */
+uint
+xfs_hack_filter_iflags(
+ struct xfs_fsop_geom *geo,
+ struct xfs_fsop_counts *stats,
+ xfs_ino_t ino,
+ loff_t offset,
+ loff_t length,
+ uint xflags)
+{
+ trace_printk("C: off=%llu len=%llu xflags=0x%x\n",
+ offset, length, xflags);
+ trace_printk("C: dblocks=%llu rblocks=%llu\n",
+ geo->datablocks, geo->rtblocks);
+ trace_printk("C: dfree=%llu rfree=%llu\n",
+ stats->freedata, stats->freertx);
+
+ return xflags;
+}
+BPF_ALLOW_ERROR_INJECTION(xfs_hack_filter_iflags);
+
+/*
+ * Change flags on empty files, if so desired.
+ */
+#define XFS_XFLAGS_CAN_RETARGET (FS_XFLAG_REALTIME)
+int
+xfs_hacks_retarget_iflags(
+ struct xfs_inode *ip,
+ loff_t offset,
+ loff_t length)
+{
+ struct xfs_fsop_geom fsgeo;
+ struct xfs_fsop_counts stats;
+ struct xfs_trans *tp;
+ struct xfs_mount *mp = ip->i_mount;
+ uint16_t flags;
+ uint64_t flags2;
+ uint curr_xflags;
+ uint new_xflags;
+ int error = 0;
+
+ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp);
+ if (error)
+ return error;
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+ flags = ip->i_d.di_flags;
+ flags2 = ip->i_d.di_flags2;
+
+ /* Only allow retargeting of empty files. */
+ if (i_size_read(VFS_I(ip)) || ip->i_d.di_nextents || ip->i_d.di_size)
+ goto out_unlock;
+
+ error = xfs_fs_geometry(mp, &fsgeo, 4);
+ if (error)
+ goto out_unlock;
+ error = xfs_fs_counts(mp, &stats);
+ if (error)
+ goto out_unlock;
+
+ curr_xflags = xfs_ip2xflags(ip);
+ new_xflags = xfs_hack_filter_iflags(&fsgeo, &stats, ip->i_ino, offset,
+ length, curr_xflags);
+
+ if (new_xflags == curr_xflags)
+ goto out_unlock;
+
+ xfs_hacks_warn(mp);
+
+ error = -EINVAL;
+ if ((new_xflags ^ curr_xflags) & ~XFS_XFLAGS_CAN_RETARGET)
+ goto out_unlock;
+
+ /* Change the rt flag. */
+ if (new_xflags & FS_XFLAG_REALTIME) {
+ if (!mp->m_rtdev_targp)
+ goto out_unlock;
+
+ if (xfs_is_reflink_inode(ip))
+ flags2 &= ~XFS_DIFLAG2_REFLINK;
+
+ flags |= XFS_DIFLAG_REALTIME;
+ } else {
+ flags &= ~XFS_DIFLAG_REALTIME;
+ }
+
+ /* Log inode and get out. */
+ ip->i_d.di_flags = flags;
+ ip->i_d.di_flags2 = flags2;
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+ xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ return xfs_trans_commit(tp);
+
+out_unlock:
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_cancel(tp);
+ return error;
+}
diff --git a/fs/xfs/xfs_hacks.h b/fs/xfs/xfs_hacks.h
new file mode 100644
index 0000000..2c556f1
--- /dev/null
+++ b/fs/xfs/xfs_hacks.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2017 Oracle. All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+#ifndef __XFS_HACKS_H__
+#define __XFS_HACKS_H__
+
+#ifdef CONFIG_XFS_HACKS
+int xfs_hacks_retarget_iflags(struct xfs_inode *ip, loff_t offset, loff_t length);
+#else
+# define xfs_hacks_retarget_iflags(ip, off, len) (0)
+#endif
+
+#endif /* __XFS_HACKS_H__ */
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7ab52a8..f69d274 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -42,6 +42,7 @@
#include "xfs_dquot_item.h"
#include "xfs_dquot.h"
#include "xfs_reflink.h"
+#include "xfs_hacks.h"
#define XFS_WRITEIO_ALIGN(mp,off) (((off) >> mp->m_writeio_log) \
@@ -987,6 +988,12 @@ xfs_file_iomap_begin(
if (XFS_FORCED_SHUTDOWN(mp))
return -EIO;
+ if (flags & IOMAP_WRITE) {
+ error = xfs_hacks_retarget_iflags(ip, offset, length);
+ if (error)
+ return error;
+ }
+
if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
/* Reserve delalloc blocks for regular writeback. */
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5db8498..fd948e3 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1215,8 +1215,10 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
if (__this_cpu_read(bpf_kprobe_override)) {
__this_cpu_write(bpf_kprobe_override, 0);
reset_current_kprobe();
+ preempt_enable();
return 1;
}
+ preempt_enable();
if (!ret)
return 0;
}
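The checks that `xfs_hacks_retarget_iflags` applies to the value returned by the hook can be summarized in a small userspace model. This is a sketch under stated assumptions, not the kernel code: `retarget` and its boolean parameters are hypothetical names, and locking, transactions, and the reflink flag handling are left out entirely.

```python
import errno

FS_XFLAG_REALTIME = 0x00000001          # value from include/uapi/linux/fs.h
XFLAGS_CAN_RETARGET = FS_XFLAG_REALTIME  # only the rt bit may be changed


def retarget(curr_xflags, new_xflags, file_is_empty, has_rt_device):
    """Return (error, resulting_xflags), mirroring the order of checks in the patch."""
    # Only empty files may be retargeted; anything else is left alone.
    if not file_is_empty:
        return 0, curr_xflags
    # The hook asked for no change.
    if new_xflags == curr_xflags:
        return 0, curr_xflags
    # Reject attempts to flip any bit other than the realtime flag.
    if (new_xflags ^ curr_xflags) & ~XFLAGS_CAN_RETARGET:
        return -errno.EINVAL, curr_xflags
    # Moving to the rt device requires that one is actually present.
    if (new_xflags & FS_XFLAG_REALTIME) and not has_rt_device:
        return -errno.EINVAL, curr_xflags
    return 0, new_xflags
```

The XOR-and-mask test is what keeps an attached program from changing anything except the realtime bit, whatever value it returns.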
* [PATCH 2/2] tools/xfs: use XFS hacks to override data block device placement
2017-12-13 6:18 [ZOMG RFCRAP PATCH 0/2] xfs: horrifying eBPF hacks Darrick J. Wong
2017-12-13 6:21 ` [PATCH 1/2] xfs: eBPF user hacks insanity Darrick J. Wong
@ 2017-12-13 6:22 ` Darrick J. Wong
2017-12-21 13:33 ` [ZOMG RFCRAP PATCH 0/2] xfs: horrifying eBPF hacks Christoph Hellwig
2 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2017-12-13 6:22 UTC (permalink / raw)
To: linux-xfs; +Cc: Richard Wareing, david, hch
From: Darrick J. Wong <darrick.wong@oracle.com>
This (bcc) patch modifies bcc so that we can override some function
return values. We then create a new python script containing custom
logic to decide where a file's data goes (rtdev or datadev) and inject
the compiled eBPF code into the kernel.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
src/cc/compat/linux/bpf.h | 7 ++
src/cc/compat/linux/virtual_bpf.h | 3 +
src/cc/export/helpers.h | 2 +
tools/xfs_rt.py | 130 +++++++++++++++++++++++++++++++++++++
4 files changed, 140 insertions(+), 2 deletions(-)
create mode 100755 tools/xfs_rt.py
diff --git a/src/cc/compat/linux/bpf.h b/src/cc/compat/linux/bpf.h
index f896897..5a3ec0b 100644
--- a/src/cc/compat/linux/bpf.h
+++ b/src/cc/compat/linux/bpf.h
@@ -677,6 +677,10 @@ union bpf_attr {
* @buf: buf to fill
* @buf_size: size of the buf
* Return : 0 on success or negative error code
+ *
+ * int bpf_override_return(pt_regs, rc)
+ * @pt_regs: pointer to struct pt_regs
+ * @rc: the return value to set
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -736,7 +740,8 @@ union bpf_attr {
FN(xdp_adjust_meta), \
FN(perf_event_read_value), \
FN(perf_prog_read_value), \
- FN(getsockopt),
+ FN(getsockopt), \
+ FN(override_return),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/src/cc/compat/linux/virtual_bpf.h b/src/cc/compat/linux/virtual_bpf.h
index a2bcf07..7fbc365 100644
--- a/src/cc/compat/linux/virtual_bpf.h
+++ b/src/cc/compat/linux/virtual_bpf.h
@@ -735,7 +735,8 @@ union bpf_attr {
FN(xdp_adjust_meta), \
FN(perf_event_read_value), \
FN(perf_prog_read_value), \
- FN(getsockopt),
+ FN(getsockopt), \
+ FN(override_return),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/src/cc/export/helpers.h b/src/cc/export/helpers.h
index 2b64ee8..571191e 100644
--- a/src/cc/export/helpers.h
+++ b/src/cc/export/helpers.h
@@ -204,6 +204,8 @@ static int (*bpf_probe_read)(void *dst, u64 size, const void *unsafe_ptr) =
(void *) BPF_FUNC_probe_read;
static u64 (*bpf_ktime_get_ns)(void) =
(void *) BPF_FUNC_ktime_get_ns;
+static void (*bpf_override_return)(void *ctx, unsigned long rc) =
+ (void *) BPF_FUNC_override_return;
static u32 (*bpf_get_prandom_u32)(void) =
(void *) BPF_FUNC_get_prandom_u32;
static int (*bpf_trace_printk_)(const char *fmt, u64 fmt_size, ...) =
diff --git a/tools/xfs_rt.py b/tools/xfs_rt.py
new file mode 100755
index 0000000..b44fa14
--- /dev/null
+++ b/tools/xfs_rt.py
@@ -0,0 +1,130 @@
+#!/usr/bin/python
+# @lint-avoid-python-3-compatibility-imports
+#
+# xfs_rt Decide on file data block device placement via custom algorithm.
+# Uses XFS hacks to inject... stuff.
+#
+# Copyright 2017 Oracle, Inc.
+# Licensed under the Apache License, Version 2.0 (the "License")
+
+from __future__ import print_function
+from bcc import BPF
+import argparse
+from time import sleep, strftime
+import ctypes as ct
+
+# arguments
+examples = """examples:
+ ./xfs_rt
+"""
+parser = argparse.ArgumentParser(
+ description="Custom placement of data file blocks on XFS",
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ epilog=examples)
+args = parser.parse_args()
+debug = 0
+
+# define BPF program
+bpf_text = """
+#include <uapi/linux/ptrace.h>
+#include <linux/fs.h>
+
+struct xfs_fsop_geom {
+ __u32 blocksize; /* filesystem (data) block size */
+ __u32 rtextsize; /* realtime extent size */
+ __u32 agblocks; /* fsblocks in an AG */
+ __u32 agcount; /* number of allocation groups */
+ __u32 logblocks; /* fsblocks in the log */
+ __u32 sectsize; /* (data) sector size, bytes */
+ __u32 inodesize; /* inode size in bytes */
+ __u32 imaxpct; /* max allowed inode space(%) */
+ __u64 datablocks; /* fsblocks in data subvolume */
+ __u64 rtblocks; /* fsblocks in realtime subvol */
+ __u64 rtextents; /* rt extents in realtime subvol*/
+ __u64 logstart; /* starting fsblock of the log */
+ unsigned char uuid[16]; /* unique id of the filesystem */
+ __u32 sunit; /* stripe unit, fsblocks */
+ __u32 swidth; /* stripe width, fsblocks */
+ __s32 version; /* structure version */
+ __u32 flags; /* superblock version flags */
+ __u32 logsectsize; /* log sector size, bytes */
+ __u32 rtsectsize; /* realtime sector size, bytes */
+ __u32 dirblocksize; /* directory block size, bytes */
+ __u32 logsunit; /* log stripe unit, bytes */
+};
+
+/* Output for XFS_FS_COUNTS */
+struct xfs_fsop_counts {
+ __u64 freedata; /* free data section blocks */
+ __u64 freertx; /* free rt extents */
+ __u64 freeino; /* free inodes */
+ __u64 allocino; /* total allocated inodes */
+};
+
+typedef unsigned long long xfs_ino_t;
+
+int
+xfs_hack_filter_iflags_begin(
+ struct pt_regs *ctx,
+ struct xfs_fsop_geom *geo,
+ struct xfs_fsop_counts *stats,
+ xfs_ino_t ino,
+ loff_t offset,
+ loff_t length,
+ uint xflags)
+{
+ bool use_rt = false;
+
+#if 0
+ bpf_trace_printk("B: off=%llu len=%llu xflags=0x%x\\n", offset, length, xflags);
+ bpf_trace_printk("B: dblocks=%llu rblocks=%llu\\n", geo->datablocks, geo->rtblocks);
+ bpf_trace_printk("B: dfree=%llu rfree=%llu\\n", stats->freedata, stats->freertx);
+#endif
+
+ /*
+ * If the first allocation request is for >= 64k then we assume this
+ * is a "large" file and push it to the rt device.
+ */
+ if (length >= 65536)
+ use_rt = true;
+
+ /*
+ * Redirect files to the 'other' device if the chosen one is more
+ * than 80% full.
+ */
+ if (use_rt && stats->freertx < geo->rtextents / 5)
+ use_rt = false;
+ else if (!use_rt && stats->freedata < geo->datablocks / 5)
+ use_rt = true;
+
+ if (use_rt)
+ xflags |= FS_XFLAG_REALTIME;
+ else
+ xflags &= ~FS_XFLAG_REALTIME;
+
+ bpf_override_return(ctx, xflags);
+ return 0;
+}
+
+"""
+if debug:
+ print(bpf_text)
+
+# initialize BPF
+b = BPF(text=bpf_text)
+
+# common file functions
+b.attach_kprobe(event="xfs_hack_filter_iflags", fn_name="xfs_hack_filter_iflags_begin")
+
+print("BPF HACKING XFS... Hit Ctrl-C to end.")
+
+# output
+exiting = 0
+while (1):
+    try:
+        sleep(99999999)
+    except KeyboardInterrupt:
+        exiting = 1
+
+    if exiting:
+        exit()
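The placement policy embedded in the eBPF program can be tried out in plain Python before loading anything into the kernel. The function below is a minimal sketch with a hypothetical name (`choose_rt`); it compares free rt extents against the total rt extent count so that both sides of the comparison use the same unit, which is an assumption of this sketch.

```python
def choose_rt(length, freedata, datablocks, freertx, rtextents):
    """Return True if the file should be placed on the rt device."""
    # First allocation of >= 64 KiB is treated as a "large" file.
    use_rt = length >= 65536

    # Fall back to the other device if the chosen one has
    # less than 20% free space (i.e. is more than 80% full).
    if use_rt and freertx < rtextents // 5:
        use_rt = False
    elif not use_rt and freedata < datablocks // 5:
        use_rt = True
    return use_rt
```

Note that the fallback is one-way per call: a small file pushed off a nearly full data device lands on the rt device without the rt free space being checked.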
* Re: [ZOMG RFCRAP PATCH 0/2] xfs: horrifying eBPF hacks
2017-12-13 6:18 [ZOMG RFCRAP PATCH 0/2] xfs: horrifying eBPF hacks Darrick J. Wong
2017-12-13 6:21 ` [PATCH 1/2] xfs: eBPF user hacks insanity Darrick J. Wong
2017-12-13 6:22 ` [PATCH 2/2] tools/xfs: use XFS hacks to override data block device placement Darrick J. Wong
@ 2017-12-21 13:33 ` Christoph Hellwig
2017-12-21 16:45 ` Darrick J. Wong
2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2017-12-21 13:33 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs, Richard Wareing, david, hch
Eek. While eBPF is a really nice debug tool, we should never use
it for actual required kernel I/O functionality.
* Re: [ZOMG RFCRAP PATCH 0/2] xfs: horrifying eBPF hacks
2017-12-21 13:33 ` [ZOMG RFCRAP PATCH 0/2] xfs: horrifying eBPF hacks Christoph Hellwig
@ 2017-12-21 16:45 ` Darrick J. Wong
2018-01-04 0:05 ` Richard Wareing
0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2017-12-21 16:45 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-xfs, Richard Wareing, david
On Thu, Dec 21, 2017 at 05:33:38AM -0800, Christoph Hellwig wrote:
> Eek. While eBPF is a really nice debug tool, we should never use
> it for actual required kernel I/O functionality.
Certainly not in its current hacky form. I'm curious if Richard has had
a chance to try out these patches to see if it affects performance in a
noticeable way?
I /think/ bpf has enough safety mechanisms (no loops, no direct writing
to kernel memory, bytecode verifiers, opcode count limits) that such a
beast could be hidden behind a kconfig option that isn't turned on for
the general public. For people who have these particularly specific use
cases I think it better to have a general mechanism to accommodate them
vs. scattering code all over xfs vs. "no sorry go away", though this
ebpf thing isn't necessarily the final answer. We do validate that the
proposed iflags are allowed for the fs geometry, though I acknowledge
that the prospect of running ebpf with ilock_excl does give me pause.
I'm curious, though, what are your (and everyone else's) concerns about
this?
--D
* Re: [ZOMG RFCRAP PATCH 0/2] xfs: horrifying eBPF hacks
2017-12-21 16:45 ` Darrick J. Wong
@ 2018-01-04 0:05 ` Richard Wareing
2018-01-04 0:52 ` Darrick J. Wong
0 siblings, 1 reply; 7+ messages in thread
From: Richard Wareing @ 2018-01-04 0:05 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, linux-xfs@vger.kernel.org, david@fromorbit.com
Hey Darrick,
I'll try to get to testing this out early next week, I was out on vacation
the last couple weeks so I kinda fell off the earth for a bit.
I will test for functionality & performance as best I can, but we'll probably
want to explore everyone's concerns on leveraging eBPF in this way as
well. I have pretty limited experience with BPF, so I'm probably not going
to be super useful in such a discussion, though I'll certainly try to get read
up on it.
Is there any precedent for doing this sort of thing with BPF anywhere
else in the kernel?
Richard
> On Dec 21, 2017, at 8:45 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Thu, Dec 21, 2017 at 05:33:38AM -0800, Christoph Hellwig wrote:
>> Eek. While eBPF is a really nice debug tool, we should never use
>> it for actual required kernel I/O functionality.
>
> Certainly not in its current hacky form. I'm curious if Richard has had
> a chance to try out these patches to see if it affects performance in a
> noticeable way?
>
> I /think/ bpf has enough safety mechanisms (no loops, no direct writing
> to kernel memory, bytecode verifiers, opcode count limits) that such a
> beast could be hidden behind a kconfig option that isn't turned on for
> the general public. For people who have these particularly specific use
> cases I think it better to have a general mechanism to accommodate them
> vs. scattering code all over xfs vs. "no sorry go away", though this
> ebpf thing isn't necessarily the final answer. We do validate that the
> proposed iflags are allowed for the fs geometry, though I acknowledge
> that the prospect of running ebpf with ilock_excl does give me pause.
>
> I'm curious, though, what are your (and everyone else's) concerns about
> this?
>
> --D
>
* Re: [ZOMG RFCRAP PATCH 0/2] xfs: horrifying eBPF hacks
2018-01-04 0:05 ` Richard Wareing
@ 2018-01-04 0:52 ` Darrick J. Wong
0 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2018-01-04 0:52 UTC (permalink / raw)
To: Richard Wareing
Cc: Christoph Hellwig, linux-xfs@vger.kernel.org, david@fromorbit.com
On Thu, Jan 04, 2018 at 12:05:21AM +0000, Richard Wareing wrote:
> Hey Darrick,
>
> I'll try to get to testing this out early next week, I was out on vacation
> the last couple weeks so I kinda fell off the earth for a bit.
>
> I will test for functionality & performance as best I can, but we'll probably
> want to explore everyone's concerns on leveraging eBPF in this way as
> well. I have pretty limited experience with BPF, so I'm probably not going
> to be super useful in such a discussion, though I'll certainly try to get read
> up on it.
>
> Is there any precedent for doing this sort of thing with BPF anywhere
> else in the kernel?
Well, we use JIT compiled eBPF to allow userspace to read kernel memory
now, so it's probably fine to let it fiddle with XFS since we've all
already lost anyway. ;)
Seriously, no, there isn't any precedent, hence the ZOMFG RFCRAP tags.
I'm pretty sure that at best BPF XFS falls into a gray area where in
theory all the protections are sufficient but nobody will ever believe
it...
...but that doesn't mean it isn't worth giving the idea a healthy shake
on the mailing list. <cue flames>
--D
> Richard
>
>
> > On Dec 21, 2017, at 8:45 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > On Thu, Dec 21, 2017 at 05:33:38AM -0800, Christoph Hellwig wrote:
> >> Eek. While eBPF is a really nice debug tool, we should never use
> >> it for actual required kernel I/O functionality.
> >
> > Certainly not in its current hacky form. I'm curious if Richard has had
> > a chance to try out these patches to see if it affects performance in a
> > noticeable way?
> >
> > I /think/ bpf has enough safety mechanisms (no loops, no direct writing
> > to kernel memory, bytecode verifiers, opcode count limits) that such a
> > beast could be hidden behind a kconfig option that isn't turned on for
> > the general public. For people who have these particularly specific use
> > cases I think it better to have a general mechanism to accommodate them
> > vs. scattering code all over xfs vs. "no sorry go away", though this
> > ebpf thing isn't necessarily the final answer. We do validate that the
> > proposed iflags are allowed for the fs geometry, though I acknowledge
> > that the prospect of running ebpf with ilock_excl does give me pause.
> >
> > I'm curious, though, what are your (and everyone else's) concerns about
> > this?
> >
> > --D
> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-01-04 0:54 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-12-13 6:18 [ZOMG RFCRAP PATCH 0/2] xfs: horrifying eBPF hacks Darrick J. Wong
2017-12-13 6:21 ` [PATCH 1/2] xfs: eBPF user hacks insanity Darrick J. Wong
2017-12-13 6:22 ` [PATCH 2/2] tools/xfs: use XFS hacks to override data block device placement Darrick J. Wong
2017-12-21 13:33 ` [ZOMG RFCRAP PATCH 0/2] xfs: horrifying eBPF hacks Christoph Hellwig
2017-12-21 16:45 ` Darrick J. Wong
2018-01-04 0:05 ` Richard Wareing
2018-01-04 0:52 ` Darrick J. Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).