* [RFC v4 0/5] net/filter: Add AF_PACKET support for vhost-net
@ 2026-04-07 5:05 Cindy Lu
2026-04-07 5:05 ` [RFC v4 1/5] net/filter: allow filters on vhost netdevs Cindy Lu
` (5 more replies)
0 siblings, 6 replies; 10+ messages in thread
From: Cindy Lu @ 2026-04-07 5:05 UTC
To: lulu, mst, jasowang, zhangckid, lizhijian, jmarcin, qemu-devel
Hi all,
This series wires AF_PACKET-backed packet capture and inject support into
the existing socket chardev backend so that filter-redirector can keep
using its existing process.
Example Usage
=============
Users are expected to create the AF_PACKET socket in userspace, bind it
to the target tap device, and then pass the resulting fd to QEMU via
the existing FD_PLACEHOLDER mechanism.
Creating such a socket requires CAP_NET_RAW (or running as root). A
typical setup looks like:
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETH_P_ALL))
    sock.bind((ifname, ETH_P_ALL))
The bound fd can then be passed to QEMU with:
-chardev socket,...,fd=${FD_PLACEHOLDER},...
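The steps above can be sketched as a small launcher script. This is only an illustration, not part of the series: the helper names (open_packet_socket, chardev_arg, launch) are hypothetical, the -chardev value contains only the options shown in this cover letter, and the remaining QEMU arguments are assumed to be supplied by the caller.

```python
import socket
import subprocess

ETH_P_ALL = 0x0003  # from <linux/if_ether.h>: match every protocol


def open_packet_socket(ifname):
    """Create an AF_PACKET raw socket bound to ifname (needs CAP_NET_RAW)."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETH_P_ALL))
    sock.bind((ifname, ETH_P_ALL))
    return sock


def chardev_arg(chardev_id, fd, mode):
    """Build the -chardev value for an inherited AF_PACKET fd."""
    if mode not in ("capture", "inject"):
        raise ValueError("mode must be 'capture' or 'inject'")
    return f"socket,id={chardev_id},fd={fd},af-packet-mode={mode}"


def launch(qemu_argv, sock, chardev_id, mode):
    """Start QEMU with the socket's fd kept open in the child process."""
    fd = sock.fileno()
    argv = list(qemu_argv) + ["-chardev", chardev_arg(chardev_id, fd, mode)]
    # pass_fds keeps the descriptor open in the child under the same number.
    return subprocess.Popen(argv, pass_fds=(fd,))
```

With this sketch, launch(base_argv, open_packet_socket("tap0"), "chain_out", "capture") would hand the bound fd to QEMU, where base_argv and "tap0" stand in for the caller's own command line and tap device.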
Primary VM (mirror incoming packets to secondary via chardev socket):
-netdev "tap,id=net0,ifname=${TAP}...vhost=on"
-device "${VIRTIO_NET_DEVICE}"
-chardev "socket,id=chain_out,fd=${FD_PLACEHOLDER},af-packet-mode=capture"
-chardev "socket,id=mirror0,host=${MIRROR_HOST},port=${MIRROR_PORT},reconnect-ms=${MIRROR_RECONNECT_MS}"
-object "filter-redirector,id=r1,netdev=net0,queue=tx,indev=chain_out,status=on,vnet_hdr_support=off,position=head"
-object "filter-redirector,id=r1_mirror,netdev=net0,queue=tx,outdev=mirror0,status=on,vnet_hdr_support=off,insert=behind"
Secondary VM (receive mirrored packets):
-netdev "tap,id=net0,ifname=${TAP}...vhost=on"
-device "${VIRTIO_NET_DEVICE}"
-chardev "socket,id=red0,host=${MIRROR_BIND_HOST},port=${MIRROR_PORT},server=on,wait=off"
-chardev "socket,id=chain_in,fd=${FD_PLACEHOLDER},af-packet-mode=inject"
-object "filter-redirector,id=r1,netdev=net0,queue=tx,indev=red0,status=off,vnet_hdr_support=off,position=head"
-object "filter-redirector,id=r1_inject,netdev=net0,queue=tx,outdev=chain_in,status=off,vnet_hdr_support=off,position=id=r1,insert=behind"
Changeset
=========
Changes in v2:
1. Add support for filter-buffer.
2. Remove the in_netdev and out_netdev options for the AF_PACKET bind port; only netdev is used now.
With vhost=on, AF_PACKET is used to capture and inject; with vhost=off, the existing code
is used.
3. Add a CAP_NET_RAW check.
4. Address review comments.
Changes in v3:
1. Reuse the existing capture/inject process.
Changes in v4:
1. Move capture/inject to the chardev.
2. Move socket creation and binding to a user script.
Testing
=======
- Tested with vhost=on/off TAP device on x86_64
Cindy Lu (5):
net/filter: allow filters on vhost netdevs
chardev/socket: add AF_PACKET initialization
io/channel-socket: tolerate AF_PACKET getpeername
chardev/socket: add AF_PACKET inject path
chardev/socket: add AF_PACKET capture path
chardev/char-socket.c | 385 +++++++++++++++++++++++++++++++++-
chardev/char.c | 3 +
include/chardev/char-socket.h | 13 ++
io/channel-socket.c | 6 +-
net/filter.c | 6 -
qapi/char.json | 23 +-
qemu-options.hx | 5 +-
7 files changed, 429 insertions(+), 12 deletions(-)
--
2.52.0
* [RFC v4 1/5] net/filter: allow filters on vhost netdevs
2026-04-07 5:05 [RFC v4 0/5] net/filter: Add AF_PACKET support for vhost-net Cindy Lu
@ 2026-04-07 5:05 ` Cindy Lu
2026-04-07 5:05 ` [RFC v4 2/5] chardev/socket: add AF_PACKET initialization Cindy Lu
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Cindy Lu @ 2026-04-07 5:05 UTC
To: lulu, mst, jasowang, zhangckid, lizhijian, jmarcin, qemu-devel
netfilter_complete() currently rejects netdev backends backed by
vhost_net and returns "Vhost is not supported". The AF_PACKET
capture/inject chardev work still needs filter objects to sit on top
of the existing netdev when the tap queue is owned by vhost, so
remove this check.
Signed-off-by: Cindy Lu <lulu@redhat.com>
---
net/filter.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/net/filter.c b/net/filter.c
index 76345c1a9d..bb59785911 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -13,7 +13,6 @@
#include "net/filter.h"
#include "net/net.h"
-#include "net/vhost_net.h"
#include "qom/object_interfaces.h"
#include "qemu/iov.h"
#include "qemu/module.h"
@@ -254,11 +253,6 @@ static void netfilter_complete(UserCreatable *uc, Error **errp)
return;
}
- if (get_vhost_net(ncs[0])) {
- error_setg(errp, "Vhost is not supported");
- return;
- }
-
if (strcmp(nf->position, "head") && strcmp(nf->position, "tail")) {
Object *container;
Object *obj;
--
2.52.0
* [RFC v4 2/5] chardev/socket: add AF_PACKET initialization
2026-04-07 5:05 [RFC v4 0/5] net/filter: Add AF_PACKET support for vhost-net Cindy Lu
2026-04-07 5:05 ` [RFC v4 1/5] net/filter: allow filters on vhost netdevs Cindy Lu
@ 2026-04-07 5:05 ` Cindy Lu
2026-04-07 5:05 ` [RFC v4 3/5] io/channel-socket: tolerate AF_PACKET getpeername Cindy Lu
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Cindy Lu @ 2026-04-07 5:05 UTC
To: lulu, mst, jasowang, zhangckid, lizhijian, jmarcin, qemu-devel
Teach socket chardevs to recognize AF_PACKET fds and record the state
needed by later TX/RX support. Add af-packet-mode=capture|inject to
QAPI and command-line parsing, reject it for non-fd address types, and
validate after a client fd is attached that the supplied fd really is
AF_PACKET.
The socket chardev now tracks whether the underlying socket is
AF_PACKET, owns separate send/receive staging buffers, and resets that
state on connection teardown and reconnect. The follow-up TX and RX
commits use this setup to translate between the redirector's existing
length-prefixed chardev framing and raw L2 packets.
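For readers who want to check an fd before handing it to QEMU, the address-family validation described above can be approximated in a user script. This is a sketch under assumptions: is_af_packet_fd is a hypothetical helper, and it relies on Python 3.7+ auto-detecting the family when a socket object is built from an existing descriptor. It is not the check the patch performs, which compares localAddr.ss_family in C.

```python
import os
import socket


def is_af_packet_fd(fd):
    """Return True if fd refers to an AF_PACKET socket."""
    af_packet = getattr(socket, "AF_PACKET", None)  # absent outside Linux
    if af_packet is None:
        return False
    # Wrap a duplicate so closing the probe leaves the caller's fd open.
    dup_fd = os.dup(fd)
    try:
        probe = socket.socket(fileno=dup_fd)
    except OSError:
        os.close(dup_fd)
        return False
    with probe:
        return probe.family == af_packet
```

Probing a duplicate keeps the original descriptor untouched, so the same fd can still be passed on afterwards.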
Signed-off-by: Cindy Lu <lulu@redhat.com>
---
chardev/char-socket.c | 104 +++++++++++++++++++++++++++++++++-
chardev/char.c | 3 +
include/chardev/char-socket.h | 13 +++++
qapi/char.json | 23 +++++++-
qemu-options.hx | 5 +-
5 files changed, 145 insertions(+), 3 deletions(-)
diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 62852e3caf..c710fdb497 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -23,6 +23,9 @@
*/
#include "qemu/osdep.h"
+#ifdef CONFIG_LINUX
+#include <netpacket/packet.h>
+#endif
#include "chardev/char.h"
#include "io/channel-socket.h"
#include "io/channel-websock.h"
@@ -32,6 +35,7 @@
#include "qapi/error.h"
#include "qapi/clone-visitor.h"
#include "qapi/qapi-visit-sockets.h"
+#include "qapi/util.h"
#include "qemu/yank.h"
#include "trace.h"
@@ -326,6 +330,28 @@ static ssize_t tcp_chr_recv(Chardev *chr, char *buf, size_t len)
return ret;
}
+static bool tcp_chr_is_af_packet(SocketChardev *s)
+{
+#ifdef CONFIG_LINUX
+ return s->sioc && s->sioc->localAddr.ss_family == AF_PACKET;
+#else
+ return false;
+#endif
+}
+
+static void tcp_chr_reset_af_packet_buf(SocketChardev *s)
+{
+ s->af_packet_buf_len = 0;
+ s->af_packet_buf_offset = 0;
+}
+
+static void tcp_chr_reset_af_packet_send(SocketChardev *s)
+{
+ s->af_packet_send_len = 0;
+ s->af_packet_send_offset = 0;
+ s->af_packet_send_len_bytes = 0;
+}
+
static GSource *tcp_chr_add_watch(Chardev *chr, GIOCondition cond)
{
SocketChardev *s = SOCKET_CHARDEV(chr);
@@ -384,6 +410,15 @@ static void tcp_chr_free_connection(Chardev *chr)
s->sioc = NULL;
object_unref(OBJECT(s->ioc));
s->ioc = NULL;
+ g_free(s->af_packet_buf);
+ s->af_packet_buf = NULL;
+ s->af_packet_buf_size = 0;
+ tcp_chr_reset_af_packet_buf(s);
+ g_free(s->af_packet_send_buf);
+ s->af_packet_send_buf = NULL;
+ s->af_packet_send_buf_size = 0;
+ tcp_chr_reset_af_packet_send(s);
+ s->is_af_packet = false;
g_free(chr->filename);
chr->filename = NULL;
tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
@@ -889,6 +924,26 @@ static void tcp_chr_set_client_ioc_name(Chardev *chr,
}
+static bool tcp_chr_validate_af_packet_mode_fd(Chardev *chr,
+ QIOChannelSocket *sioc,
+ Error **errp)
+{
+ SocketChardev *s = SOCKET_CHARDEV(chr);
+
+ if (!s->af_packet_mode_set) {
+ return true;
+ }
+
+#ifdef CONFIG_LINUX
+ if (sioc->localAddr.ss_family == AF_PACKET) {
+ return true;
+ }
+#endif
+
+ error_setg(errp, "'af-packet-mode' requires an AF_PACKET fd");
+ return false;
+}
+
static int tcp_chr_new_client(Chardev *chr, QIOChannelSocket *sioc)
{
SocketChardev *s = SOCKET_CHARDEV(chr);
@@ -907,6 +962,9 @@ static int tcp_chr_new_client(Chardev *chr, QIOChannelSocket *sioc)
object_ref(OBJECT(sioc));
s->sioc = sioc;
object_ref(OBJECT(sioc));
+ s->is_af_packet = tcp_chr_is_af_packet(s);
+ tcp_chr_reset_af_packet_buf(s);
+ tcp_chr_reset_af_packet_send(s);
if (s->do_nodelay) {
qio_channel_set_delay(s->ioc, false);
@@ -951,6 +1009,11 @@ static int tcp_chr_add_client(Chardev *chr, int fd)
char_socket_yank_iochannel,
QIO_CHANNEL(sioc));
}
+ if (!tcp_chr_validate_af_packet_mode_fd(chr, sioc, NULL)) {
+ tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
+ object_unref(OBJECT(sioc));
+ return -1;
+ }
ret = tcp_chr_new_client(chr, sioc);
object_unref(OBJECT(sioc));
return ret;
@@ -990,7 +1053,17 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp)
char_socket_yank_iochannel,
QIO_CHANNEL(sioc));
}
- tcp_chr_new_client(chr, sioc);
+ if (!tcp_chr_validate_af_packet_mode_fd(chr, sioc, errp)) {
+ tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
+ object_unref(OBJECT(sioc));
+ return -1;
+ }
+ if (tcp_chr_new_client(chr, sioc) < 0) {
+ tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
+ object_unref(OBJECT(sioc));
+ error_setg(errp, "failed to initialize socket chardev client");
+ return -1;
+ }
object_unref(OBJECT(sioc));
return 0;
}
@@ -1312,6 +1385,11 @@ static bool qmp_chardev_validate_socket(ChardevSocket *sock,
break;
case SOCKET_ADDRESS_TYPE_UNIX:
+ if (sock->has_af_packet_mode) {
+ error_setg(errp,
+ "'af-packet-mode' option requires 'fd' address type");
+ return false;
+ }
if (sock->tls_creds) {
error_setg(errp,
"'tls_creds' option is incompatible with "
@@ -1321,9 +1399,19 @@ static bool qmp_chardev_validate_socket(ChardevSocket *sock,
break;
case SOCKET_ADDRESS_TYPE_INET:
+ if (sock->has_af_packet_mode) {
+ error_setg(errp,
+ "'af-packet-mode' option requires 'fd' address type");
+ return false;
+ }
break;
case SOCKET_ADDRESS_TYPE_VSOCK:
+ if (sock->has_af_packet_mode) {
+ error_setg(errp,
+ "'af-packet-mode' option requires 'fd' address type");
+ return false;
+ }
if (sock->tls_creds) {
error_setg(errp,
"'tls_creds' option is incompatible with "
@@ -1386,6 +1474,10 @@ static void qmp_chardev_open_socket(Chardev *chr,
s->is_tn3270 = is_tn3270;
s->is_websock = is_websock;
s->do_nodelay = do_nodelay;
+ s->af_packet_mode_set = sock->has_af_packet_mode;
+ if (sock->has_af_packet_mode) {
+ s->af_packet_mode = sock->af_packet_mode;
+ }
if (sock->tls_creds) {
Object *creds;
creds = object_resolve_path_component(
@@ -1463,6 +1555,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
const char *host = qemu_opt_get(opts, "host");
const char *port = qemu_opt_get(opts, "port");
const char *fd = qemu_opt_get(opts, "fd");
+ const char *af_packet_mode = qemu_opt_get(opts, "af-packet-mode");
#ifdef CONFIG_LINUX
bool tight = qemu_opt_get_bool(opts, "tight", true);
bool abstract = qemu_opt_get_bool(opts, "abstract", false);
@@ -1516,6 +1609,15 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
sock->wait = qemu_opt_get_bool(opts, "wait", true);
sock->has_reconnect_ms = qemu_opt_find(opts, "reconnect-ms");
sock->reconnect_ms = qemu_opt_get_number(opts, "reconnect-ms", 0);
+ if (af_packet_mode) {
+ sock->af_packet_mode =
+ qapi_enum_parse(&ChardevSocketAfPacketMode_lookup,
+ af_packet_mode, -1, errp);
+ if (*errp) {
+ return;
+ }
+ sock->has_af_packet_mode = true;
+ }
sock->tls_creds = g_strdup(qemu_opt_get(opts, "tls-creds"));
sock->tls_authz = g_strdup(qemu_opt_get(opts, "tls-authz"));
diff --git a/chardev/char.c b/chardev/char.c
index 3e432195a5..39bb0d5b68 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -910,6 +910,9 @@ QemuOptsList qemu_chardev_opts = {
},{
.name = "websocket",
.type = QEMU_OPT_BOOL,
+ },{
+ .name = "af-packet-mode",
+ .type = QEMU_OPT_STRING,
},{
.name = "width",
.type = QEMU_OPT_NUMBER,
diff --git a/include/chardev/char-socket.h b/include/chardev/char-socket.h
index d6d13ad37f..8af8af6cf8 100644
--- a/include/chardev/char-socket.h
+++ b/include/chardev/char-socket.h
@@ -63,6 +63,19 @@ struct SocketChardev {
int *write_msgfds;
size_t write_msgfds_num;
bool registered_yank;
+ bool is_af_packet;
+ bool af_packet_mode_set;
+ ChardevSocketAfPacketMode af_packet_mode;
+ uint8_t *af_packet_buf;
+ size_t af_packet_buf_size;
+ size_t af_packet_buf_len;
+ size_t af_packet_buf_offset;
+ uint8_t *af_packet_send_buf;
+ size_t af_packet_send_buf_size;
+ size_t af_packet_send_len;
+ size_t af_packet_send_offset;
+ uint8_t af_packet_send_len_buf[sizeof(uint32_t)];
+ size_t af_packet_send_len_bytes;
SocketAddress *addr;
bool is_listen;
diff --git a/qapi/char.json b/qapi/char.json
index 140614f82c..61a785727d 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -237,6 +237,22 @@
'data': { 'device': 'str' },
'base': 'ChardevCommon' }
+##
+# @ChardevSocketAfPacketMode:
+#
+# AF_PACKET fd mode for socket chardevs.
+#
+# @capture: use recvfrom() to capture raw L2 frames and forward them
+# through the existing chardev packet framing
+#
+# @inject: use sendmsg() to inject framed chardev packets back as raw
+# L2 frames
+#
+# Since: 10.1
+##
+{ 'enum': 'ChardevSocketAfPacketMode',
+ 'data': [ 'capture', 'inject' ] }
+
##
# @ChardevSocket:
#
@@ -274,6 +290,10 @@
# Setting this to zero disables this function.
# (default: 0) (Since: 9.2)
#
+# @af-packet-mode: when @addr is an fd that refers to an AF_PACKET
+# socket, use raw packet capture or inject instead of stream
+# socket I/O (Since: 10.1)
+#
# Since: 1.4
##
{ 'struct': 'ChardevSocket',
@@ -286,7 +306,8 @@
'*telnet': 'bool',
'*tn3270': 'bool',
'*websocket': 'bool',
- '*reconnect-ms': 'int' },
+ '*reconnect-ms': 'int',
+ '*af-packet-mode': 'ChardevSocketAfPacketMode' },
'base': 'ChardevCommon' }
##
diff --git a/qemu-options.hx b/qemu-options.hx
index fca2b7bc74..0e3ce7f493 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4062,7 +4062,7 @@ The available backends are:
A void device. This device will not emit any data, and will drop any
data it receives. The null backend does not take any options.
-``-chardev socket,id=id[,TCP options or unix options][,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect-ms=milliseconds][,tls-creds=id][,tls-authz=id]``
+``-chardev socket,id=id[,TCP options or unix options][,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect-ms=milliseconds][,tls-creds=id][,tls-authz=id][,af-packet-mode=capture|inject]``
Create a two-way stream socket, which can be either a TCP or a unix
socket. A unix socket will be created if ``path`` is specified.
Behaviour is undefined if TCP options are specified for a unix
@@ -4095,6 +4095,9 @@ The available backends are:
deleted and recreated on the fly while the chardev server is active.
If missing, it will default to denying access.
+ ``af-packet-mode=capture|inject`` is only valid with ``fd=...`` when
+ the provided file descriptor is an AF_PACKET socket.
+
TCP and unix socket options are given below:
``TCP options: port=port[,host=host][,to=to][,ipv4=on|off][,ipv6=on|off][,nodelay=on|off]``
--
2.52.0
* [RFC v4 3/5] io/channel-socket: tolerate AF_PACKET getpeername
2026-04-07 5:05 [RFC v4 0/5] net/filter: Add AF_PACKET support for vhost-net Cindy Lu
2026-04-07 5:05 ` [RFC v4 1/5] net/filter: allow filters on vhost netdevs Cindy Lu
2026-04-07 5:05 ` [RFC v4 2/5] chardev/socket: add AF_PACKET initialization Cindy Lu
@ 2026-04-07 5:05 ` Cindy Lu
2026-04-08 12:00 ` Daniel P. Berrangé
2026-04-07 5:05 ` [RFC v4 4/5] chardev/socket: add AF_PACKET inject path Cindy Lu
` (2 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: Cindy Lu @ 2026-04-07 5:05 UTC
To: lulu, mst, jasowang, zhangckid, lizhijian, jmarcin, qemu-devel
When -chardev socket,fd=... is handed an AF_PACKET socket,
getpeername() can fail with EOPNOTSUPP instead of ENOTCONN because
packet sockets are not connection-oriented. qio_channel_socket_set_fd()
currently treats that as fatal and refuses to wrap the fd, even though
getsockname() and the local address are still valid.
Treat EOPNOTSUPP the same way as ENOTCONN and leave remoteAddr empty.
That keeps existing stream-socket behavior unchanged while allowing
AF_PACKET fds to be adopted by QIOChannelSocket.
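The error handling described above can be illustrated outside QEMU. The sketch below is an analogy in Python, not the C change itself: peer_or_none is a hypothetical helper that treats both ENOTCONN and EOPNOTSUPP from getpeername() as "no peer" and re-raises anything else.

```python
import errno
import socket

# errno values that mean "this socket has no peer", not "the fd is unusable".
NO_PEER_ERRNOS = {errno.ENOTCONN, errno.EOPNOTSUPP}


def peer_or_none(sock):
    """Return the peer address, or None when the socket has no peer."""
    try:
        return sock.getpeername()
    except OSError as exc:
        if exc.errno in NO_PEER_ERRNOS:
            return None
        raise  # any other failure is still treated as fatal
```

An unconnected datagram socket exercises the ENOTCONN branch; an AF_PACKET socket would be expected to take the EOPNOTSUPP branch described in this commit message.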
Signed-off-by: Cindy Lu <lulu@redhat.com>
---
io/channel-socket.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 3053b35ad8..2ed26aefa3 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -115,7 +115,11 @@ qio_channel_socket_set_fd(QIOChannelSocket *sioc,
if (getpeername(fd, (struct sockaddr *)&sioc->remoteAddr,
&sioc->remoteAddrLen) < 0) {
- if (errno == ENOTCONN) {
+ if (errno == ENOTCONN
+#ifdef EOPNOTSUPP
+ || errno == EOPNOTSUPP
+#endif
+ ) {
memset(&sioc->remoteAddr, 0, sizeof(sioc->remoteAddr));
sioc->remoteAddrLen = sizeof(sioc->remoteAddr);
} else {
--
2.52.0
* [RFC v4 4/5] chardev/socket: add AF_PACKET inject path
2026-04-07 5:05 [RFC v4 0/5] net/filter: Add AF_PACKET support for vhost-net Cindy Lu
` (2 preceding siblings ...)
2026-04-07 5:05 ` [RFC v4 3/5] io/channel-socket: tolerate AF_PACKET getpeername Cindy Lu
@ 2026-04-07 5:05 ` Cindy Lu
2026-04-08 12:07 ` Daniel P. Berrangé
2026-04-07 5:05 ` [RFC v4 5/5] chardev/socket: add AF_PACKET capture path Cindy Lu
2026-04-08 12:16 ` [RFC v4 0/5] net/filter: Add AF_PACKET support for vhost-net Daniel P. Berrangé
5 siblings, 1 reply; 10+ messages in thread
From: Cindy Lu @ 2026-04-07 5:05 UTC
To: lulu, mst, jasowang, zhangckid, lizhijian, jmarcin, qemu-devel
Add the AF_PACKET inject write path for socket chardevs. When a socket
backend is opened with af-packet-mode=inject, tcp_chr_write() no longer
sends the redirector stream framing through QIOChannel. Instead it
parses the existing 4-byte length header, accumulates one complete
packet, and sends it as a single frame on the AF_PACKET fd.
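The de-framing step described above can be modelled in a few lines. This is a simplified sketch, not the code in the patch: the Deframer class is hypothetical, it assumes a 4-byte big-endian length prefix and the 65536-byte limit from TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE, and it returns complete frames instead of calling sendmsg().

```python
import struct

MAX_FRAME_SIZE = 65536  # mirrors TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE
HEADER_SIZE = 4         # big-endian uint32 length prefix


class Deframer:
    """Rebuild whole frames from a length-prefixed byte stream."""

    def __init__(self):
        self._header = bytearray()
        self._payload = bytearray()
        self._want = None  # payload length, known once the header is complete

    def feed(self, chunk):
        """Consume one chunk of the stream; return the frames it completed."""
        frames = []
        view = memoryview(chunk)
        while len(view):
            if self._want is None:
                # Still collecting the 4-byte header, possibly across chunks.
                need = HEADER_SIZE - len(self._header)
                self._header += view[:need]
                view = view[need:]
                if len(self._header) < HEADER_SIZE:
                    break
                (length,) = struct.unpack("!I", self._header)
                self._header.clear()
                if length > MAX_FRAME_SIZE:
                    raise ValueError("frame exceeds MAX_FRAME_SIZE")
                if length == 0:
                    continue  # empty frame: nothing to emit
                self._want = length
                continue
            need = self._want - len(self._payload)
            self._payload += view[:need]
            view = view[need:]
            if len(self._payload) == self._want:
                frames.append(bytes(self._payload))
                self._payload.clear()
                self._want = None
        return frames
```

Keeping the header and payload state on the object is what lets a frame span several write calls, which is the same reason the patch stores af_packet_send_len_bytes and af_packet_send_offset in SocketChardev.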
Signed-off-by: Cindy Lu <lulu@redhat.com>
---
chardev/char-socket.c | 148 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 148 insertions(+)
diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index c710fdb497..45d06fda8f 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -108,11 +108,159 @@ static void tcp_chr_accept(QIONetListener *listener,
static int tcp_chr_read_poll(void *opaque);
static void tcp_chr_disconnect_locked(Chardev *chr);
+#define TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE 65536
+
+static bool tcp_chr_uses_af_packet_inject(SocketChardev *s)
+{
+ return s->is_af_packet &&
+ s->af_packet_mode_set &&
+ s->af_packet_mode == CHARDEV_SOCKET_AF_PACKET_MODE_INJECT;
+}
+
+static ssize_t tcp_chr_send_af_packet(SocketChardev *s,
+ const uint8_t *buf,
+ size_t len)
+{
+#ifdef CONFIG_LINUX
+ struct iovec iov = {
+ .iov_base = (void *)buf,
+ .iov_len = len,
+ };
+ struct msghdr msg = {
+ .msg_iov = &iov,
+ .msg_iovlen = 1,
+ };
+ ssize_t ret;
+
+ if (!s->sioc || s->sioc->localAddr.ss_family != AF_PACKET) {
+ errno = ENOTSOCK;
+ return -1;
+ }
+
+ do {
+ ret = sendmsg(s->sioc->fd, &msg, 0);
+ } while (ret < 0 && errno == EINTR);
+
+ return ret;
+#else
+ errno = EPROTONOSUPPORT;
+ return -1;
+#endif
+}
+
+static bool tcp_chr_af_packet_prepare_send(SocketChardev *s, uint32_t frame_len)
+{
+ if (frame_len > TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE) {
+ errno = EMSGSIZE;
+ return false;
+ }
+
+ if (frame_len == 0) {
+ s->af_packet_send_len = 0;
+ s->af_packet_send_offset = 0;
+ s->af_packet_send_len_bytes = 0;
+ return true;
+ }
+
+ if (s->af_packet_send_buf_size < frame_len) {
+ s->af_packet_send_buf = g_realloc(s->af_packet_send_buf, frame_len);
+ s->af_packet_send_buf_size = frame_len;
+ }
+
+ s->af_packet_send_len = frame_len;
+ s->af_packet_send_offset = 0;
+ s->af_packet_send_len_bytes = sizeof(s->af_packet_send_len_buf);
+ return true;
+}
+
+static int tcp_chr_inject_af_packet(Chardev *chr,
+ SocketChardev *s,
+ const uint8_t *buf,
+ int len)
+{
+ size_t offset = 0;
+ uint32_t frame_len_be;
+
+ while (offset < len) {
+ size_t copy;
+
+ if (s->af_packet_send_len_bytes < sizeof(s->af_packet_send_len_buf)) {
+ copy = MIN(sizeof(s->af_packet_send_len_buf) -
+ s->af_packet_send_len_bytes,
+ (size_t)len - offset);
+ memcpy(s->af_packet_send_len_buf + s->af_packet_send_len_bytes,
+ buf + offset, copy);
+ s->af_packet_send_len_bytes += copy;
+ offset += copy;
+
+ if (s->af_packet_send_len_bytes <
+ sizeof(s->af_packet_send_len_buf)) {
+ continue;
+ }
+
+ memcpy(&frame_len_be, s->af_packet_send_len_buf,
+ sizeof(frame_len_be));
+ if (!tcp_chr_af_packet_prepare_send(s, ntohl(frame_len_be))) {
+ return -1;
+ }
+ if (s->af_packet_send_len == 0) {
+ continue;
+ }
+ }
+
+ copy = MIN(s->af_packet_send_len - s->af_packet_send_offset,
+ (size_t)len - offset);
+ memcpy(s->af_packet_send_buf + s->af_packet_send_offset,
+ buf + offset, copy);
+ s->af_packet_send_offset += copy;
+ offset += copy;
+
+ if (s->af_packet_send_offset == s->af_packet_send_len) {
+ ssize_t ret;
+
+ ret = tcp_chr_send_af_packet(s, s->af_packet_send_buf,
+ s->af_packet_send_len);
+
+ if (ret < 0) {
+ if (errno == EAGAIN || errno == EWOULDBLOCK) {
+ return -1;
+ }
+ if (tcp_chr_read_poll(chr) <= 0) {
+ trace_chr_socket_poll_err(chr, chr->label);
+ tcp_chr_disconnect_locked(chr);
+ }
+ return -1;
+ }
+
+ if (ret != (ssize_t)s->af_packet_send_len) {
+ if (ret >= 0) {
+ errno = EIO;
+ }
+ if (tcp_chr_read_poll(chr) <= 0) {
+ trace_chr_socket_poll_err(chr, chr->label);
+ tcp_chr_disconnect_locked(chr);
+ }
+ return -1;
+ }
+
+ s->af_packet_send_len = 0;
+ s->af_packet_send_offset = 0;
+ s->af_packet_send_len_bytes = 0;
+ }
+ }
+
+ return len;
+}
+
/* Called with chr_write_lock held. */
static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
{
SocketChardev *s = SOCKET_CHARDEV(chr);
+ if (tcp_chr_uses_af_packet_inject(s)) {
+ return tcp_chr_inject_af_packet(chr, s, buf, len);
+ }
+
if (s->state == TCP_CHARDEV_STATE_CONNECTED) {
int ret = io_channel_send_full(s->ioc, buf, len,
s->write_msgfds,
--
2.52.0
* [RFC v4 5/5] chardev/socket: add AF_PACKET capture path
2026-04-07 5:05 [RFC v4 0/5] net/filter: Add AF_PACKET support for vhost-net Cindy Lu
` (3 preceding siblings ...)
2026-04-07 5:05 ` [RFC v4 4/5] chardev/socket: add AF_PACKET inject path Cindy Lu
@ 2026-04-07 5:05 ` Cindy Lu
2026-04-08 12:13 ` Daniel P. Berrangé
2026-04-08 12:16 ` [RFC v4 0/5] net/filter: Add AF_PACKET support for vhost-net Daniel P. Berrangé
5 siblings, 1 reply; 10+ messages in thread
From: Cindy Lu @ 2026-04-07 5:05 UTC
To: lulu, mst, jasowang, zhangckid, lizhijian, jmarcin, qemu-devel
Add the AF_PACKET capture read path for socket chardevs. When opened
with af-packet-mode=capture, the read side drains raw frames with
recvfrom(), keeps only PACKET_OUTGOING traffic, and feeds the result
through the normal chardev frontend interface.
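The capture-side transformation described above can be sketched as a pure function. This is an illustration under assumptions: frame_if_outgoing is a hypothetical helper, the address tuple layout is the one Python's recvfrom() returns for AF_PACKET sockets, and PACKET_OUTGOING is defined locally with the value from <linux/if_packet.h>.

```python
import struct

PACKET_OUTGOING = 4  # value from <linux/if_packet.h>


def frame_if_outgoing(data, addr):
    """Return data with a 4-byte big-endian length prefix, or None.

    addr is the tuple recvfrom() yields on an AF_PACKET socket:
    (ifname, proto, pkttype, hatype, hwaddr).
    """
    pkttype = addr[2]
    if pkttype != PACKET_OUTGOING:
        return None  # drop everything that is not outgoing traffic
    return struct.pack("!I", len(data)) + data
```

The length prefix is what lets the redirector on the other end of the chardev split the byte stream back into packets.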
Signed-off-by: Cindy Lu <lulu@redhat.com>
---
chardev/char-socket.c | 133 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 131 insertions(+), 2 deletions(-)
diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 45d06fda8f..76a51a853d 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -107,9 +107,17 @@ static void tcp_chr_accept(QIONetListener *listener,
static int tcp_chr_read_poll(void *opaque);
static void tcp_chr_disconnect_locked(Chardev *chr);
+static void tcp_chr_deliver_af_packet(Chardev *chr);
#define TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE 65536
+static bool
+tcp_chr_uses_af_packet_capture(SocketChardev *s)
+{
+ return s->is_af_packet && s->af_packet_mode_set &&
+ s->af_packet_mode == CHARDEV_SOCKET_AF_PACKET_MODE_CAPTURE;
+}
+
static bool tcp_chr_uses_af_packet_inject(SocketChardev *s)
{
return s->is_af_packet &&
@@ -300,6 +308,9 @@ static int tcp_chr_read_poll(void *opaque)
return 0;
}
s->max_size = qemu_chr_be_can_write(chr);
+ if (tcp_chr_uses_af_packet_capture(s) && s->af_packet_buf_len) {
+ tcp_chr_deliver_af_packet(chr);
+ }
return s->max_size;
}
@@ -500,6 +511,98 @@ static void tcp_chr_reset_af_packet_send(SocketChardev *s)
s->af_packet_send_len_bytes = 0;
}
+/* Push buffered AF_PACKET capture data into the chardev frontend. */
+static void
+tcp_chr_deliver_af_packet(Chardev *chr)
+{
+ SocketChardev *s = SOCKET_CHARDEV(chr);
+
+ while (s->max_size > 0 && s->af_packet_buf_offset < s->af_packet_buf_len) {
+ size_t remaining = s->af_packet_buf_len - s->af_packet_buf_offset;
+ size_t chunk = MIN((size_t)s->max_size, remaining);
+
+ qemu_chr_be_write(chr, s->af_packet_buf + s->af_packet_buf_offset,
+ (int)chunk);
+ s->af_packet_buf_offset += chunk;
+ s->max_size = qemu_chr_be_can_write(chr);
+ }
+
+ if (s->af_packet_buf_offset == s->af_packet_buf_len) {
+ tcp_chr_reset_af_packet_buf(s);
+ }
+}
+
+/* Copy buffered AF_PACKET capture data into a synchronous read buffer. */
+static int tcp_chr_copy_af_packet_buf(SocketChardev *s, uint8_t *buf,
+ int len) {
+ size_t remaining = s->af_packet_buf_len - s->af_packet_buf_offset;
+ size_t copied = MIN((size_t)len, remaining);
+
+ memcpy(buf, s->af_packet_buf + s->af_packet_buf_offset, copied);
+ s->af_packet_buf_offset += copied;
+
+ if (s->af_packet_buf_offset == s->af_packet_buf_len) {
+ tcp_chr_reset_af_packet_buf(s);
+ }
+
+ return (int)copied;
+}
+
+static ssize_t
+tcp_chr_capture_af_packet(Chardev *chr)
+{
+#ifdef CONFIG_LINUX
+ SocketChardev *s = SOCKET_CHARDEV(chr);
+ struct sockaddr_ll sll;
+ socklen_t sll_len;
+ ssize_t size;
+ uint32_t len;
+
+ if (!tcp_chr_uses_af_packet_capture(s)) {
+ errno = EIO;
+ return -1;
+ }
+
+ if (s->af_packet_buf_size <
+ sizeof(len) + TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE) {
+ s->af_packet_buf =
+ g_realloc(s->af_packet_buf,
+ sizeof(len) + TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE);
+ s->af_packet_buf_size =
+ sizeof(len) + TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE;
+ }
+
+ for (;;) {
+ sll_len = sizeof(sll);
+ do {
+ size = recvfrom(s->sioc->fd, s->af_packet_buf + sizeof(len),
+ TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE, 0,
+ (struct sockaddr *)&sll, &sll_len);
+ } while (size < 0 && errno == EINTR);
+
+ if (size <= 0) {
+ if (size < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
+ trace_chr_socket_recv_err(chr, chr->label, g_strerror(errno));
+ }
+ return size;
+ }
+
+ if (sll.sll_pkttype != PACKET_OUTGOING) {
+ continue;
+ }
+
+ len = htonl(size);
+ memcpy(s->af_packet_buf, &len, sizeof(len));
+ s->af_packet_buf_len = sizeof(len) + size;
+ s->af_packet_buf_offset = 0;
+ return (ssize_t)s->af_packet_buf_len;
+ }
+#else
+ errno = EPROTONOSUPPORT;
+ return -1;
+#endif
+}
+
static GSource *tcp_chr_add_watch(Chardev *chr, GIOCondition cond)
{
SocketChardev *s = SOCKET_CHARDEV(chr);
@@ -682,6 +785,22 @@ static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque)
if (len > s->max_size) {
len = s->max_size;
}
+ if (tcp_chr_uses_af_packet_capture(s)) {
+ tcp_chr_deliver_af_packet(chr);
+ if (s->max_size <= 0 || s->af_packet_buf_len) {
+ return TRUE;
+ }
+
+ size = tcp_chr_capture_af_packet(chr);
+ if (size == 0 || (size == -1 && errno != EAGAIN)) {
+ tcp_chr_disconnect(chr);
+ } else if (size > 0) {
+ tcp_chr_deliver_af_packet(chr);
+ }
+
+ return TRUE;
+ }
+
size = tcp_chr_recv(chr, (void *)buf, len);
if (size == 0 || (size == -1 && errno != EAGAIN)) {
/* connection closed */
@@ -715,6 +834,10 @@ static int tcp_chr_sync_read(Chardev *chr, const uint8_t *buf, int len)
int saved_errno;
Error *local_err = NULL;
+ if (tcp_chr_uses_af_packet_capture(s) && s->af_packet_buf_len) {
+ return tcp_chr_copy_af_packet_buf(s, (uint8_t *)buf, len);
+ }
+
if (s->state != TCP_CHARDEV_STATE_CONNECTED) {
return 0;
}
@@ -723,7 +846,14 @@ static int tcp_chr_sync_read(Chardev *chr, const uint8_t *buf, int len)
error_report_err(local_err);
return -1;
}
- size = tcp_chr_recv(chr, (void *) buf, len);
+ if (tcp_chr_uses_af_packet_capture(s)) {
+ size = tcp_chr_capture_af_packet(chr);
+ if (size > 0) {
+ size = tcp_chr_copy_af_packet_buf(s, (uint8_t *)buf, len);
+ }
+ } else {
+ size = tcp_chr_recv(chr, (void *)buf, len);
+ }
saved_errno = errno;
if (s->state != TCP_CHARDEV_STATE_DISCONNECTED) {
if (!qio_channel_set_blocking(s->ioc, false, &local_err)) {
@@ -1448,7 +1578,6 @@ static gboolean socket_reconnect_timeout(gpointer opaque)
return false;
}
-
static int qmp_chardev_open_socket_server(Chardev *chr,
bool is_telnet,
bool is_waitconnect,
--
2.52.0
* Re: [RFC v4 3/5] io/channel-socket: tolerate AF_PACKET getpeername
2026-04-07 5:05 ` [RFC v4 3/5] io/channel-socket: tolerate AF_PACKET getpeername Cindy Lu
@ 2026-04-08 12:00 ` Daniel P. Berrangé
0 siblings, 0 replies; 10+ messages in thread
From: Daniel P. Berrangé @ 2026-04-08 12:00 UTC
To: Cindy Lu; +Cc: mst, jasowang, zhangckid, lizhijian, jmarcin, qemu-devel
On Tue, Apr 07, 2026 at 01:05:50PM +0800, Cindy Lu wrote:
> When -chardev socket,fd=... is handed an AF_PACKET socket,
> getpeername() can fail with EOPNOTSUPP instead of ENOTCONN because
> packet sockets are not connection-oriented. qio_channel_socket_set_fd()
> currently treats that as fatal and refuses to wrap the fd, even though
> getsockname() and the local address are still valid.
>
> Treat EOPNOTSUPP the same way as ENOTCONN and leave remoteAddr empty.
> That keeps existing stream-socket behavior unchanged while allowing
> AF_PACKET fds to be adopted by QIOChannelSocket.
>
> Signed-off-by: Cindy Lu <lulu@redhat.com>
> ---
> io/channel-socket.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index 3053b35ad8..2ed26aefa3 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -115,7 +115,11 @@ qio_channel_socket_set_fd(QIOChannelSocket *sioc,
>
> if (getpeername(fd, (struct sockaddr *)&sioc->remoteAddr,
> &sioc->remoteAddrLen) < 0) {
> - if (errno == ENOTCONN) {
> + if (errno == ENOTCONN
> +#ifdef EOPNOTSUPP
> + || errno == EOPNOTSUPP
> +#endif
Why conditionalize it? We use this unconditionally throughout
QEMU code AFAICS, even on Windows.
> + ) {
> memset(&sioc->remoteAddr, 0, sizeof(sioc->remoteAddr));
> sioc->remoteAddrLen = sizeof(sioc->remoteAddr);
> } else {
> --
> 2.52.0
>
>
With regards,
Daniel
--
|: https://berrange.com ~~ https://hachyderm.io/@berrange :|
|: https://libvirt.org ~~ https://entangle-photo.org :|
|: https://pixelfed.art/berrange ~~ https://fstop138.berrange.com :|
* Re: [RFC v4 4/5] chardev/socket: add AF_PACKET inject path
2026-04-07 5:05 ` [RFC v4 4/5] chardev/socket: add AF_PACKET inject path Cindy Lu
@ 2026-04-08 12:07 ` Daniel P. Berrangé
0 siblings, 0 replies; 10+ messages in thread
From: Daniel P. Berrangé @ 2026-04-08 12:07 UTC
To: Cindy Lu; +Cc: mst, jasowang, zhangckid, lizhijian, jmarcin, qemu-devel
On Tue, Apr 07, 2026 at 01:05:51PM +0800, Cindy Lu wrote:
> Add the AF_PACKET inject write path for socket chardevs. When a socket
> backend is opened with af-packet-mode=inject, tcp_chr_write() no longer
> sends the redirector stream framing through QIOChannel. Instead it
> parses the existing 4-byte length header, accumulates one complete
> packet, and frame on the AF_PACKET fd.
>
> Signed-off-by: Cindy Lu <lulu@redhat.com>
> ---
> chardev/char-socket.c | 148 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 148 insertions(+)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index c710fdb497..45d06fda8f 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -108,11 +108,159 @@ static void tcp_chr_accept(QIONetListener *listener,
> static int tcp_chr_read_poll(void *opaque);
> static void tcp_chr_disconnect_locked(Chardev *chr);
>
> +#define TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE 65536
> +
> +static bool tcp_chr_uses_af_packet_inject(SocketChardev *s)
> +{
> + return s->is_af_packet &&
> + s->af_packet_mode_set &&
> + s->af_packet_mode == CHARDEV_SOCKET_AF_PACKET_MODE_INJECT;
> +}
> +
> +static ssize_t tcp_chr_send_af_packet(SocketChardev *s,
> + const uint8_t *buf,
> + size_t len)
> +{
> +#ifdef CONFIG_LINUX
> + struct iovec iov = {
> + .iov_base = (void *)buf,
> + .iov_len = len,
> + };
> + struct msghdr msg = {
> + .msg_iov = &iov,
> + .msg_iovlen = 1,
> + };
> + ssize_t ret;
> +
> + if (!s->sioc || s->sioc->localAddr.ss_family != AF_PACKET) {
> + errno = ENOTSOCK;
> + return -1;
> + }
> +
> + do {
> + ret = sendmsg(s->sioc->fd, &msg, 0);
> + } while (ret < 0 && errno == EINTR);
> +
> + return ret;
> +#else
> + errno = EPROTONOSUPPORT;
> + return -1;
> +#endif
> +}
> +
> +static bool tcp_chr_af_packet_prepare_send(SocketChardev *s, uint32_t frame_len)
> +{
> + if (frame_len > TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE) {
> + errno = EMSGSIZE;
> + return false;
> + }
> +
> + if (frame_len == 0) {
> + s->af_packet_send_len = 0;
> + s->af_packet_send_offset = 0;
> + s->af_packet_send_len_bytes = 0;
> + return true;
> + }
> +
> + if (s->af_packet_send_buf_size < frame_len) {
> + s->af_packet_send_buf = g_realloc(s->af_packet_send_buf, frame_len);
> + s->af_packet_send_buf_size = frame_len;
> + }
> +
> + s->af_packet_send_len = frame_len;
> + s->af_packet_send_offset = 0;
> + s->af_packet_send_len_bytes = sizeof(s->af_packet_send_len_buf);
> + return true;
> +}
> +
> +static int tcp_chr_inject_af_packet(Chardev *chr,
> + SocketChardev *s,
> + const uint8_t *buf,
> + int len)
> +{
> + size_t offset = 0;
> + uint32_t frame_len_be;
> +
> + while (offset < len) {
> + size_t copy;
> +
> + if (s->af_packet_send_len_bytes < sizeof(s->af_packet_send_len_buf)) {
> + copy = MIN(sizeof(s->af_packet_send_len_buf) -
> + s->af_packet_send_len_bytes,
> + (size_t)len - offset);
> + memcpy(s->af_packet_send_len_buf + s->af_packet_send_len_bytes,
> + buf + offset, copy);
> + s->af_packet_send_len_bytes += copy;
> + offset += copy;
> +
> + if (s->af_packet_send_len_bytes <
> + sizeof(s->af_packet_send_len_buf)) {
> + continue;
> + }
> +
> + memcpy(&frame_len_be, s->af_packet_send_len_buf,
> + sizeof(frame_len_be));
> + if (!tcp_chr_af_packet_prepare_send(s, ntohl(frame_len_be))) {
> + return -1;
> + }
> + if (s->af_packet_send_len == 0) {
> + continue;
> + }
> + }
> +
> + copy = MIN(s->af_packet_send_len - s->af_packet_send_offset,
> + (size_t)len - offset);
> + memcpy(s->af_packet_send_buf + s->af_packet_send_offset,
> + buf + offset, copy);
> + s->af_packet_send_offset += copy;
> + offset += copy;
> +
> + if (s->af_packet_send_offset == s->af_packet_send_len) {
> + ssize_t ret;
> +
> + ret = tcp_chr_send_af_packet(s, s->af_packet_send_buf,
> + s->af_packet_send_len);
> +
> + if (ret < 0) {
> + if (errno == EAGAIN || errno == EWOULDBLOCK) {
> + return -1;
> + }
> + if (tcp_chr_read_poll(chr) <= 0) {
> + trace_chr_socket_poll_err(chr, chr->label);
> + tcp_chr_disconnect_locked(chr);
> + }
> + return -1;
> + }
> +
> + if (ret != (ssize_t)s->af_packet_send_len) {
> + if (ret >= 0) {
> + errno = EIO;
> + }
> + if (tcp_chr_read_poll(chr) <= 0) {
> + trace_chr_socket_poll_err(chr, chr->label);
> + tcp_chr_disconnect_locked(chr);
> + }
> + return -1;
> + }
> +
> + s->af_packet_send_len = 0;
> + s->af_packet_send_offset = 0;
> + s->af_packet_send_len_bytes = 0;
> + }
> + }
> +
> + return len;
> +}
> +
> /* Called with chr_write_lock held. */
> static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
> {
> SocketChardev *s = SOCKET_CHARDEV(chr);
>
> + if (tcp_chr_uses_af_packet_inject(s)) {
> + return tcp_chr_inject_af_packet(chr, s, buf, len);
> + }
> +
This code is pretty unpleasant, completely bypassing all of the
normal I/O path in the chardev, and completely ignoring the
QIOChannel too, just poking the socket directly. Essentially
this shares nothing in common with the socket chardev functionality.
If we do want to have AF_PACKET support in the socket chardev then
IMHO all this buffer parsing code needs to be in the netfilter
layer instead. The chardev should just accept a single packet
buffer at a time, such that it can directly pass it on to the
normal qio_channel_write API which will call sendmsg.
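To make the suggested split concrete, here is a minimal sketch of the length-prefix reassembly as a standalone piece that could live outside the chardev. It assumes the same 4-byte big-endian length header and the same 65536-byte frame limit as the patch. The names Reassembler, reassembler_feed(), packet_cb and count_packet() are hypothetical and are not existing QEMU APIs:

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FRAME 65536 /* same bound as the patch's max frame size */

/* Hypothetical callback: receives exactly one complete packet. */
typedef int (*packet_cb)(void *opaque, const uint8_t *pkt, size_t len);

/* Hypothetical reassembly state, owned by the filter, not the chardev. */
typedef struct {
    uint8_t hdr[4];   /* big-endian length header being collected */
    size_t hdr_have;  /* header bytes collected so far */
    uint8_t *pkt;     /* payload buffer, grown on demand */
    size_t pkt_cap, pkt_len, pkt_have;
} Reassembler;

/* Example consumer: counts the packets and payload bytes it is given. */
typedef struct { size_t packets, bytes; } Counter;

static int count_packet(void *opaque, const uint8_t *pkt, size_t len)
{
    Counter *c = opaque;
    (void)pkt;
    c->packets++;
    c->bytes += len;
    return 0;
}

/*
 * Feed one chunk of the length-prefixed stream and call cb once per
 * complete packet. Returns 0 on success, -1 on an oversized frame,
 * allocation failure, or callback error.
 */
static int reassembler_feed(Reassembler *r, const uint8_t *buf, size_t len,
                            packet_cb cb, void *opaque)
{
    size_t off = 0;

    assert(r && cb);
    while (off < len) {
        size_t n;

        if (r->hdr_have < sizeof(r->hdr)) {
            uint32_t be;

            n = sizeof(r->hdr) - r->hdr_have;
            if (n > len - off) {
                n = len - off;
            }
            memcpy(r->hdr + r->hdr_have, buf + off, n);
            r->hdr_have += n;
            off += n;
            if (r->hdr_have < sizeof(r->hdr)) {
                break; /* header still incomplete, wait for more data */
            }
            memcpy(&be, r->hdr, sizeof(be));
            r->pkt_len = ntohl(be);
            r->pkt_have = 0;
            if (r->pkt_len > MAX_FRAME) {
                r->hdr_have = 0; /* drop the bad header before failing */
                return -1;
            }
            if (r->pkt_len == 0) {
                r->hdr_have = 0; /* empty frame: nothing to hand over */
                continue;
            }
            if (r->pkt_cap < r->pkt_len) {
                uint8_t *p = realloc(r->pkt, r->pkt_len);
                if (!p) {
                    r->hdr_have = 0;
                    return -1;
                }
                r->pkt = p;
                r->pkt_cap = r->pkt_len;
            }
            continue; /* payload bytes, if any, are handled next round */
        }

        n = r->pkt_len - r->pkt_have;
        if (n > len - off) {
            n = len - off;
        }
        memcpy(r->pkt + r->pkt_have, buf + off, n);
        r->pkt_have += n;
        off += n;
        if (r->pkt_have == r->pkt_len) {
            if (cb(opaque, r->pkt, r->pkt_len) < 0) {
                return -1;
            }
            r->hdr_have = 0; /* ready for the next header */
        }
    }
    return 0;
}
```

The callback is the point where a single packet buffer would be written to the chardev, so the chardev itself never has to look at the framing.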
> if (s->state == TCP_CHARDEV_STATE_CONNECTED) {
> int ret = io_channel_send_full(s->ioc, buf, len,
> s->write_msgfds,
> --
> 2.52.0
>
>
With regards,
Daniel
--
|: https://berrange.com ~~ https://hachyderm.io/@berrange :|
|: https://libvirt.org ~~ https://entangle-photo.org :|
|: https://pixelfed.art/berrange ~~ https://fstop138.berrange.com :|
* Re: [RFC v4 5/5] chardev/socket: add AF_PACKET capture path
2026-04-07 5:05 ` [RFC v4 5/5] chardev/socket: add AF_PACKET capture path Cindy Lu
@ 2026-04-08 12:13 ` Daniel P. Berrangé
0 siblings, 0 replies; 10+ messages in thread
From: Daniel P. Berrangé @ 2026-04-08 12:13 UTC (permalink / raw)
To: Cindy Lu; +Cc: mst, jasowang, zhangckid, lizhijian, jmarcin, qemu-devel
On Tue, Apr 07, 2026 at 01:05:52PM +0800, Cindy Lu wrote:
> Add the AF_PACKET capture read path for socket chardevs. When opened
> with af-packet-mode=capture, the read side drains raw frames with
> recvfrom(), keeps only PACKET_OUTGOING traffic, and feeds the result
> through the normal chardev frontend interface.
>
> Signed-off-by: Cindy Lu <lulu@redhat.com>
> ---
> chardev/char-socket.c | 133 +++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 131 insertions(+), 2 deletions(-)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 45d06fda8f..76a51a853d 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -107,9 +107,17 @@ static void tcp_chr_accept(QIONetListener *listener,
>
> static int tcp_chr_read_poll(void *opaque);
> static void tcp_chr_disconnect_locked(Chardev *chr);
> +static void tcp_chr_deliver_af_packet(Chardev *chr);
>
> #define TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE 65536
>
> +static bool
> +tcp_chr_uses_af_packet_capture(SocketChardev *s)
> +{
> + return s->is_af_packet && s->af_packet_mode_set &&
> + s->af_packet_mode == CHARDEV_SOCKET_AF_PACKET_MODE_CAPTURE;
> +}
> +
> static bool tcp_chr_uses_af_packet_inject(SocketChardev *s)
> {
> return s->is_af_packet &&
> @@ -300,6 +308,9 @@ static int tcp_chr_read_poll(void *opaque)
> return 0;
> }
> s->max_size = qemu_chr_be_can_write(chr);
> + if (tcp_chr_uses_af_packet_capture(s) && s->af_packet_buf_len) {
> + tcp_chr_deliver_af_packet(chr);
> + }
> return s->max_size;
> }
>
> @@ -500,6 +511,98 @@ static void tcp_chr_reset_af_packet_send(SocketChardev *s)
> s->af_packet_send_len_bytes = 0;
> }
>
> +/* Push buffered AF_PACKET capture data into the chardev frontend. */
> +static void
> +tcp_chr_deliver_af_packet(Chardev *chr)
> +{
> + SocketChardev *s = SOCKET_CHARDEV(chr);
> +
> + while (s->max_size > 0 && s->af_packet_buf_offset < s->af_packet_buf_len) {
> + size_t remaining = s->af_packet_buf_len - s->af_packet_buf_offset;
> + size_t chunk = MIN((size_t)s->max_size, remaining);
> +
> + qemu_chr_be_write(chr, s->af_packet_buf + s->af_packet_buf_offset,
> + (int)chunk);
> + s->af_packet_buf_offset += chunk;
> + s->max_size = qemu_chr_be_can_write(chr);
> + }
> +
> + if (s->af_packet_buf_offset == s->af_packet_buf_len) {
> + tcp_chr_reset_af_packet_buf(s);
> + }
> +}
> +
> +/* Copy buffered AF_PACKET capture data into a synchronous read buffer. */
> +static int tcp_chr_copy_af_packet_buf(SocketChardev *s, uint8_t *buf,
> + int len) {
> + size_t remaining = s->af_packet_buf_len - s->af_packet_buf_offset;
> + size_t copied = MIN((size_t)len, remaining);
> +
> + memcpy(buf, s->af_packet_buf + s->af_packet_buf_offset, copied);
> + s->af_packet_buf_offset += copied;
> +
> + if (s->af_packet_buf_offset == s->af_packet_buf_len) {
> + tcp_chr_reset_af_packet_buf(s);
> + }
> +
> + return (int)copied;
> +}
> +
> +static ssize_t
> +tcp_chr_capture_af_packet(Chardev *chr)
> +{
> +#ifdef CONFIG_LINUX
> + SocketChardev *s = SOCKET_CHARDEV(chr);
> + struct sockaddr_ll sll;
> + socklen_t sll_len;
> + ssize_t size;
> + uint32_t len;
> +
> + if (!tcp_chr_uses_af_packet_capture(s)) {
> + errno = EIO;
> + return -1;
> + }
> +
> + if (s->af_packet_buf_size <
> + sizeof(len) + TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE) {
> + s->af_packet_buf =
> + g_realloc(s->af_packet_buf,
> + sizeof(len) + TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE);
> + s->af_packet_buf_size =
> + sizeof(len) + TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE;
> + }
> +
> + for (;;) {
> + sll_len = sizeof(sll);
> + do {
> + size = recvfrom(s->sioc->fd, s->af_packet_buf + sizeof(len),
> + TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE, 0,
> + (struct sockaddr *)&sll, &sll_len);
> + } while (size < 0 && errno == EINTR);
> +
> + if (size <= 0) {
> + if (size < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
> + trace_chr_socket_recv_err(chr, chr->label, g_strerror(errno));
> + }
> + return size;
> + }
> +
> + if (sll.sll_pkttype != PACKET_OUTGOING) {
> + continue;
> + }
> +
> + len = htonl(size);
> + memcpy(s->af_packet_buf, &len, sizeof(len));
> + s->af_packet_buf_len = sizeof(len) + size;
> + s->af_packet_buf_offset = 0;
> + return (ssize_t)s->af_packet_buf_len;
> + }
> +#else
> + errno = EPROTONOSUPPORT;
> + return -1;
> +#endif
> +}
> +
> static GSource *tcp_chr_add_watch(Chardev *chr, GIOCondition cond)
> {
> SocketChardev *s = SOCKET_CHARDEV(chr);
> @@ -682,6 +785,22 @@ static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque)
> if (len > s->max_size) {
> len = s->max_size;
> }
> + if (tcp_chr_uses_af_packet_capture(s)) {
> + tcp_chr_deliver_af_packet(chr);
> + if (s->max_size <= 0 || s->af_packet_buf_len) {
> + return TRUE;
> + }
> +
> + size = tcp_chr_capture_af_packet(chr);
> + if (size == 0 || (size == -1 && errno != EAGAIN)) {
> + tcp_chr_disconnect(chr);
> + } else if (size > 0) {
> + tcp_chr_deliver_af_packet(chr);
> + }
> +
> + return TRUE;
> + }
> +
> size = tcp_chr_recv(chr, (void *)buf, len);
> if (size == 0 || (size == -1 && errno != EAGAIN)) {
> /* connection closed */
> @@ -715,6 +834,10 @@ static int tcp_chr_sync_read(Chardev *chr, const uint8_t *buf, int len)
> int saved_errno;
> Error *local_err = NULL;
>
> + if (tcp_chr_uses_af_packet_capture(s) && s->af_packet_buf_len) {
> + return tcp_chr_copy_af_packet_buf(s, (uint8_t *)buf, len);
> + }
> +
> if (s->state != TCP_CHARDEV_STATE_CONNECTED) {
> return 0;
> }
> @@ -723,7 +846,14 @@ static int tcp_chr_sync_read(Chardev *chr, const uint8_t *buf, int len)
> error_report_err(local_err);
> return -1;
> }
> - size = tcp_chr_recv(chr, (void *) buf, len);
> + if (tcp_chr_uses_af_packet_capture(s)) {
> + size = tcp_chr_capture_af_packet(chr);
> + if (size > 0) {
> + size = tcp_chr_copy_af_packet_buf(s, (uint8_t *)buf, len);
> + }
> + } else {
> + size = tcp_chr_recv(chr, (void *)buf, len);
> + }
Similarly to the send side, I don't really think we should have this
packet re-assembly logic in the chardev code. We should just be
calling the normal qio_channel_read APIs and let the netfilter code
re-assemble packets it gets from the chardev. Mostly it seems we
would use TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE instead of CHR_READ_BUF_LEN
in the existing code paths.
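As a rough sketch of the "one read, one frame" idea, assuming a packet-oriented socket and a caller-supplied buffer sized for the largest frame: each read returns at most one datagram, and a frame that does not fit is reported instead of being silently truncated. The helper read_one_frame() is hypothetical, uses plain recvmsg() rather than the QIOChannel API, and leaves out the PACKET_OUTGOING filtering done in the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: read one datagram from fd into buf.
 * Returns the frame length, or -1 with errno set. A frame larger than
 * cap is reported as EMSGSIZE rather than handed on truncated.
 */
static ssize_t read_one_frame(int fd, uint8_t *buf, size_t cap)
{
    struct iovec iov = { .iov_base = buf, .iov_len = cap };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
    ssize_t n;

    assert(buf && cap > 0);

    /* Retry only when the call was interrupted by a signal. */
    do {
        n = recvmsg(fd, &msg, 0);
    } while (n < 0 && errno == EINTR);

    /* The kernel sets MSG_TRUNC when the datagram did not fit in buf. */
    if (n >= 0 && (msg.msg_flags & MSG_TRUNC)) {
        errno = EMSGSIZE;
        return -1;
    }
    return n;
}
```

With a buffer of TCP_CHARDEV_AF_PACKET_MAX_FRAME_SIZE bytes, whatever consumes the chardev would then see whole frames and could add its own framing.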
> saved_errno = errno;
> if (s->state != TCP_CHARDEV_STATE_DISCONNECTED) {
> if (!qio_channel_set_blocking(s->ioc, false, &local_err)) {
> @@ -1448,7 +1578,6 @@ static gboolean socket_reconnect_timeout(gpointer opaque)
> return false;
> }
>
> -
> static int qmp_chardev_open_socket_server(Chardev *chr,
> bool is_telnet,
> bool is_waitconnect,
> --
> 2.52.0
>
>
With regards,
Daniel
--
|: https://berrange.com ~~ https://hachyderm.io/@berrange :|
|: https://libvirt.org ~~ https://entangle-photo.org :|
|: https://pixelfed.art/berrange ~~ https://fstop138.berrange.com :|
* Re: [RFC v4 0/5] net/filter: Add AF_PACKET support for vhost-net
2026-04-07 5:05 [RFC v4 0/5] net/filter: Add AF_PACKET support for vhost-net Cindy Lu
` (4 preceding siblings ...)
2026-04-07 5:05 ` [RFC v4 5/5] chardev/socket: add AF_PACKET capture path Cindy Lu
@ 2026-04-08 12:16 ` Daniel P. Berrangé
5 siblings, 0 replies; 10+ messages in thread
From: Daniel P. Berrangé @ 2026-04-08 12:16 UTC (permalink / raw)
To: Cindy Lu; +Cc: mst, jasowang, zhangckid, lizhijian, jmarcin, qemu-devel
On Tue, Apr 07, 2026 at 01:05:47PM +0800, Cindy Lu wrote:
> Hi, All
>
> This series wires AF_PACKET-backed packet capture and inject support into
> the existing socket chardev backend so filter-redirector can keep using the
> existing process.
>
> Example Usage
> =============
> Users are expected to create the AF_PACKET socket in userspace, bind it
> to the target tap device, and then pass the resulting fd to QEMU via
> the existing FD_PLACEHOLDER mechanism.
>
> Creating such a socket requires CAP_NET_RAW (or running as root). A
> typical setup looks like:
>
> sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
> socket.htons(ETH_P_ALL))
> sock.bind((ifname, ETH_P_ALL))
While FD passing is certainly desirable, and indeed required, for
libvirt to manage QEMU, IMHO, the QIOChannelSocket should be made
capable of opening AF_PACKET sockets explicitly too.
I generally only consider "FD" passing for QIOChannelSocket to be
supported for address families that we can explicitly open - we
shouldn't have address families that are only supported via FD
passing.
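For reference, a minimal C sketch of what explicitly opening such a socket involves, mirroring the Python snippet quoted above. The helpers make_bind_addr() and open_af_packet() are hypothetical and are not QIOChannelSocket APIs; how this would be wired into the socket address handling is left open:

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Build the link-layer address used to bind to one interface. */
static struct sockaddr_ll make_bind_addr(unsigned int ifindex)
{
    struct sockaddr_ll sll;

    memset(&sll, 0, sizeof(sll));
    sll.sll_family = AF_PACKET;
    sll.sll_protocol = htons(ETH_P_ALL); /* all protocols, as in the example */
    sll.sll_ifindex = (int)ifindex;
    return sll;
}

/*
 * Hypothetical helper: open an AF_PACKET socket bound to ifname.
 * Needs CAP_NET_RAW. Returns the fd, or -1 with errno set.
 */
static int open_af_packet(const char *ifname)
{
    struct sockaddr_ll sll;
    unsigned int ifindex;
    int fd;

    assert(ifname != NULL);

    ifindex = if_nametoindex(ifname);
    if (ifindex == 0) {
        return -1; /* errno was set by if_nametoindex() */
    }

    fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) {
        return -1;
    }

    sll = make_bind_addr(ifindex);
    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
        int saved = errno; /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The only inputs are an interface name and a protocol, so describing them in a socket address type looks feasible.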
>
> The bound fd can then be passed to QEMU with:
>
> -chardev socket,...,fd=${FD_PLACEHOLDER},...
>
> Primary VM (mirror incoming packets to secondary via chardev socket):
>
> -netdev "tap,id=net0,ifname=${TAP}...vhost=on"
> -device "${VIRTIO_NET_DEVICE}"
> -chardev "socket,id=chain_out,fd=${FD_PLACEHOLDER},af-packet-mode=capture"
> -chardev "socket,id=mirror0,host=${MIRROR_HOST},port=${MIRROR_PORT},reconnect-ms=${MIRROR_RECONNECT_MS}"
> -object "filter-redirector,id=r1,netdev=net0,queue=tx,indev=chain_out,status=on,vnet_hdr_support=off,position=head"
> -object "filter-redirector,id=r1_mirror,netdev=net0,queue=tx,outdev=mirror0,status=on,vnet_hdr_support=off,insert=behind"
>
> Secondary VM (receive mirrored packets):
>
> -netdev "tap,id=net0,ifname=${TAP}...vhost=on"
> -device "${VIRTIO_NET_DEVICE}"
> -chardev "socket,id=red0,host=${MIRROR_BIND_HOST},port=${MIRROR_PORT},server=on,wait=off"
> -chardev "socket,id=chain_in,fd=${FD_PLACEHOLDER},af-packet-mode=inject"
> -object "filter-redirector,id=r1,netdev=net0,queue=tx,indev=red0,status=off,vnet_hdr_support=off,position=head"
> -object "filter-redirector,id=r1_inject,netdev=net0,queue=tx,outdev=chain_in,status=off,vnet_hdr_support=off,position=id=r1,insert=behind"
>
>
> Changelog
> =========
> Changes in v2:
> 1. add support for filter-buffer
> 2. remove in_netdev and out_netdev for the AF_PACKET bind port and use only
> netdev; with vhost=on, AF_PACKET is used to capture and inject, and with
> vhost=off the existing code is used
> 3. add CAP_NET_RAW check
> 4. address review comments
>
> Changes in v3:
> 1. reuse the existing capture/inject process
>
> Changes in v4:
> 1. move the capture/inject to chardev
> 2. move the socket create/bind to the user script
>
> Testing
> =======
> - Tested with vhost=on/off TAP device on x86_64
>
>
>
> Cindy Lu (5):
> net/filter: allow filters on vhost netdevs
> chardev/socket: add AF_PACKET initialization
> io/channel-socket: tolerate AF_PACKET getpeername
> chardev/socket: add AF_PACKET inject path
> chardev/socket: add AF_PACKET capture path
>
> chardev/char-socket.c | 385 +++++++++++++++++++++++++++++++++-
> chardev/char.c | 3 +
> include/chardev/char-socket.h | 13 ++
> io/channel-socket.c | 6 +-
> net/filter.c | 6 -
> qapi/char.json | 23 +-
> qemu-options.hx | 5 +-
> 7 files changed, 429 insertions(+), 12 deletions(-)
>
> --
> 2.52.0
>
>
With regards,
Daniel
--
|: https://berrange.com ~~ https://hachyderm.io/@berrange :|
|: https://libvirt.org ~~ https://entangle-photo.org :|
|: https://pixelfed.art/berrange ~~ https://fstop138.berrange.com :|
end of thread, other threads:[~2026-04-08 19:32 UTC | newest]
Thread overview: 10+ messages
2026-04-07 5:05 [RFC v4 0/5] net/filter: Add AF_PACKET support for vhost-net Cindy Lu
2026-04-07 5:05 ` [RFC v4 1/5] net/filter: allow filters on vhost netdevs Cindy Lu
2026-04-07 5:05 ` [RFC v4 2/5] chardev/socket: add AF_PACKET initialization Cindy Lu
2026-04-07 5:05 ` [RFC v4 3/5] io/channel-socket: tolerate AF_PACKET getpeername Cindy Lu
2026-04-08 12:00 ` Daniel P. Berrangé
2026-04-07 5:05 ` [RFC v4 4/5] chardev/socket: add AF_PACKET inject path Cindy Lu
2026-04-08 12:07 ` Daniel P. Berrangé
2026-04-07 5:05 ` [RFC v4 5/5] chardev/socket: add AF_PACKET capture path Cindy Lu
2026-04-08 12:13 ` Daniel P. Berrangé
2026-04-08 12:16 ` [RFC v4 0/5] net/filter: Add AF_PACKET support for vhost-net Daniel P. Berrangé