* [PATCH ipsec-next v7 12/14] xfrm: add XFRM_MSG_MIGRATE_STATE for single SA migration
From: Antony Antony @ 2026-04-12 11:16 UTC
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
Add a new netlink message, XFRM_MSG_MIGRATE_STATE, to migrate a single
xfrm_state. Unlike the existing migration mechanism (SA + policy), it
migrates only the SA and allows the reqid to change; with
XFRM_MSG_MIGRATE the reqid is invariant.
The SA is looked up via xfrm_usersa_id, which uniquely
identifies it, so old_saddr is not needed. old_daddr is carried in
xfrm_usersa_id.daddr.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v6->v7: - add flags field to xfrm_user_migrate_state (based on Sabrina's feedback)
- add XFRM_MIGRATE_STATE_NO_OFFLOAD (bit 0): suppresses offload
- omit-to-inherit; mutually exclusive with XFRMA_OFFLOAD_DEV
- zero-initialize struct xfrm_migrate m[XFRM_MAX_DEPTH]
- add struct xfrm_selector new_sel to xfrm_user_migrate_state
- add XFRM_MIGRATE_STATE_UPDATE_SEL: derive new selector
from SA addresses when old selector is a single-host match
v5->v6: - (Feedback from Sabrina's review)
- reqid change: use xfrm_state_add, not xfrm_state_insert
- encap and xuo: use nla_data() directly, no kmemdup needed
- notification failure is non-fatal: set extack warning, return 0
- drop the state direction (x->dir) check; it is not required
- reverse xmas tree local variable ordering
- use NL_SET_ERR_MSG_WEAK for clone failure message
- fix implicit padding in xfrm_user_migrate_state uapi struct
- support XFRMA_SET_MARK/XFRMA_SET_MARK_MASK in XFRM_MSG_MIGRATE_STATE
v4->v5: - set portid, seq in XFRM_MSG_MIGRATE_STATE netlink notification
- rename error label to out for clarity
- add locking and synchronize after cloning
- change some if(x) to if(!x) for clarity
- call __xfrm_state_delete() inside the lock
- return error from xfrm_send_migrate_state() instead of always returning 0
v3->v4: preserve reqid invariant for each state migrated
v2->v3: free the skb on the error path
v1->v2: merged next patch here to fix use of an uninitialized value
- removed unnecessary inline
- added const when possible
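Note on the flag and offload handling described in the v6->v7 entries
above: the following is a minimal userspace sketch of the decision that
xfrm_do_migrate_state() makes, not the kernel code itself. The flag
values are taken from the uapi hunk in this patch; resolve_offload() and
enum offload_choice are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in the uapi hunk of this patch. */
#define XFRM_MIGRATE_STATE_NO_OFFLOAD 1u
#define XFRM_MIGRATE_STATE_UPDATE_SEL 2u

/* Hypothetical names, for illustration only. */
enum offload_choice {
	OFFLOAD_NONE,      /* new SA gets no offload */
	OFFLOAD_FROM_ATTR, /* use the XFRMA_OFFLOAD_DEV payload */
	OFFLOAD_INHERIT,   /* rebuild the offload request from the old SA */
};

/*
 * Returns 0 and stores the decision in *choice, or -EINVAL for the
 * requests that the handler rejects before looking up the SA.
 */
static int resolve_offload(uint32_t flags, bool has_offload_attr,
			   bool old_sa_offloaded, enum offload_choice *choice)
{
	const uint32_t known = XFRM_MIGRATE_STATE_NO_OFFLOAD |
			       XFRM_MIGRATE_STATE_UPDATE_SEL;

	assert(choice);

	if (flags & ~known)
		return -EINVAL;	/* unknown flag bits */
	if ((flags & XFRM_MIGRATE_STATE_NO_OFFLOAD) && has_offload_attr)
		return -EINVAL;	/* mutually exclusive */

	if (has_offload_attr)
		*choice = OFFLOAD_FROM_ATTR;
	else if (!(flags & XFRM_MIGRATE_STATE_NO_OFFLOAD) && old_sa_offloaded)
		*choice = OFFLOAD_INHERIT;	/* omit-to-inherit */
	else
		*choice = OFFLOAD_NONE;
	return 0;
}
```

With no attribute and no flag, an offloaded SA stays offloaded; setting
XFRM_MIGRATE_STATE_NO_OFFLOAD is the only way to drop offload without
naming a new device.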
---
include/net/xfrm.h | 16 ++-
include/uapi/linux/xfrm.h | 21 ++++
net/xfrm/xfrm_device.c | 2 +-
net/xfrm/xfrm_policy.c | 19 +++
net/xfrm/xfrm_state.c | 29 +++--
net/xfrm/xfrm_user.c | 287 +++++++++++++++++++++++++++++++++++++++++++-
security/selinux/nlmsgtab.c | 3 +-
7 files changed, 363 insertions(+), 14 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 4b29ab92c2a7..e33e524cd909 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -684,12 +684,20 @@ struct xfrm_migrate {
xfrm_address_t new_saddr;
struct xfrm_encap_tmpl *encap;
struct xfrm_user_offload *xuo;
+ struct xfrm_mark old_mark;
+ struct xfrm_mark *new_mark;
+ struct xfrm_mark smark;
u8 proto;
u8 mode;
- u16 reserved;
+ u16 msg_type; /* XFRM_MSG_MIGRATE or XFRM_MSG_MIGRATE_STATE */
+ u32 flags;
u32 old_reqid;
+ u32 new_reqid;
+ u32 nat_keepalive_interval;
+ u32 mapping_maxage;
u16 old_family;
u16 new_family;
+ const struct xfrm_selector *new_sel;
};
#define XFRM_KM_TIMEOUT 30
@@ -2104,7 +2112,7 @@ void xfrm_dev_resume(struct sk_buff *skb);
void xfrm_dev_backlog(struct softnet_data *sd);
struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t features, bool *again);
int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
- struct xfrm_user_offload *xuo,
+ const struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack);
int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp,
struct xfrm_user_offload *xuo, u8 dir,
@@ -2175,7 +2183,9 @@ static inline struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_fea
return skb;
}
-static inline int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, struct xfrm_user_offload *xuo, struct netlink_ext_ack *extack)
+static inline int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
+ const struct xfrm_user_offload *xuo,
+ struct netlink_ext_ack *extack)
{
return 0;
}
diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h
index a23495c0e0a1..34d8ad5c4818 100644
--- a/include/uapi/linux/xfrm.h
+++ b/include/uapi/linux/xfrm.h
@@ -227,6 +227,9 @@ enum {
#define XFRM_MSG_SETDEFAULT XFRM_MSG_SETDEFAULT
XFRM_MSG_GETDEFAULT,
#define XFRM_MSG_GETDEFAULT XFRM_MSG_GETDEFAULT
+
+ XFRM_MSG_MIGRATE_STATE,
+#define XFRM_MSG_MIGRATE_STATE XFRM_MSG_MIGRATE_STATE
__XFRM_MSG_MAX
};
#define XFRM_MSG_MAX (__XFRM_MSG_MAX - 1)
@@ -507,6 +510,24 @@ struct xfrm_user_migrate {
__u16 new_family;
};
+struct xfrm_user_migrate_state {
+ struct xfrm_usersa_id id;
+ xfrm_address_t new_daddr;
+ xfrm_address_t new_saddr;
+ struct xfrm_mark old_mark;
+ struct xfrm_selector new_sel;
+ __u32 new_reqid;
+ __u32 flags;
+ __u16 new_family;
+ __u16 reserved;
+};
+
+/* Flags for xfrm_user_migrate_state.flags */
+enum xfrm_migrate_state_flags {
+ XFRM_MIGRATE_STATE_NO_OFFLOAD = 1, /* do not inherit offload from existing SA */
+ XFRM_MIGRATE_STATE_UPDATE_SEL = 2, /* update host-to-host selector from saddr and daddr */
+};
+
struct xfrm_user_mapping {
struct xfrm_usersa_id id;
__u32 reqid;
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 52ae0e034d29..9d4c1addb98f 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -229,7 +229,7 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
EXPORT_SYMBOL_GPL(validate_xmit_xfrm);
int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
- struct xfrm_user_offload *xuo,
+ const struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack)
{
int err;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 0b5c7b51183a..3d6c778d8645 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -4635,6 +4635,22 @@ static int xfrm_migrate_check(const struct xfrm_migrate *m, int num_migrate,
return 0;
}
+/*
+ * Fill migrate fields that are invariant in XFRM_MSG_MIGRATE: inherited
+ * from the existing SA unchanged. XFRM_MSG_MIGRATE_STATE can update these.
+ */
+static void xfrm_migrate_copy_old(struct xfrm_migrate *mp,
+ const struct xfrm_state *x,
+ struct xfrm_mark *new_mark_buf)
+{
+ mp->smark = x->props.smark;
+ mp->new_reqid = x->props.reqid;
+ mp->nat_keepalive_interval = x->nat_keepalive_interval;
+ mp->mapping_maxage = x->mapping_maxage;
+ *new_mark_buf = x->mark;
+ mp->new_mark = new_mark_buf;
+}
+
int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
struct xfrm_migrate *m, int num_migrate,
struct xfrm_kmaddress *k, struct net *net,
@@ -4642,6 +4658,7 @@ int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
struct netlink_ext_ack *extack, struct xfrm_user_offload *xuo)
{
int i, err, nx_cur = 0, nx_new = 0;
+ struct xfrm_mark new_marks[XFRM_MAX_DEPTH] = {};
struct xfrm_policy *pol = NULL;
struct xfrm_state *x, *xc;
struct xfrm_state *x_cur[XFRM_MAX_DEPTH];
@@ -4674,6 +4691,8 @@ int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
nx_cur++;
mp->encap = encap;
mp->xuo = xuo;
+ xfrm_migrate_copy_old(mp, x, &new_marks[i]);
+
xc = xfrm_state_migrate(x, mp, net, extack);
if (xc) {
x_new[nx_new] = xc;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 1ee114f8515d..25d54c44fd94 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1974,11 +1974,25 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
goto out;
memcpy(&x->id, &orig->id, sizeof(x->id));
- memcpy(&x->sel, &orig->sel, sizeof(x->sel));
+ if (m->msg_type == XFRM_MSG_MIGRATE_STATE) {
+ if (m->flags & XFRM_MIGRATE_STATE_UPDATE_SEL) {
+ u8 prefixlen = (m->new_family == AF_INET6) ? 128 : 32;
+
+ memcpy(&x->sel, &orig->sel, sizeof(x->sel));
+ x->sel.family = m->new_family;
+ x->sel.prefixlen_d = prefixlen;
+ x->sel.prefixlen_s = prefixlen;
+ memcpy(&x->sel.daddr, &m->new_daddr, sizeof(x->sel.daddr));
+ memcpy(&x->sel.saddr, &m->new_saddr, sizeof(x->sel.saddr));
+ } else {
+ x->sel = *m->new_sel;
+ }
+ } else {
+ memcpy(&x->sel, &orig->sel, sizeof(x->sel));
+ }
memcpy(&x->lft, &orig->lft, sizeof(x->lft));
x->props.mode = orig->props.mode;
x->props.replay_window = orig->props.replay_window;
- x->props.reqid = orig->props.reqid;
if (orig->aalg) {
x->aalg = xfrm_algo_auth_clone(orig->aalg);
@@ -2011,8 +2025,8 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
x->encap = kmemdup(m->encap, sizeof(*x->encap), GFP_KERNEL);
if (!x->encap)
goto error;
- x->mapping_maxage = orig->mapping_maxage;
- x->nat_keepalive_interval = orig->nat_keepalive_interval;
+ x->mapping_maxage = m->mapping_maxage;
+ x->nat_keepalive_interval = m->nat_keepalive_interval;
}
if (orig->security)
@@ -2029,8 +2043,9 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
if (xfrm_replay_clone(x, orig))
goto error;
- memcpy(&x->mark, &orig->mark, sizeof(x->mark));
- memcpy(&x->props.smark, &orig->props.smark, sizeof(x->props.smark));
+ x->mark = m->new_mark ? *m->new_mark : m->old_mark;
+
+ x->props.smark = m->smark;
x->props.flags = orig->props.flags;
x->props.extra_flags = orig->props.extra_flags;
@@ -2053,7 +2068,7 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
goto error;
}
-
+ x->props.reqid = m->new_reqid;
x->props.family = m->new_family;
memcpy(&x->id.daddr, &m->new_daddr, sizeof(x->id.daddr));
memcpy(&x->props.saddr, &m->new_saddr, sizeof(x->props.saddr));
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index fe0cf824f072..46e506548122 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1318,7 +1318,7 @@ static int copy_to_user_encap(struct xfrm_encap_tmpl *ep, struct sk_buff *skb)
return 0;
}
-static int xfrm_smark_put(struct sk_buff *skb, struct xfrm_mark *m)
+static int xfrm_smark_put(struct sk_buff *skb, const struct xfrm_mark *m)
{
int ret = 0;
@@ -3059,6 +3059,25 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh,
}
#ifdef CONFIG_XFRM_MIGRATE
+static void copy_from_user_migrate_state(struct xfrm_migrate *ma,
+ const struct xfrm_user_migrate_state *um)
+{
+ memcpy(&ma->old_daddr, &um->id.daddr, sizeof(ma->old_daddr));
+ memcpy(&ma->new_daddr, &um->new_daddr, sizeof(ma->new_daddr));
+ memcpy(&ma->new_saddr, &um->new_saddr, sizeof(ma->new_saddr));
+
+ ma->proto = um->id.proto;
+ ma->new_reqid = um->new_reqid;
+
+ ma->old_family = um->id.family;
+ ma->new_family = um->new_family;
+
+ ma->old_mark = um->old_mark;
+ ma->flags = um->flags;
+ ma->new_sel = &um->new_sel;
+ ma->msg_type = XFRM_MSG_MIGRATE_STATE;
+}
+
static int copy_from_user_migrate(struct xfrm_migrate *ma,
struct xfrm_kmaddress *k,
struct nlattr **attrs, int *num,
@@ -3098,6 +3117,7 @@ static int copy_from_user_migrate(struct xfrm_migrate *ma,
ma->old_family = um->old_family;
ma->new_family = um->new_family;
+ ma->msg_type = XFRM_MSG_MIGRATE;
}
*num = i;
@@ -3108,7 +3128,7 @@ static int xfrm_do_migrate(struct sk_buff *skb, struct nlmsghdr *nlh,
struct nlattr **attrs, struct netlink_ext_ack *extack)
{
struct xfrm_userpolicy_id *pi = nlmsg_data(nlh);
- struct xfrm_migrate m[XFRM_MAX_DEPTH];
+ struct xfrm_migrate m[XFRM_MAX_DEPTH] = {};
struct xfrm_kmaddress km, *kmp;
u8 type;
int err;
@@ -3161,7 +3181,268 @@ static int xfrm_do_migrate(struct sk_buff *skb, struct nlmsghdr *nlh,
kfree(xuo);
return err;
}
+
+static int build_migrate_state(struct sk_buff *skb,
+ const struct xfrm_user_migrate_state *um,
+ const struct xfrm_migrate *m,
+ u8 dir, u32 portid, u32 seq)
+{
+ int err;
+ struct nlmsghdr *nlh;
+ struct xfrm_user_migrate_state *hdr;
+
+ nlh = nlmsg_put(skb, portid, seq, XFRM_MSG_MIGRATE_STATE,
+ sizeof(struct xfrm_user_migrate_state), 0);
+ if (!nlh)
+ return -EMSGSIZE;
+
+ hdr = nlmsg_data(nlh);
+ *hdr = *um;
+ hdr->new_sel = *m->new_sel;
+
+ if (m->encap) {
+ err = nla_put(skb, XFRMA_ENCAP, sizeof(*m->encap), m->encap);
+ if (err)
+ goto out_cancel;
+ }
+
+ if (m->xuo) {
+ err = nla_put(skb, XFRMA_OFFLOAD_DEV, sizeof(*m->xuo), m->xuo);
+ if (err)
+ goto out_cancel;
+ }
+
+ if (m->new_mark) {
+ err = nla_put(skb, XFRMA_MARK, sizeof(*m->new_mark),
+ m->new_mark);
+ if (err)
+ goto out_cancel;
+ }
+
+ err = xfrm_smark_put(skb, &m->smark);
+ if (err)
+ goto out_cancel;
+
+ if (m->mapping_maxage) {
+ err = nla_put_u32(skb, XFRMA_MTIMER_THRESH, m->mapping_maxage);
+ if (err)
+ goto out_cancel;
+ }
+
+ if (m->nat_keepalive_interval) {
+ err = nla_put_u32(skb, XFRMA_NAT_KEEPALIVE_INTERVAL,
+ m->nat_keepalive_interval);
+ if (err)
+ goto out_cancel;
+ }
+
+ if (dir) {
+ err = nla_put_u8(skb, XFRMA_SA_DIR, dir);
+ if (err)
+ goto out_cancel;
+ }
+
+ nlmsg_end(skb, nlh);
+ return 0;
+
+out_cancel:
+ nlmsg_cancel(skb, nlh);
+ return err;
+}
+
+static unsigned int xfrm_migrate_state_msgsize(const struct xfrm_migrate *m,
+ u8 dir)
+{
+ return NLMSG_ALIGN(sizeof(struct xfrm_user_migrate_state)) +
+ (m->encap ? nla_total_size(sizeof(struct xfrm_encap_tmpl)) : 0) +
+ (m->xuo ? nla_total_size(sizeof(struct xfrm_user_offload)) : 0) +
+ (m->new_mark ? nla_total_size(sizeof(struct xfrm_mark)) : 0) +
+ (m->smark.v ? nla_total_size(sizeof(u32)) * 2 : 0) + /* SET_MARK + SET_MARK_MASK */
+ (m->mapping_maxage ? nla_total_size(sizeof(u32)) : 0) +
+ (m->nat_keepalive_interval ? nla_total_size(sizeof(u32)) : 0) +
+ (dir ? nla_total_size(sizeof(u8)) : 0); /* XFRMA_SA_DIR */
+}
+
+static int xfrm_send_migrate_state(const struct xfrm_user_migrate_state *um,
+ const struct xfrm_migrate *m,
+ u8 dir, u32 portid, u32 seq)
+{
+ int err;
+ struct sk_buff *skb;
+ struct net *net = &init_net;
+
+ skb = nlmsg_new(xfrm_migrate_state_msgsize(m, dir), GFP_ATOMIC);
+ if (!skb)
+ return -ENOMEM;
+
+ err = build_migrate_state(skb, um, m, dir, portid, seq);
+ if (err < 0) {
+ kfree_skb(skb);
+ return err;
+ }
+
+ return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_MIGRATE);
+}
+
+static int xfrm_do_migrate_state(struct sk_buff *skb, struct nlmsghdr *nlh,
+ struct nlattr **attrs, struct netlink_ext_ack *extack)
+{
+ struct xfrm_user_migrate_state *um = nlmsg_data(nlh);
+ struct net *net = sock_net(skb->sk);
+ struct xfrm_user_offload xuo = {};
+ struct xfrm_migrate m = {};
+ struct xfrm_state *xc;
+ struct xfrm_state *x;
+ int err;
+
+ if (!um->id.spi) {
+ NL_SET_ERR_MSG(extack, "Invalid SPI 0x0");
+ return -EINVAL;
+ }
+
+ if (um->reserved) {
+ NL_SET_ERR_MSG(extack, "Reserved field must be zero");
+ return -EINVAL;
+ }
+
+ if (um->flags & ~(XFRM_MIGRATE_STATE_NO_OFFLOAD |
+ XFRM_MIGRATE_STATE_UPDATE_SEL)) {
+ NL_SET_ERR_MSG(extack, "Unknown flags in XFRM_MSG_MIGRATE_STATE");
+ return -EINVAL;
+ }
+
+ if ((um->flags & XFRM_MIGRATE_STATE_NO_OFFLOAD) &&
+ attrs[XFRMA_OFFLOAD_DEV]) {
+ NL_SET_ERR_MSG(extack,
+ "XFRM_MIGRATE_STATE_NO_OFFLOAD and XFRMA_OFFLOAD_DEV are mutually exclusive");
+ return -EINVAL;
+ }
+
+ copy_from_user_migrate_state(&m, um);
+
+ x = xfrm_state_lookup(net, m.old_mark.v & m.old_mark.m,
+ &um->id.daddr, um->id.spi,
+ um->id.proto, um->id.family);
+ if (!x) {
+ NL_SET_ERR_MSG(extack, "Can not find state");
+ return -ESRCH;
+ }
+
+ if (um->flags & XFRM_MIGRATE_STATE_UPDATE_SEL) {
+ u8 prefixlen = (x->sel.family == AF_INET6) ? 128 : 32;
+
+ if (x->sel.prefixlen_s != x->sel.prefixlen_d ||
+ x->sel.prefixlen_d != prefixlen ||
+ !xfrm_addr_equal(&x->sel.daddr, &x->id.daddr, x->sel.family) ||
+ !xfrm_addr_equal(&x->sel.saddr, &x->props.saddr, x->sel.family)) {
+ NL_SET_ERR_MSG(extack,
+ "SA selector is not a single-host match for SA addresses");
+ err = -EINVAL;
+ goto out;
+ }
+ }
+
+ if (attrs[XFRMA_ENCAP]) {
+ m.encap = nla_data(attrs[XFRMA_ENCAP]);
+ if (m.encap->encap_type == 0) {
+ m.encap = NULL; /* sentinel: remove encap */
+ } else if (m.encap->encap_type != UDP_ENCAP_ESPINUDP) {
+ NL_SET_ERR_MSG(extack, "Unsupported encapsulation type");
+ err = -EINVAL;
+ goto out;
+ }
+ } else {
+ m.encap = x->encap; /* omit-to-inherit */
+ }
+
+ if (attrs[XFRMA_MTIMER_THRESH]) {
+ err = verify_mtimer_thresh(!!m.encap, x->dir, extack);
+ if (err)
+ goto out;
+ }
+
+ if (attrs[XFRMA_NAT_KEEPALIVE_INTERVAL] &&
+ nla_get_u32(attrs[XFRMA_NAT_KEEPALIVE_INTERVAL]) && !m.encap) {
+ NL_SET_ERR_MSG(extack,
+ "NAT_KEEPALIVE_INTERVAL requires encapsulation");
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (attrs[XFRMA_OFFLOAD_DEV]) {
+ m.xuo = nla_data(attrs[XFRMA_OFFLOAD_DEV]);
+ } else if (!(um->flags & XFRM_MIGRATE_STATE_NO_OFFLOAD) && x->xso.dev) {
+ xuo.ifindex = x->xso.dev->ifindex;
+ if (x->xso.dir == XFRM_DEV_OFFLOAD_IN)
+ xuo.flags = XFRM_OFFLOAD_INBOUND;
+ if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET)
+ xuo.flags |= XFRM_OFFLOAD_PACKET;
+ m.xuo = &xuo;
+ }
+
+ if (attrs[XFRMA_MARK])
+ m.new_mark = nla_data(attrs[XFRMA_MARK]);
+
+ if (attrs[XFRMA_SET_MARK])
+ xfrm_smark_init(attrs, &m.smark);
+ else
+ m.smark = x->props.smark;
+
+ m.mapping_maxage = attrs[XFRMA_MTIMER_THRESH] ?
+ nla_get_u32(attrs[XFRMA_MTIMER_THRESH]) : x->mapping_maxage;
+ m.nat_keepalive_interval = attrs[XFRMA_NAT_KEEPALIVE_INTERVAL] ?
+ nla_get_u32(attrs[XFRMA_NAT_KEEPALIVE_INTERVAL]) :
+ x->nat_keepalive_interval;
+
+ xc = xfrm_state_migrate_create(x, &m, net, extack);
+ if (!xc) {
+ NL_SET_ERR_MSG_WEAK(extack, "State migration clone failed");
+ err = -EINVAL;
+ goto out;
+ }
+
+ spin_lock_bh(&x->lock);
+ xfrm_migrate_sync(xc, x); /* to prevent SN/IV reuse */
+ __xfrm_state_delete(x);
+ spin_unlock_bh(&x->lock);
+
+ err = xfrm_state_migrate_install(x, xc, &m, extack);
+ if (err < 0) {
+ /*
+ * In this rare case both the old SA and the new SA
+ * will disappear.
+ * Alternatives risk duplicate SN/IV usage which must not occur.
+ * Userspace must handle this error, -EEXIST.
+ */
+ goto out;
+ }
+
+ /* Restore encap cleared by sentinel (type=0) during migration. */
+ if (attrs[XFRMA_ENCAP])
+ m.encap = nla_data(attrs[XFRMA_ENCAP]);
+
+ m.new_sel = &xc->sel;
+
+ err = xfrm_send_migrate_state(um, &m, xc->dir,
+ nlh->nlmsg_pid, nlh->nlmsg_seq);
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack, "Failed to send migration notification");
+ err = 0;
+ }
+
+out:
+ xfrm_state_put(x);
+ return err;
+}
+
#else
+static int xfrm_do_migrate_state(struct sk_buff *skb, struct nlmsghdr *nlh,
+ struct nlattr **attrs, struct netlink_ext_ack *extack)
+{
+ NL_SET_ERR_MSG(extack, "XFRM_MSG_MIGRATE_STATE is not supported");
+ return -ENOPROTOOPT;
+}
+
static int xfrm_do_migrate(struct sk_buff *skb, struct nlmsghdr *nlh,
struct nlattr **attrs, struct netlink_ext_ack *extack)
{
@@ -3314,6 +3595,7 @@ const int xfrm_msg_min[XFRM_NR_MSGTYPES] = {
[XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = sizeof(u32),
[XFRM_MSG_SETDEFAULT - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_default),
[XFRM_MSG_GETDEFAULT - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_default),
+ [XFRM_MSG_MIGRATE_STATE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_migrate_state),
};
EXPORT_SYMBOL_GPL(xfrm_msg_min);
@@ -3407,6 +3689,7 @@ static const struct xfrm_link {
[XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = { .doit = xfrm_get_spdinfo },
[XFRM_MSG_SETDEFAULT - XFRM_MSG_BASE] = { .doit = xfrm_set_default },
[XFRM_MSG_GETDEFAULT - XFRM_MSG_BASE] = { .doit = xfrm_get_default },
+ [XFRM_MSG_MIGRATE_STATE - XFRM_MSG_BASE] = { .doit = xfrm_do_migrate_state },
};
static int xfrm_reject_unused_attr(int type, struct nlattr **attrs,
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index 2c0b07f9fbbd..655d2616c9d2 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -128,6 +128,7 @@ static const struct nlmsg_perm nlmsg_xfrm_perms[] = {
{ XFRM_MSG_MAPPING, NETLINK_XFRM_SOCKET__NLMSG_READ },
{ XFRM_MSG_SETDEFAULT, NETLINK_XFRM_SOCKET__NLMSG_WRITE },
{ XFRM_MSG_GETDEFAULT, NETLINK_XFRM_SOCKET__NLMSG_READ },
+ { XFRM_MSG_MIGRATE_STATE, NETLINK_XFRM_SOCKET__NLMSG_WRITE },
};
static const struct nlmsg_perm nlmsg_audit_perms[] = {
@@ -203,7 +204,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
* structures at the top of this file with the new mappings
* before updating the BUILD_BUG_ON() macro!
*/
- BUILD_BUG_ON(XFRM_MSG_MAX != XFRM_MSG_GETDEFAULT);
+ BUILD_BUG_ON(XFRM_MSG_MAX != XFRM_MSG_MIGRATE_STATE);
if (selinux_policycap_netlink_xperm()) {
*perm = NETLINK_XFRM_SOCKET__NLMSG;
--
2.47.3
* [PATCH ipsec-next v7 11/14] xfrm: refactor XFRMA_MTIMER_THRESH validation into a helper
From: Antony Antony @ 2026-04-12 11:15 UTC
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
Extract verify_mtimer_thresh() to consolidate the XFRMA_MTIMER_THRESH
validation logic shared between the add-SA path (verify_newsa_info())
and an upcoming patch.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v5->v6: added this patch
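For reviewers, a standalone userspace model of the helper's two checks
may be useful. This is only a sketch: mtimer_thresh_check() and the
SA_DIR_* constants are hypothetical stand-ins (the constants mirror
XFRM_SA_DIR_IN and XFRM_SA_DIR_OUT from the uapi header), and the
message is returned through a pointer instead of extack.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins mirroring XFRM_SA_DIR_IN / XFRM_SA_DIR_OUT; 0 means unset. */
enum { SA_DIR_UNSET = 0, SA_DIR_IN = 1, SA_DIR_OUT = 2 };

/*
 * Model of verify_mtimer_thresh(): the threshold is only meaningful on
 * an encapsulated SA, and it is rejected on an output SA.
 */
static int mtimer_thresh_check(bool has_encap, uint8_t dir, const char **msg)
{
	assert(msg);
	*msg = NULL;

	if (!has_encap) {
		*msg = "MTIMER_THRESH requires encapsulation";
		return -EINVAL;
	}
	if (dir == SA_DIR_OUT) {
		*msg = "MTIMER_THRESH should not be set on output SA";
		return -EINVAL;
	}
	return 0;
}
```

The caller decides what "has encapsulation" means: verify_newsa_info()
passes !!attrs[XFRMA_ENCAP], while the migrate-state handler in patch
12/14 passes !!m.encap, which may have been inherited from the old SA.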
---
net/xfrm/xfrm_user.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 26b82d94acc1..fe0cf824f072 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -239,6 +239,22 @@ static inline int verify_replay(struct xfrm_usersa_info *p,
return 0;
}
+static int verify_mtimer_thresh(bool has_encap, u8 dir,
+ struct netlink_ext_ack *extack)
+{
+ if (!has_encap) {
+ NL_SET_ERR_MSG(extack,
+ "MTIMER_THRESH requires encapsulation");
+ return -EINVAL;
+ }
+ if (dir == XFRM_SA_DIR_OUT) {
+ NL_SET_ERR_MSG(extack,
+ "MTIMER_THRESH should not be set on output SA");
+ return -EINVAL;
+ }
+ return 0;
+}
+
static int verify_newsa_info(struct xfrm_usersa_info *p,
struct nlattr **attrs,
struct netlink_ext_ack *extack)
@@ -446,18 +462,9 @@ static int verify_newsa_info(struct xfrm_usersa_info *p,
err = 0;
if (attrs[XFRMA_MTIMER_THRESH]) {
- if (!attrs[XFRMA_ENCAP]) {
- NL_SET_ERR_MSG(extack, "MTIMER_THRESH attribute can only be set on ENCAP states");
- err = -EINVAL;
- goto out;
- }
-
- if (sa_dir == XFRM_SA_DIR_OUT) {
- NL_SET_ERR_MSG(extack,
- "MTIMER_THRESH attribute should not be set on output SA");
- err = -EINVAL;
+ err = verify_mtimer_thresh(!!attrs[XFRMA_ENCAP], sa_dir, extack);
+ if (err)
goto out;
- }
}
if (sa_dir == XFRM_SA_DIR_OUT) {
--
2.47.3
* [PATCH ipsec-next v7 10/14] xfrm: move encap and xuo into struct xfrm_migrate
From: Antony Antony @ 2026-04-12 11:15 UTC
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
In preparation for an upcoming patch, move the xfrm_encap_tmpl and
xfrm_user_offload pointers from separate parameters into struct
xfrm_migrate, reducing the parameter count of
xfrm_state_migrate_create(), xfrm_state_migrate_install(), and
xfrm_state_migrate().
The fields are placed after the four xfrm_address_t members where
the struct is naturally 8-byte aligned, avoiding padding.
No functional change.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v5->v6: added this patch.
---
include/net/xfrm.h | 7 ++-----
net/xfrm/xfrm_policy.c | 4 +++-
net/xfrm/xfrm_state.c | 20 +++++++-------------
3 files changed, 12 insertions(+), 19 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index be22c26e4661..4b29ab92c2a7 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -682,6 +682,8 @@ struct xfrm_migrate {
xfrm_address_t old_saddr;
xfrm_address_t new_daddr;
xfrm_address_t new_saddr;
+ struct xfrm_encap_tmpl *encap;
+ struct xfrm_user_offload *xuo;
u8 proto;
u8 mode;
u16 reserved;
@@ -1897,20 +1899,15 @@ struct xfrm_state *xfrm_migrate_state_find(struct xfrm_migrate *m, struct net *n
u32 if_id);
struct xfrm_state *xfrm_state_migrate_create(struct xfrm_state *x,
const struct xfrm_migrate *m,
- const struct xfrm_encap_tmpl *encap,
struct net *net,
- struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack);
int xfrm_state_migrate_install(const struct xfrm_state *x,
struct xfrm_state *xc,
const struct xfrm_migrate *m,
- struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack);
struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
struct xfrm_migrate *m,
- struct xfrm_encap_tmpl *encap,
struct net *net,
- struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack);
int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
struct xfrm_migrate *m, int num_bundles,
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 62218b52fd35..0b5c7b51183a 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -4672,7 +4672,9 @@ int xfrm_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
if ((x = xfrm_migrate_state_find(mp, net, if_id))) {
x_cur[nx_cur] = x;
nx_cur++;
- xc = xfrm_state_migrate(x, mp, encap, net, xuo, extack);
+ mp->encap = encap;
+ mp->xuo = xuo;
+ xc = xfrm_state_migrate(x, mp, net, extack);
if (xc) {
x_new[nx_new] = xc;
nx_new++;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 06ba8f03eab3..1ee114f8515d 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1966,7 +1966,6 @@ static inline int clone_security(struct xfrm_state *x, struct xfrm_sec_ctx *secu
}
static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
- const struct xfrm_encap_tmpl *encap,
const struct xfrm_migrate *m)
{
struct net *net = xs_net(orig);
@@ -2008,8 +2007,8 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
}
x->props.calgo = orig->props.calgo;
- if (encap) {
- x->encap = kmemdup(encap, sizeof(*x->encap), GFP_KERNEL);
+ if (m->encap) {
+ x->encap = kmemdup(m->encap, sizeof(*x->encap), GFP_KERNEL);
if (!x->encap)
goto error;
x->mapping_maxage = orig->mapping_maxage;
@@ -2122,14 +2121,12 @@ EXPORT_SYMBOL(xfrm_migrate_state_find);
struct xfrm_state *xfrm_state_migrate_create(struct xfrm_state *x,
const struct xfrm_migrate *m,
- const struct xfrm_encap_tmpl *encap,
struct net *net,
- struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack)
{
struct xfrm_state *xc;
- xc = xfrm_state_clone_and_setup(x, encap, m);
+ xc = xfrm_state_clone_and_setup(x, m);
if (!xc) {
NL_SET_ERR_MSG(extack, "Failed to clone and setup state");
return NULL;
@@ -2141,7 +2138,7 @@ struct xfrm_state *xfrm_state_migrate_create(struct xfrm_state *x,
}
/* configure the hardware if offload is requested */
- if (xuo && xfrm_dev_state_add(net, xc, xuo, extack))
+ if (m->xuo && xfrm_dev_state_add(net, xc, m->xuo, extack))
goto error;
return xc;
@@ -2155,7 +2152,6 @@ EXPORT_SYMBOL(xfrm_state_migrate_create);
int xfrm_state_migrate_install(const struct xfrm_state *x,
struct xfrm_state *xc,
const struct xfrm_migrate *m,
- struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack)
{
if (m->new_family == m->old_family &&
@@ -2168,7 +2164,7 @@ int xfrm_state_migrate_install(const struct xfrm_state *x,
} else {
if (xfrm_state_add(xc) < 0) {
NL_SET_ERR_MSG(extack, "Failed to add migrated state");
- if (xuo)
+ if (m->xuo)
xfrm_dev_state_delete(xc);
xc->km.state = XFRM_STATE_DEAD;
xfrm_state_put(xc);
@@ -2182,20 +2178,18 @@ EXPORT_SYMBOL(xfrm_state_migrate_install);
struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
struct xfrm_migrate *m,
- struct xfrm_encap_tmpl *encap,
struct net *net,
- struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack)
{
struct xfrm_state *xc;
- xc = xfrm_state_migrate_create(x, m, encap, net, xuo, extack);
+ xc = xfrm_state_migrate_create(x, m, net, extack);
if (!xc)
return NULL;
xfrm_migrate_sync(xc, x);
- if (xfrm_state_migrate_install(x, xc, m, xuo, extack) < 0)
+ if (xfrm_state_migrate_install(x, xc, m, extack) < 0)
return NULL;
return xc;
--
2.47.3
* [PATCH ipsec-next v7 09/14] xfrm: add error messages to state migration
From: Antony Antony @ 2026-04-12 11:15 UTC
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
Add descriptive extack error messages for all error paths
in state migration. This improves diagnostics by
providing clear feedback when migration fails.
After xfrm_init_state(), use NL_SET_ERR_MSG_WEAK() as a fallback for
error paths that do not yet propagate extack, e.g. mode_cbs->init_state().
No functional change.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v5->v6: - in case dev_state_add() extack already set
- after xfrm_init_state() use NL_SET_ERR_MSG_WEAK() as fallback
v4->v5: - added this patch
---
net/xfrm/xfrm_state.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 8494c46118d9..06ba8f03eab3 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2130,11 +2130,15 @@ struct xfrm_state *xfrm_state_migrate_create(struct xfrm_state *x,
struct xfrm_state *xc;
xc = xfrm_state_clone_and_setup(x, encap, m);
- if (!xc)
+ if (!xc) {
+ NL_SET_ERR_MSG(extack, "Failed to clone and setup state");
return NULL;
+ }
- if (xfrm_init_state(xc, extack) < 0)
+ if (xfrm_init_state(xc, extack) < 0) {
+ NL_SET_ERR_MSG_WEAK(extack, "Failed to initialize migrated state");
goto error;
+ }
/* configure the hardware if offload is requested */
if (xuo && xfrm_dev_state_add(net, xc, xuo, extack))
@@ -2163,6 +2167,7 @@ int xfrm_state_migrate_install(const struct xfrm_state *x,
xfrm_state_insert(xc);
} else {
if (xfrm_state_add(xc) < 0) {
+ NL_SET_ERR_MSG(extack, "Failed to add migrated state");
if (xuo)
xfrm_dev_state_delete(xc);
xc->km.state = XFRM_STATE_DEAD;
--
2.47.3
* [PATCH ipsec-next v7 08/14] xfrm: add state synchronization after migration
From: Antony Antony @ 2026-04-12 11:15 UTC
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
Add xfrm_migrate_sync() to copy curlft and replay state from the old SA
to the new one before installation. The function allocates no memory, so
it can be called under a spinlock.
A subsequent patch in this series calls it under x->lock, atomically
capturing the latest lifetime counters and replay state from the
original SA and deleting it in the same critical section, to prevent
SN/IV reuse with the XFRM_MSG_MIGRATE_STATE method.
No functional change.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v6->v7: - rephrase commit message
v5->v6: - move the sync before install to avoid overwriting
v4->v5: - added this patch
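To illustrate why the sync has to happen after the clone and together
with the delete, here is a small single-threaded userspace model. It is
a sketch under simplifying assumptions: struct sa_model and the sa_*()
helpers are hypothetical, one counter stands in for the whole replay
state, and the lock is only indicated by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for struct xfrm_state. */
struct sa_model {
	uint64_t bytes;   /* stands in for curlft */
	uint64_t packets;
	uint32_t oseq;    /* stands in for the outbound replay counter */
	bool dead;
};

/* Snapshot taken while the old SA is still carrying traffic. */
static void sa_clone(struct sa_model *new_sa, const struct sa_model *old_sa)
{
	*new_sa = *old_sa;
	new_sa->dead = false;
}

/* One outbound packet: consumes the next sequence number. */
static uint32_t sa_send(struct sa_model *sa, uint32_t len)
{
	sa->packets++;
	sa->bytes += len;
	return ++sa->oseq;
}

/*
 * Model of the critical section: copy the latest counters, then retire
 * the old SA. In the kernel both steps run under spin_lock_bh(&x->lock),
 * which is why the sync step must not allocate.
 */
static void sa_sync_and_retire(struct sa_model *new_sa, struct sa_model *old_sa)
{
	new_sa->bytes = old_sa->bytes;
	new_sa->packets = old_sa->packets;
	new_sa->oseq = old_sa->oseq;
	old_sa->dead = true;
}
```

Without the sync step the clone would resume from the sequence number
it saw at clone time and repeat numbers the old SA had already used.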
---
include/net/xfrm.h | 46 +++++++++++++++++++++++++++++++++++++---------
net/xfrm/xfrm_state.c | 11 ++++-------
2 files changed, 41 insertions(+), 16 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 4137986f15e2..be22c26e4661 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -2024,23 +2024,51 @@ static inline unsigned int xfrm_replay_state_esn_len(struct xfrm_replay_state_es
#ifdef CONFIG_XFRM_MIGRATE
static inline int xfrm_replay_clone(struct xfrm_state *x,
- struct xfrm_state *orig)
+ const struct xfrm_state *orig)
{
+ /* Counters synced later in xfrm_replay_sync() */
- x->replay_esn = kmemdup(orig->replay_esn,
+ x->replay = orig->replay;
+ x->preplay = orig->preplay;
+
+ if (orig->replay_esn) {
+ x->replay_esn = kmemdup(orig->replay_esn,
xfrm_replay_state_esn_len(orig->replay_esn),
GFP_KERNEL);
- if (!x->replay_esn)
- return -ENOMEM;
- x->preplay_esn = kmemdup(orig->preplay_esn,
- xfrm_replay_state_esn_len(orig->preplay_esn),
- GFP_KERNEL);
- if (!x->preplay_esn)
- return -ENOMEM;
+ if (!x->replay_esn)
+ return -ENOMEM;
+ x->preplay_esn = kmemdup(orig->preplay_esn,
+ xfrm_replay_state_esn_len(orig->preplay_esn),
+ GFP_KERNEL);
+ if (!x->preplay_esn)
+ return -ENOMEM;
+ }
return 0;
}
+static inline void xfrm_replay_sync(struct xfrm_state *x, const struct xfrm_state *orig)
+{
+ x->replay = orig->replay;
+ x->preplay = orig->preplay;
+
+ if (orig->replay_esn) {
+ memcpy(x->replay_esn, orig->replay_esn,
+ xfrm_replay_state_esn_len(orig->replay_esn));
+
+ memcpy(x->preplay_esn, orig->preplay_esn,
+ xfrm_replay_state_esn_len(orig->preplay_esn));
+ }
+}
+
+static inline void xfrm_migrate_sync(struct xfrm_state *x,
+ const struct xfrm_state *orig)
+{
+ /* called under lock so no race conditions or mallocs allowed */
+ memcpy(&x->curlft, &orig->curlft, sizeof(x->curlft));
+ xfrm_replay_sync(x, orig);
+}
+
static inline struct xfrm_algo_aead *xfrm_algo_aead_clone(struct xfrm_algo_aead *orig)
{
return kmemdup(orig, aead_len(orig), GFP_KERNEL);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index f7bcf1422358..8494c46118d9 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2027,10 +2027,8 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
goto error;
}
- if (orig->replay_esn) {
- if (xfrm_replay_clone(x, orig))
- goto error;
- }
+ if (xfrm_replay_clone(x, orig))
+ goto error;
memcpy(&x->mark, &orig->mark, sizeof(x->mark));
memcpy(&x->props.smark, &orig->props.smark, sizeof(x->props.smark));
@@ -2043,11 +2041,8 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
x->tfcpad = orig->tfcpad;
x->replay_maxdiff = orig->replay_maxdiff;
x->replay_maxage = orig->replay_maxage;
- memcpy(&x->curlft, &orig->curlft, sizeof(x->curlft));
x->km.state = orig->km.state;
x->km.seq = orig->km.seq;
- x->replay = orig->replay;
- x->preplay = orig->preplay;
x->lastused = orig->lastused;
x->new_mapping = 0;
x->new_mapping_sport = 0;
@@ -2193,6 +2188,8 @@ struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
if (!xc)
return NULL;
+ xfrm_migrate_sync(xc, x);
+
if (xfrm_state_migrate_install(x, xc, m, xuo, extack) < 0)
return NULL;
--
2.47.3
* [PATCH ipsec-next v7 07/14] xfrm: check family before comparing addresses in migrate
From: Antony Antony @ 2026-04-12 11:15 UTC (permalink / raw)
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
When migrating between different address families, xfrm_addr_equal()
cannot meaningfully compare the addresses because they have different
lengths.
Only call xfrm_addr_equal() when families match, and take
the xfrm_state_insert() path when addresses are equal.
Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v5->v6: added this patch
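
Illustration only, not part of the patch: a small userspace sketch of
why the family check has to come first. The union addr type and the
addr_equal()/same_daddr() helpers are hypothetical stand-ins, loosely
modelled on a family-keyed address comparison; they are not the kernel
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Hypothetical stand-in for an address that can hold IPv4 or IPv6. */
union addr {
	uint32_t a4;
	uint32_t a6[4];
};

/* Compares only as many bytes as the given family implies. */
static bool addr_equal(const union addr *a, const union addr *b,
		       unsigned short family)
{
	switch (family) {
	case AF_INET:
		return a->a4 == b->a4;
	case AF_INET6:
		return memcmp(a->a6, b->a6, sizeof(a->a6)) == 0;
	default:
		return false;
	}
}

/* Guarded form: addresses of different families are never "equal". */
static bool same_daddr(const union addr *old_daddr, unsigned short old_family,
		       const union addr *new_daddr, unsigned short new_family)
{
	return old_family == new_family &&
	       addr_equal(old_daddr, new_daddr, new_family);
}
```

Without the guard, an IPv6 address whose first 32 bits happen to match
the new IPv4 address compares equal under AF_INET, which would select
the wrong install path.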
---
net/xfrm/xfrm_state.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 9060a6c399fd..f7bcf1422358 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2159,10 +2159,11 @@ int xfrm_state_migrate_install(const struct xfrm_state *x,
struct xfrm_user_offload *xuo,
struct netlink_ext_ack *extack)
{
- if (xfrm_addr_equal(&x->id.daddr, &m->new_daddr, m->new_family)) {
+ if (m->new_family == m->old_family &&
+ xfrm_addr_equal(&x->id.daddr, &m->new_daddr, m->new_family)) {
/*
- * Care is needed when the destination address
- * of the state is to be updated as it is a part of triplet.
+ * Care is needed when the destination address of the state is
+ * to be updated as it is a part of triplet.
*/
xfrm_state_insert(xc);
} else {
--
2.47.3
* [PATCH ipsec-next v7 06/14] xfrm: split xfrm_state_migrate into create and install functions
From: Antony Antony @ 2026-04-12 11:15 UTC (permalink / raw)
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
To prepare for subsequent patches, split
xfrm_state_migrate() into two functions:
- xfrm_state_migrate_create(): creates the migrated state
- xfrm_state_migrate_install(): installs it into the state table
The split will help avoid SN/IV reuse when migrating an AEAD SA.
Also add const wherever possible.
No functional change.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v4->v5: - added this patch
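
Illustration only, not part of the patch: a userspace sketch of the
ownership rule the split relies on. All names (struct state, table,
migrate_create, migrate_install, migrate) are hypothetical, and a fixed
array with one slot per key stands in for the state table. The install
step drops the reference itself on failure, so the wrapper can simply
return NULL without touching the object again.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TABLE_SIZE 8

/* Hypothetical refcounted object standing in for a migrated state. */
struct state {
	unsigned int key;
	int refcnt;
};

static struct state *table[TABLE_SIZE];

static void state_put(struct state *s)
{
	assert(s->refcnt > 0);
	if (--s->refcnt == 0)
		free(s);
}

/* Create step: build the new object; nothing is visible to lookups yet. */
static struct state *migrate_create(unsigned int new_key)
{
	struct state *xc = calloc(1, sizeof(*xc));

	if (!xc)
		return NULL;
	xc->key = new_key;
	xc->refcnt = 1;
	return xc;
}

/* Install step: publish the object, or release it if the slot is taken. */
static int migrate_install(struct state *xc)
{
	unsigned int slot = xc->key % TABLE_SIZE;

	if (table[slot]) {
		state_put(xc);	/* install owns the cleanup on failure */
		return -EEXIST;
	}
	table[slot] = xc;
	return 0;
}

/* Wrapper keeping the old single-call behaviour. */
static struct state *migrate(unsigned int new_key)
{
	struct state *xc = migrate_create(new_key);

	if (!xc)
		return NULL;
	if (migrate_install(xc) < 0)
		return NULL;	/* xc was already released by install */
	return xc;
}
```

A caller that needs to do extra work between the two steps can call
migrate_create() and migrate_install() directly instead of the wrapper.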
---
include/net/xfrm.h | 11 ++++++++
net/xfrm/xfrm_state.c | 73 +++++++++++++++++++++++++++++++++++++--------------
2 files changed, 64 insertions(+), 20 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 368b1dc22e5c..4137986f15e2 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1895,6 +1895,17 @@ int km_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
const struct xfrm_encap_tmpl *encap);
struct xfrm_state *xfrm_migrate_state_find(struct xfrm_migrate *m, struct net *net,
u32 if_id);
+struct xfrm_state *xfrm_state_migrate_create(struct xfrm_state *x,
+ const struct xfrm_migrate *m,
+ const struct xfrm_encap_tmpl *encap,
+ struct net *net,
+ struct xfrm_user_offload *xuo,
+ struct netlink_ext_ack *extack);
+int xfrm_state_migrate_install(const struct xfrm_state *x,
+ struct xfrm_state *xc,
+ const struct xfrm_migrate *m,
+ struct xfrm_user_offload *xuo,
+ struct netlink_ext_ack *extack);
struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
struct xfrm_migrate *m,
struct xfrm_encap_tmpl *encap,
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index a94f82f1354e..9060a6c399fd 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1966,8 +1966,8 @@ static inline int clone_security(struct xfrm_state *x, struct xfrm_sec_ctx *secu
}
static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
- struct xfrm_encap_tmpl *encap,
- struct xfrm_migrate *m)
+ const struct xfrm_encap_tmpl *encap,
+ const struct xfrm_migrate *m)
{
struct net *net = xs_net(orig);
struct xfrm_state *x = xfrm_state_alloc(net);
@@ -2125,12 +2125,12 @@ struct xfrm_state *xfrm_migrate_state_find(struct xfrm_migrate *m, struct net *n
}
EXPORT_SYMBOL(xfrm_migrate_state_find);
-struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
- struct xfrm_migrate *m,
- struct xfrm_encap_tmpl *encap,
- struct net *net,
- struct xfrm_user_offload *xuo,
- struct netlink_ext_ack *extack)
+struct xfrm_state *xfrm_state_migrate_create(struct xfrm_state *x,
+ const struct xfrm_migrate *m,
+ const struct xfrm_encap_tmpl *encap,
+ struct net *net,
+ struct xfrm_user_offload *xuo,
+ struct netlink_ext_ack *extack)
{
struct xfrm_state *xc;
@@ -2145,24 +2145,57 @@ struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
if (xuo && xfrm_dev_state_add(net, xc, xuo, extack))
goto error;
- /* add state */
+ return xc;
+error:
+ xc->km.state = XFRM_STATE_DEAD;
+ xfrm_state_put(xc);
+ return NULL;
+}
+EXPORT_SYMBOL(xfrm_state_migrate_create);
+
+int xfrm_state_migrate_install(const struct xfrm_state *x,
+ struct xfrm_state *xc,
+ const struct xfrm_migrate *m,
+ struct xfrm_user_offload *xuo,
+ struct netlink_ext_ack *extack)
+{
if (xfrm_addr_equal(&x->id.daddr, &m->new_daddr, m->new_family)) {
- /* a care is needed when the destination address of the
- state is to be updated as it is a part of triplet */
+ /*
+ * Care is needed when the destination address
+ * of the state is to be updated as it is a part of triplet.
+ */
xfrm_state_insert(xc);
} else {
- if (xfrm_state_add(xc) < 0)
- goto error_add;
+ if (xfrm_state_add(xc) < 0) {
+ if (xuo)
+ xfrm_dev_state_delete(xc);
+ xc->km.state = XFRM_STATE_DEAD;
+ xfrm_state_put(xc);
+ return -EEXIST;
+ }
}
+ return 0;
+}
+EXPORT_SYMBOL(xfrm_state_migrate_install);
+
+struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
+ struct xfrm_migrate *m,
+ struct xfrm_encap_tmpl *encap,
+ struct net *net,
+ struct xfrm_user_offload *xuo,
+ struct netlink_ext_ack *extack)
+{
+ struct xfrm_state *xc;
+
+ xc = xfrm_state_migrate_create(x, m, encap, net, xuo, extack);
+ if (!xc)
+ return NULL;
+
+ if (xfrm_state_migrate_install(x, xc, m, xuo, extack) < 0)
+ return NULL;
+
return xc;
-error_add:
- if (xuo)
- xfrm_dev_state_delete(xc);
-error:
- xc->km.state = XFRM_STATE_DEAD;
- xfrm_state_put(xc);
- return NULL;
}
EXPORT_SYMBOL(xfrm_state_migrate);
#endif
--
2.47.3
* [PATCH ipsec-next v7 05/14] xfrm: rename reqid in xfrm_migrate
From: Antony Antony @ 2026-04-12 11:15 UTC (permalink / raw)
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
Rename reqid to old_reqid in struct xfrm_migrate, in preparation for a
later patch in this series.
No functional change.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
include/net/xfrm.h | 2 +-
net/key/af_key.c | 10 +++++-----
net/xfrm/xfrm_policy.c | 4 ++--
net/xfrm/xfrm_state.c | 6 +++---
net/xfrm/xfrm_user.c | 4 ++--
5 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 0c035955d87d..368b1dc22e5c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -685,7 +685,7 @@ struct xfrm_migrate {
u8 proto;
u8 mode;
u16 reserved;
- u32 reqid;
+ u32 old_reqid;
u16 old_family;
u16 new_family;
};
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 41afb9e82a58..ccd2e2d65688 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -2538,7 +2538,7 @@ static int ipsecrequests_to_migrate(struct sadb_x_ipsecrequest *rq1, int len,
if ((mode = pfkey_mode_to_xfrm(rq1->sadb_x_ipsecrequest_mode)) < 0)
return -EINVAL;
m->mode = mode;
- m->reqid = rq1->sadb_x_ipsecrequest_reqid;
+ m->old_reqid = rq1->sadb_x_ipsecrequest_reqid;
return ((int)(rq1->sadb_x_ipsecrequest_len +
rq2->sadb_x_ipsecrequest_len));
@@ -3634,15 +3634,15 @@ static int pfkey_send_migrate(const struct xfrm_selector *sel, u8 dir, u8 type,
if (mode < 0)
goto err;
if (set_ipsecrequest(skb, mp->proto, mode,
- (mp->reqid ? IPSEC_LEVEL_UNIQUE : IPSEC_LEVEL_REQUIRE),
- mp->reqid, mp->old_family,
+ (mp->old_reqid ? IPSEC_LEVEL_UNIQUE : IPSEC_LEVEL_REQUIRE),
+ mp->old_reqid, mp->old_family,
&mp->old_saddr, &mp->old_daddr) < 0)
goto err;
/* new ipsecrequest */
if (set_ipsecrequest(skb, mp->proto, mode,
- (mp->reqid ? IPSEC_LEVEL_UNIQUE : IPSEC_LEVEL_REQUIRE),
- mp->reqid, mp->new_family,
+ (mp->old_reqid ? IPSEC_LEVEL_UNIQUE : IPSEC_LEVEL_REQUIRE),
+ mp->old_reqid, mp->new_family,
&mp->new_saddr, &mp->new_daddr) < 0)
goto err;
}
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 7bcb6583e84c..62218b52fd35 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -4530,7 +4530,7 @@ static int migrate_tmpl_match(const struct xfrm_migrate *m, const struct xfrm_tm
int match = 0;
if (t->mode == m->mode && t->id.proto == m->proto &&
- (m->reqid == 0 || t->reqid == m->reqid)) {
+ (m->old_reqid == 0 || t->reqid == m->old_reqid)) {
switch (t->mode) {
case XFRM_MODE_TUNNEL:
case XFRM_MODE_BEET:
@@ -4624,7 +4624,7 @@ static int xfrm_migrate_check(const struct xfrm_migrate *m, int num_migrate,
sizeof(m[i].old_saddr)) &&
m[i].proto == m[j].proto &&
m[i].mode == m[j].mode &&
- m[i].reqid == m[j].reqid &&
+ m[i].old_reqid == m[j].old_reqid &&
m[i].old_family == m[j].old_family) {
NL_SET_ERR_MSG(extack, "Entries in the MIGRATE attribute's list must be unique");
return -EINVAL;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index defa753b26ae..a94f82f1354e 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2081,14 +2081,14 @@ struct xfrm_state *xfrm_migrate_state_find(struct xfrm_migrate *m, struct net *n
spin_lock_bh(&net->xfrm.xfrm_state_lock);
- if (m->reqid) {
+ if (m->old_reqid) {
h = xfrm_dst_hash(net, &m->old_daddr, &m->old_saddr,
- m->reqid, m->old_family);
+ m->old_reqid, m->old_family);
hlist_for_each_entry(x, net->xfrm.state_bydst+h, bydst) {
if (x->props.mode != m->mode ||
x->id.proto != m->proto)
continue;
- if (m->reqid && x->props.reqid != m->reqid)
+ if (m->old_reqid && x->props.reqid != m->old_reqid)
continue;
if (if_id != 0 && x->if_id != if_id)
continue;
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 403b5ecac2c5..26b82d94acc1 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -3087,7 +3087,7 @@ static int copy_from_user_migrate(struct xfrm_migrate *ma,
ma->proto = um->proto;
ma->mode = um->mode;
- ma->reqid = um->reqid;
+ ma->old_reqid = um->reqid;
ma->old_family = um->old_family;
ma->new_family = um->new_family;
@@ -3170,7 +3170,7 @@ static int copy_to_user_migrate(const struct xfrm_migrate *m, struct sk_buff *sk
memset(&um, 0, sizeof(um));
um.proto = m->proto;
um.mode = m->mode;
- um.reqid = m->reqid;
+ um.reqid = m->old_reqid;
um.old_family = m->old_family;
memcpy(&um.old_daddr, &m->old_daddr, sizeof(um.old_daddr));
memcpy(&um.old_saddr, &m->old_saddr, sizeof(um.old_saddr));
--
2.47.3
* [PATCH ipsec-next v7 04/14] xfrm: fix NAT-related field inheritance in SA migration
From: Antony Antony @ 2026-04-12 11:14 UTC (permalink / raw)
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
During SA migration via xfrm_state_clone_and_setup(),
nat_keepalive_interval was silently dropped and never copied to the new
SA. mapping_maxage was unconditionally copied even when migrating to a
non-encapsulated SA.
Both fields are only meaningful when UDP encapsulation (NAT-T) is in
use. Move mapping_maxage and add nat_keepalive_interval inside the
existing if (encap) block, so both are inherited when migrating with
encapsulation and correctly absent when migrating without it.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v5->v6: added this patch
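
Illustration only, not part of the patch: a userspace sketch of the
inheritance rule described above. struct sa, struct encap_tmpl and
sa_clone_nat() are hypothetical stand-ins, and the destination is
assumed to start out zero-initialised. The NAT-related fields travel
with the encapsulation template: both are set when a template is
supplied, and both stay at zero when it is not.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a UDP encapsulation template. */
struct encap_tmpl {
	uint16_t type;
	uint16_t sport;
	uint16_t dport;
};

/* Hypothetical stand-in for the NAT-related part of an SA. */
struct sa {
	struct encap_tmpl *encap;
	uint32_t mapping_maxage;
	uint32_t nat_keepalive_interval;
};

/*
 * Copy NAT state into a zero-initialised x. The caller decides whether
 * the new SA is encapsulated by passing a template or NULL; the
 * original's own template is deliberately not used as a fallback.
 */
static int sa_clone_nat(struct sa *x, const struct sa *orig,
			const struct encap_tmpl *encap)
{
	assert(x && orig);

	if (!encap)
		return 0;

	x->encap = malloc(sizeof(*x->encap));
	if (!x->encap)
		return -ENOMEM;
	*x->encap = *encap;

	/* Only meaningful with encapsulation, so only inherited here. */
	x->mapping_maxage = orig->mapping_maxage;
	x->nat_keepalive_interval = orig->nat_keepalive_interval;
	return 0;
}
```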
---
net/xfrm/xfrm_state.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 20ebd10dbee5..defa753b26ae 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2012,6 +2012,8 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
x->encap = kmemdup(encap, sizeof(*x->encap), GFP_KERNEL);
if (!x->encap)
goto error;
+ x->mapping_maxage = orig->mapping_maxage;
+ x->nat_keepalive_interval = orig->nat_keepalive_interval;
}
if (orig->security)
@@ -2046,7 +2048,6 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
x->km.seq = orig->km.seq;
x->replay = orig->replay;
x->preplay = orig->preplay;
- x->mapping_maxage = orig->mapping_maxage;
x->lastused = orig->lastused;
x->new_mapping = 0;
x->new_mapping_sport = 0;
--
2.47.3
* [PATCH ipsec-next v7 03/14] xfrm: allow migration from UDP encapsulated to non-encapsulated ESP
From: Antony Antony @ 2026-04-12 11:14 UTC (permalink / raw)
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
The current code prevents migrating an SA from UDP encapsulation to
plain ESP. This is needed when moving from a NATed path to a non-NATed
one, for example when switching from IPv4+NAT to IPv6.
Only copy the existing encapsulation during migration if the encap
attribute is explicitly provided.
Note: PF_KEY's SADB_X_MIGRATE always passes encap=NULL and never
supported encapsulation in migration. PF_KEY is deprecated and was
in feature freeze when UDP encapsulation was added to xfrm.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Tested-by: Yan Yan <evitayan@google.com>
---
net/xfrm/xfrm_state.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 86f21a19a0ee..20ebd10dbee5 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2008,14 +2008,8 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
}
x->props.calgo = orig->props.calgo;
- if (encap || orig->encap) {
- if (encap)
- x->encap = kmemdup(encap, sizeof(*x->encap),
- GFP_KERNEL);
- else
- x->encap = kmemdup(orig->encap, sizeof(*x->encap),
- GFP_KERNEL);
-
+ if (encap) {
+ x->encap = kmemdup(encap, sizeof(*x->encap), GFP_KERNEL);
if (!x->encap)
goto error;
}
--
2.47.3
* [PATCH ipsec-next v7 02/14] xfrm: add extack to xfrm_init_state
From: Antony Antony @ 2026-04-12 11:13 UTC (permalink / raw)
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
Add a struct extack parameter to xfrm_init_state() and pass it
through to __xfrm_init_state(). This allows validation errors detected
during state initialization to propagate meaningful error messages back
to userspace.
xfrm_state_migrate_create() now passes extack so that errors from the
XFRM_MSG_MIGRATE_STATE path are properly reported. Callers without an
extack context (af_key, ipcomp4, ipcomp6) pass NULL, preserving their
existing behaviour.
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v5->v6: added this patch
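
Illustration only, not part of the patch: a userspace sketch of the
pass-through pattern. struct ext_ack, set_err_msg(), the two init
functions and the replay-window limit are all hypothetical; they only
mimic the shape of an optional error-message context that callers may
leave as NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an extended-ack context. */
struct ext_ack {
	const char *msg;
};

/* Hypothetical state with a single field to validate. */
struct state {
	unsigned int replay_window;
};

/* NULL-tolerant setter, so callers without a context need no special case. */
static void set_err_msg(struct ext_ack *extack, const char *msg)
{
	if (extack)
		extack->msg = msg;
}

static int init_state_inner(const struct state *x, struct ext_ack *extack)
{
	assert(x != NULL);

	if (x->replay_window > 4096) {	/* arbitrary illustrative limit */
		set_err_msg(extack, "replay window too large");
		return -EINVAL;
	}
	return 0;
}

/* The wrapper forwards the caller's context instead of hard-coding NULL. */
static int init_state(const struct state *x, struct ext_ack *extack)
{
	return init_state_inner(x, extack);
}
```

Callers that pass NULL still get the error code; they only lose the
human-readable message.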
---
include/net/xfrm.h | 2 +-
net/ipv4/ipcomp.c | 2 +-
net/ipv6/ipcomp6.c | 2 +-
net/key/af_key.c | 2 +-
net/xfrm/xfrm_state.c | 6 +++---
5 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 10d3edde6b2f..0c035955d87d 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1774,7 +1774,7 @@ u32 xfrm_replay_seqhi(struct xfrm_state *x, __be32 net_seq);
int xfrm_init_replay(struct xfrm_state *x, struct netlink_ext_ack *extack);
u32 xfrm_state_mtu(struct xfrm_state *x, int mtu);
int __xfrm_init_state(struct xfrm_state *x, struct netlink_ext_ack *extack);
-int xfrm_init_state(struct xfrm_state *x);
+int xfrm_init_state(struct xfrm_state *x, struct netlink_ext_ack *extack);
int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type);
int xfrm_input_resume(struct sk_buff *skb, int nexthdr);
int xfrm_trans_queue_net(struct net *net, struct sk_buff *skb,
diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c
index 9a45aed508d1..b1ea2d37e8c5 100644
--- a/net/ipv4/ipcomp.c
+++ b/net/ipv4/ipcomp.c
@@ -77,7 +77,7 @@ static struct xfrm_state *ipcomp_tunnel_create(struct xfrm_state *x)
memcpy(&t->mark, &x->mark, sizeof(t->mark));
t->if_id = x->if_id;
- if (xfrm_init_state(t))
+ if (xfrm_init_state(t, NULL))
goto error;
atomic_set(&t->tunnel_users, 1);
diff --git a/net/ipv6/ipcomp6.c b/net/ipv6/ipcomp6.c
index 8607569de34f..b340d67eb1d9 100644
--- a/net/ipv6/ipcomp6.c
+++ b/net/ipv6/ipcomp6.c
@@ -95,7 +95,7 @@ static struct xfrm_state *ipcomp6_tunnel_create(struct xfrm_state *x)
memcpy(&t->mark, &x->mark, sizeof(t->mark));
t->if_id = x->if_id;
- if (xfrm_init_state(t))
+ if (xfrm_init_state(t, NULL))
goto error;
atomic_set(&t->tunnel_users, 1);
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 571200433aa9..41afb9e82a58 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1283,7 +1283,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
}
}
- err = xfrm_init_state(x);
+ err = xfrm_init_state(x, NULL);
if (err)
goto out;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 3ee92f93dbd2..86f21a19a0ee 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2143,7 +2143,7 @@ struct xfrm_state *xfrm_state_migrate(struct xfrm_state *x,
if (!xc)
return NULL;
- if (xfrm_init_state(xc) < 0)
+ if (xfrm_init_state(xc, extack) < 0)
goto error;
/* configure the hardware if offload is requested */
@@ -3236,11 +3236,11 @@ int __xfrm_init_state(struct xfrm_state *x, struct netlink_ext_ack *extack)
EXPORT_SYMBOL(__xfrm_init_state);
-int xfrm_init_state(struct xfrm_state *x)
+int xfrm_init_state(struct xfrm_state *x, struct netlink_ext_ack *extack)
{
int err;
- err = __xfrm_init_state(x, NULL);
+ err = __xfrm_init_state(x, extack);
if (err)
return err;
--
2.47.3
* [PATCH ipsec-next v7 01/14] xfrm: remove redundant assignments
From: Antony Antony @ 2026-04-12 11:13 UTC (permalink / raw)
To: Antony Antony, Steffen Klassert, Herbert Xu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
David Ahern, Masahide NAKAMURA, Paul Moore, Stephen Smalley,
Ondrej Mosnacek, Jonathan Corbet, Shuah Khan
Cc: netdev, linux-kernel, selinux, linux-doc, Chiachang Wang, Yan Yan,
devel
In-Reply-To: <migrate-state-v7-0-44eb2440b91c@secunet.com>
These assignments are overwritten within the same function further down
commit e8961c50ee9cc ("xfrm: Refactor migration setup
during the cloning process")
x->props.family = m->new_family;
Which actually moved it in the
commit e03c3bba351f9 ("xfrm: Fix xfrm migrate issues when address family changes")
And the initial
commit 80c9abaabf428 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
added x->props.saddr = orig->props.saddr; and
memcpy(&xc->props.saddr, &m->new_saddr, sizeof(xc->props.saddr));
Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
v1->v2: remove extra saddr copy, previous line
---
net/xfrm/xfrm_state.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 98b362d51836..3ee92f93dbd2 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1980,8 +1980,6 @@ static struct xfrm_state *xfrm_state_clone_and_setup(struct xfrm_state *orig,
x->props.mode = orig->props.mode;
x->props.replay_window = orig->props.replay_window;
x->props.reqid = orig->props.reqid;
- x->props.family = orig->props.family;
- x->props.saddr = orig->props.saddr;
if (orig->aalg) {
x->aalg = xfrm_algo_auth_clone(orig->aalg);
--
2.47.3
* [PATCH v4] Documentation: Refactored watchdog old doc
From: Sunny Patel @ 2026-04-12 9:53 UTC (permalink / raw)
To: linux-doc
Cc: linux-watchdog, linux-kernel, corbet, wim, linux, rdunlap,
Sunny Patel
Mark WDIOC_GETTEMP and WDIOS_TEMPPANIC as deprecated since
neither is implemented by the watchdog core and both are only
present in a small number of legacy drivers.
Add documentation for previously undocumented status bits
WDIOF_MAGICCLOSE and WDIOF_ALARMONLY in the options field.
Add documentation for WDIOF_PRETIMEOUT and WDIOF_SETTIMEOUT
status bits describing their respective ioctls.
Fix the following issues in existing documentation:
- Remove version-specific reference to Linux 2.4.18 from
the GETTIMEOUT ioctl description
- Fix duplicate "was is" in printf format strings
- Replace [FIXME] placeholder with proper descriptions for
WDIOS_DISABLECARD, WDIOS_ENABLECARD and WDIOS_TEMPPANIC
Signed-off-by: Sunny Patel <nueralspacetech@gmail.com>
---
Changes in v4:
- Fixed WDIOS_DISABLECARD description: corrected inverted logic —
the ioctl disables the hardware timer entirely rather than
stopping pings. Clarified that userspace, not the kernel driver,
is primarily responsible for pinging under normal operation.
Apologies for the broken mail threading on v2 and v3 as well.
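
Illustration only, not part of the patch: a minimal sketch of the
keepalive and Magic Close sequence that the new WDIOF_MAGICCLOSE text
describes. The helper names wd_ping() and wd_magic_close() are
hypothetical. The file descriptor is assumed to come from opening
/dev/watchdog for writing; whether the watchdog actually stops on close
depends on the driver.

```c
#include <assert.h>
#include <unistd.h>

/* Keepalive: writing any data to the device pings the watchdog. */
static int wd_ping(int fd)
{
	return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/*
 * Magic Close: write 'V' right before closing so that a driver
 * supporting the feature disables the watchdog instead of leaving
 * it running. The descriptor is closed even if the write fails.
 */
static int wd_magic_close(int fd)
{
	int ret = write(fd, "V", 1) == 1 ? 0 : -1;

	if (close(fd) < 0)
		ret = -1;
	return ret;
}
```

A daemon that exits without calling wd_magic_close() leaves the
watchdog armed, which is the behaviour the documentation change
spells out.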
Documentation/watchdog/watchdog-api.rst | 65 +++++++++++++++++++++----
1 file changed, 55 insertions(+), 10 deletions(-)
diff --git a/Documentation/watchdog/watchdog-api.rst b/Documentation/watchdog/watchdog-api.rst
index 78e228c272cf..43ca6b2bbeff 100644
--- a/Documentation/watchdog/watchdog-api.rst
+++ b/Documentation/watchdog/watchdog-api.rst
@@ -2,7 +2,7 @@
The Linux Watchdog driver API
=============================
-Last reviewed: 10/05/2007
+Last reviewed: 04/08/2026
@@ -42,7 +42,7 @@ activates as soon as /dev/watchdog is opened and will reboot unless
the watchdog is pinged within a certain time, this time is called the
timeout or margin. The simplest way to ping the watchdog is to write
some data to the device. So a very simple watchdog daemon would look
-like this source file: see samples/watchdog/watchdog-simple.c
+like this source file: see samples/watchdog/watchdog-simple.c
A more advanced driver could for example check that a HTTP server is
still responding before doing the write call to ping the watchdog.
@@ -106,11 +106,10 @@ the requested one due to limitation of the hardware::
This example might actually print "The timeout was set to 60 seconds"
if the device has a granularity of minutes for its timeout.
-Starting with the Linux 2.4.18 kernel, it is possible to query the
-current timeout using the GETTIMEOUT ioctl::
+It is also possible to get the current timeout with the GETTIMEOUT ioctl::
ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
- printf("The timeout was is %d seconds\n", timeout);
+ printf("The timeout is %d seconds\n", timeout);
Pretimeouts
===========
@@ -133,7 +132,7 @@ seconds. Setting a pretimeout to zero disables it.
There is also a get function for getting the pretimeout::
ioctl(fd, WDIOC_GETPRETIMEOUT, &timeout);
- printf("The pretimeout was is %d seconds\n", timeout);
+ printf("The pretimeout is %d seconds\n", timeout);
Not all watchdog drivers will support a pretimeout.
@@ -145,12 +144,12 @@ before the system will reboot. The WDIOC_GETTIMELEFT is the ioctl
that returns the number of seconds before reboot::
ioctl(fd, WDIOC_GETTIMELEFT, &timeleft);
- printf("The timeout was is %d seconds\n", timeleft);
+ printf("The timeout is %d seconds\n", timeleft);
Environmental monitoring
========================
-All watchdog drivers are required return more information about the system,
+All watchdog drivers are required to return more information about the system,
some do temperature, fan and power level monitoring, some can tell you
the reason for the last reboot of the system. The GETSUPPORT ioctl is
available to ask what the device can do::
@@ -227,12 +226,33 @@ The watchdog saw a keepalive ping since it was last queried.
WDIOF_SETTIMEOUT Can set/get the timeout
================ =======================
-The watchdog can do pretimeouts.
+The watchdog supports timeout set/get via the WDIOC_SETTIMEOUT and
+WDIOC_GETTIMEOUT ioctls.
================ ================================
WDIOF_PRETIMEOUT Pretimeout (in seconds), get/set
================ ================================
+The watchdog supports a pretimeout, a warning interrupt that fires before
+the actual reboot timeout. Use WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT
+to set/get the pretimeout.
+
+ ================ ================================
+ WDIOF_MAGICCLOSE Supports magic close char
+ ================ ================================
+
+The driver supports the Magic Close feature. The watchdog is only disabled
+if the character 'V' is written to /dev/watchdog before the file descriptor
+is closed. Without writing 'V' before closing, the watchdog remains active
+and will trigger a reboot after the timeout expires.
+
+ ================ ================================
+ WDIOF_ALARMONLY Not a reboot watchdog
+ ================ ================================
+
+The watchdog will not reboot the system when it expires. Instead it
+triggers a management or other external alarm. Userspace should not
+rely on a system reboot occurring.
For those drivers that return any bits set in the option field, the
GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
@@ -254,6 +274,11 @@ returned value is the temperature in degrees Fahrenheit::
int temperature;
ioctl(fd, WDIOC_GETTEMP, &temperature);
+.. note::
+ ``WDIOC_GETTEMP`` is not implemented by the watchdog core and is
+ considered deprecated. It is only supported by a small number of
+ legacy drivers. New drivers should not implement it.
+
Finally the SETOPTIONS ioctl can be used to control some aspects of
the cards operation::
@@ -268,4 +293,24 @@ The following options are available:
WDIOS_TEMPPANIC Kernel panic on temperature trip
================= ================================
-[FIXME -- better explanations]
+``WDIOS_DISABLECARD`` disables the hardware watchdog timer entirely,
+allowing a controlled system shutdown without triggering a reboot.
+Userspace is responsible for pinging the watchdog under normal
+operation; this ioctl stops the underlying hardware timer so that
+the absence of pings no longer causes a system reset.
+
+``WDIOS_ENABLECARD`` starts the watchdog timer. If the watchdog was
+previously stopped via ``WDIOS_DISABLECARD``, this will re-enable it. The
+hardware watchdog will begin counting down from the configured timeout.
+
+``WDIOS_TEMPPANIC`` enables temperature-based kernel panic. When set,
+the driver will call ``panic()`` (or ``kernel_power_off()`` on some
+drivers) if the hardware temperature sensor exceeds its threshold,
+rather than only setting the ``WDIOF_OVERHEAT`` status bit. Support
+for this option is driver-specific; not all watchdog drivers implement
+temperature monitoring.
+
+.. note::
+ ``WDIOS_TEMPPANIC`` is not implemented by the watchdog core and is
+ considered deprecated. It is only present in a small number of
+ legacy drivers. New drivers should not implement it.
--
2.43.0
* [PATCH v3] Documentation: Refactored watchdog old doc
From: Sunny Patel @ 2026-04-12 8:22 UTC (permalink / raw)
To: linux-doc
Cc: linux-watchdog, linux-kernel, corbet, wim, linux, rdunlap,
Sunny Patel
In-Reply-To: <20260411150922.20536-1-nueralspacetech@gmail.com>
Mark WDIOC_GETTEMP and WDIOS_TEMPPANIC as deprecated since
neither is implemented by the watchdog core and both are only
present in a small number of legacy drivers.
Add documentation for previously undocumented status bits
WDIOF_MAGICCLOSE and WDIOF_ALARMONLY in the options field.
Add documentation for WDIOF_PRETIMEOUT and WDIOF_SETTIMEOUT
status bits describing their respective ioctls.
Fix the following issues in existing documentation:
- Remove version-specific reference to Linux 2.4.18 from
the GETTIMEOUT ioctl description
- Fix duplicate "was is" in printf format strings
- Replace [FIXME] placeholder with proper descriptions for
WDIOS_DISABLECARD, WDIOS_ENABLECARD and WDIOS_TEMPPANIC
Signed-off-by: Sunny Patel <nueralspacetech@gmail.com>
---
Changes in v3:
- Replaced .. deprecated:: with .. note:: to avoid Sphinx
build warning due to missing version argument
- Fixed WDIOF_MAGICCLOSE description: corrected inverted
behaviour — without 'V', watchdog remains active and
reboots, it does not disable unconditionally
- Fixed comma splice: "Magic Close feature, The" ->
"Magic Close feature. The"
- Fixed WDIOF_ALARMONLY table alignment (malformed table)
- Fixed "driver-specific," -> "driver-specific;" (semicolon)
- Fixed identity row indentation in watchdog_info table
Note: This is sent as v3 and not v4. The previous submission
was v2 and this directly addresses the review comments on it.
The version was not bumped to v4 to keep the revision history
consistent with the actual number of review cycles this patch
has gone through.
Documentation/watchdog/watchdog-api.rst | 63 +++++++++++++++++++++----
1 file changed, 53 insertions(+), 10 deletions(-)
diff --git a/Documentation/watchdog/watchdog-api.rst b/Documentation/watchdog/watchdog-api.rst
index 78e228c272cf..83848d02959d 100644
--- a/Documentation/watchdog/watchdog-api.rst
+++ b/Documentation/watchdog/watchdog-api.rst
@@ -2,7 +2,7 @@
The Linux Watchdog driver API
=============================
-Last reviewed: 10/05/2007
+Last reviewed: 04/08/2026
@@ -42,7 +42,7 @@ activates as soon as /dev/watchdog is opened and will reboot unless
the watchdog is pinged within a certain time, this time is called the
timeout or margin. The simplest way to ping the watchdog is to write
some data to the device. So a very simple watchdog daemon would look
-like this source file: see samples/watchdog/watchdog-simple.c
+like this source file: see samples/watchdog/watchdog-simple.c
A more advanced driver could for example check that a HTTP server is
still responding before doing the write call to ping the watchdog.
@@ -106,11 +106,10 @@ the requested one due to limitation of the hardware::
This example might actually print "The timeout was set to 60 seconds"
if the device has a granularity of minutes for its timeout.
-Starting with the Linux 2.4.18 kernel, it is possible to query the
-current timeout using the GETTIMEOUT ioctl::
+It is also possible to get the current timeout with the GETTIMEOUT ioctl::
ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
- printf("The timeout was is %d seconds\n", timeout);
+ printf("The timeout is %d seconds\n", timeout);
Pretimeouts
===========
@@ -133,7 +132,7 @@ seconds. Setting a pretimeout to zero disables it.
There is also a get function for getting the pretimeout::
ioctl(fd, WDIOC_GETPRETIMEOUT, &timeout);
- printf("The pretimeout was is %d seconds\n", timeout);
+ printf("The pretimeout is %d seconds\n", timeout);
Not all watchdog drivers will support a pretimeout.
@@ -145,12 +144,12 @@ before the system will reboot. The WDIOC_GETTIMELEFT is the ioctl
that returns the number of seconds before reboot::
ioctl(fd, WDIOC_GETTIMELEFT, &timeleft);
- printf("The timeout was is %d seconds\n", timeleft);
+ printf("The timeout is %d seconds\n", timeleft);
Environmental monitoring
========================
-All watchdog drivers are required return more information about the system,
+All watchdog drivers are required to return more information about the system,
some do temperature, fan and power level monitoring, some can tell you
the reason for the last reboot of the system. The GETSUPPORT ioctl is
available to ask what the device can do::
@@ -227,12 +226,33 @@ The watchdog saw a keepalive ping since it was last queried.
WDIOF_SETTIMEOUT Can set/get the timeout
================ =======================
-The watchdog can do pretimeouts.
+The watchdog supports timeout set/get via the WDIOC_SETTIMEOUT and
+WDIOC_GETTIMEOUT ioctls.
================ ================================
WDIOF_PRETIMEOUT Pretimeout (in seconds), get/set
================ ================================
+The watchdog supports a pretimeout, a warning interrupt that fires before
+the actual reboot timeout. Use WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT
+to set/get the pretimeout.
+
+ ================ ================================
+ WDIOF_MAGICCLOSE Supports magic close char
+ ================ ================================
+
+The driver supports the Magic Close feature. The watchdog is only disabled
+if the character 'V' is written to /dev/watchdog before the file descriptor
+is closed. Without writing 'V' before closing, the watchdog remains active
+and will trigger a reboot after the timeout expires.
+
+ ================ ================================
+ WDIOF_ALARMONLY Not a reboot watchdog
+ ================ ================================
+
+The watchdog will not reboot the system when it expires. Instead it
+triggers a management or other external alarm. Userspace should not
+rely on a system reboot occurring.
For those drivers that return any bits set in the option field, the
GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
@@ -254,6 +274,11 @@ returned value is the temperature in degrees Fahrenheit::
int temperature;
ioctl(fd, WDIOC_GETTEMP, &temperature);
+.. note::
+ ``WDIOC_GETTEMP`` is not implemented by the watchdog core and is
+ considered deprecated. It is only supported by a small number of
+ legacy drivers. New drivers should not implement it.
+
Finally the SETOPTIONS ioctl can be used to control some aspects of
the cards operation::
@@ -268,4 +293,22 @@ The following options are available:
WDIOS_TEMPPANIC Kernel panic on temperature trip
================= ================================
-[FIXME -- better explanations]
+``WDIOS_DISABLECARD`` stops the watchdog timer. The driver will cease
+pinging the hardware watchdog, allowing a controlled shutdown without
+a forced reboot. This is equivalent to the watchdog being disarmed.
+
+``WDIOS_ENABLECARD`` starts the watchdog timer. If the watchdog was
+previously stopped via ``WDIOS_DISABLECARD``, this will re-enable it. The
+hardware watchdog will begin counting down from the configured timeout.
+
+``WDIOS_TEMPPANIC`` enables temperature-based kernel panic. When set,
+the driver will call ``panic()`` (or ``kernel_power_off()`` on some
+drivers) if the hardware temperature sensor exceeds its threshold,
+rather than only setting the ``WDIOF_OVERHEAT`` status bit. Support
+for this option is driver-specific; not all watchdog drivers implement
+temperature monitoring.
+
+.. note::
+ ``WDIOS_TEMPPANIC`` is not implemented by the watchdog core and is
+ considered deprecated. It is only present in a small number of
+ legacy drivers. New drivers should not implement it.
--
2.43.0
^ permalink raw reply related
* Re: maintainer profiles
From: Mauro Carvalho Chehab @ 2026-04-12 6:31 UTC (permalink / raw)
To: Randy Dunlap
Cc: Linux Documentation, Linux Kernel Mailing List, Jonathan Corbet,
Linux Kernel Workflows, Dan Williams, Thomas Gleixner
In-Reply-To: <a7421a41-458a-4925-a804-e31e2552c79e@infradead.org>
On Sat, 11 Apr 2026 17:02:56 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:
> On 4/11/26 4:54 PM, Randy Dunlap wrote:
> > Hi,
> >
> > On 4/10/26 1:12 AM, Mauro Carvalho Chehab wrote:
> >> On Thu, 9 Apr 2026 17:18:39 -0700
> >> Randy Dunlap <rdunlap@infradead.org> wrote:
> >>
> >>> Hi,
> >>>
> >>> Is there supposed to be a difference (or distinction) in the contents of
> >>>
> >>> Documentation/process/maintainer-handbooks.rst
> >>> and
> >>> Documentation/maintainer/maintainer-entry-profile.rst
> >>> ?
> >>>
> >>> Can they be combined into one location?
> >>
> >> Heh, from the 5 entries at maintainer-handbooks.rst:
> >>
> >> maintainer-netdev
> >> maintainer-soc
> >> maintainer-soc-clean-dts
> >> maintainer-tip
> >> maintainer-kvm-x86
> >>
> >> we have 3 of them already there at maintainer-entry-profile.rst:
> >>
> >> $ grep process/ Documentation/maintainer/maintainer-entry-profile.rst
> >> ../process/maintainer-soc
> >> ../process/maintainer-soc-clean-dts
> >> ../process/maintainer-netdev
> >>
> >> It sounds to me that moving maintainer-tip and maintainer-kvm-x86
> >> to maintainer-entry-profile.rst would be enough to drop
> >> maintainer-handbooks.rst, keeping them consolidated on a single
> >> place.
> >
> > Yes, maybe. How about in the other direction:
> > move them all to maintainer-handbooks.rst?
> >
> > After all, maintainer-entry-profile.rst says:
> > For now, existing maintainer profiles are listed here; we will likely want
> > to do something different in the near future.
(added Dan and Thomas to the thread)
I don't have strong preferences, but the maintainer-entry-profile.rst
contains a "default" maintainership model, so, whatever file name,
I would preserve at least most of its contents somewhere.
Probably a more important discussion is where they should
sit:
- at Documentation/process;
- at Documentation/maintainer;
Another option would be to move the contents from/to those two
books.
> >
> > Also, does anyone know why some of these profiles are numbered and some
> > are not? See
> > https://docs.kernel.org/maintainer/maintainer-entry-profile.html#existing-profiles
> > for odd numbering.
>
> Because they are numbered in their own respective documentation areas...
>
Yes: they're actually links to other places:
.. toctree::
:maxdepth: 1
../doc-guide/maintainer-profile
../nvdimm/maintainer-entry-profile
../arch/riscv/patch-acceptance
../process/maintainer-soc
../process/maintainer-soc-clean-dts
../driver-api/media/maintainer-entry-profile
../process/maintainer-netdev
../driver-api/vfio-pci-device-specific-driver-acceptance
../nvme/feature-and-quirk-policy
../filesystems/nfs/nfsd-maintainer-entry-profile
../filesystems/xfs/xfs-maintainer-entry-profile
../mm/damon/maintainer-profile
In most cases, the profile is located together with other
subsystem-specific docs, as it makes it easier to maintain there,
together with other documents from a given subsystem.
It also saves the need to add extra entries to the MAINTAINERS
file.
Thanks,
Mauro
^ permalink raw reply
* Re: [PATCH v5 00/21] Virtual Swap Space
From: Nhat Pham @ 2026-04-12 1:40 UTC (permalink / raw)
To: YoungJun Park
Cc: kasong, Liam.Howlett, akpm, apopple, axelrasmussen, baohua,
baolin.wang, bhe, byungchul, cgroups, chengming.zhou, chrisl,
corbet, david, dev.jain, gourry, hannes, hughd, jannh,
joshua.hahnjy, lance.yang, lenb, linux-doc, linux-kernel,
linux-mm, linux-pm, lorenzo.stoakes, matthew.brost, mhocko,
muchun.song, npache, pavel, peterx, peterz, pfalcato, rafael,
rakie.kim, roman.gushchin, rppt, ryan.roberts, shakeel.butt,
shikemeng, surenb, tglx, vbabka, weixugc, ying.huang, yosry.ahmed,
yuanchu, zhengqi.arch, ziy, kernel-team, riel
In-Reply-To: <acQrQYHJgqof0yx4@yjaykim-PowerEdge-T330>
On Wed, Mar 25, 2026 at 11:36 AM YoungJun Park <youngjun.park@lge.com> wrote:
>
> On Fri, Mar 20, 2026 at 12:27:14PM -0700, Nhat Pham wrote:
> >
> > This patch series is based on 6.19. There are a couple more
> > swap-related changes in mainline that I would need to coordinate
> > with, but I still want to send this out as an update for the
> > regressions reported by Kairui Song in [15]. It's probably easier
> > to just build this thing rather than dig through that series of
> > emails to get the fix patch :)
>
> Hi Nhat,
>
> I wanted to fully understand the patches before asking questions,
> but reviewing everything takes time, and I didn't want to miss the
> timing. So let me share some thoughts and ask about your direction.
>
> These are the perspectives I'm coming from:
>
> Pros:
> - The architecture is very clean.
> - Zero entries currently consume swap space, which can prevent
> actual swap usage in some cases.
Yeah not just zero entries. Compressed entries consuming a static
space also makes no sense to me.
> - It resolves zswap's dependency on swap device size.
> - And so on.
>
> Cons:
> - An additional virtual allocation step is introduced per every swap.
> - not easy to merge (change swap infrastructure totally?)
>
> To address the cons, I think if we can demonstrate that the
> benefits always outweigh the costs, it could fully replace the
> existing mechanism. However, if this can be applied selectively,
> we get only the pros without the cons.
>
> 1. Modularization
>
> You removed CONFIG_* and went with a unified approach. I recall
> you were also considering a module-based structure at some point.
> What are your thoughts on that direction?
>
The CONFIG-based approach was a huge mess. It makes me not want to
look at the code, and I'm the author :)
> If we take that approach, we could extend the recent swap ops
> patchset (https://lore.kernel.org/linux-mm/20260302104016.163542-1-bhe@redhat.com/)
> as follows:
> - Make vswap a swap module
> - Have cluster allocation functions reside in swapops
> - Enable vswap through swapon
Hmmmmm.
>
> I think this could result in a similar structure. An additional
> benefit would be that it enables various configurations:
>
> - vswap + regular swap together
> - vswap only
> - And other combinations
>
> And merge is not that hard. it is not the total change of swap infra structure.
>
> But, swapoff fastness might disappear? it is not that critical as I think.
Yeah that's not critical. It's a cool beans optimization but nobody
does swapoff and expects it to be fast ;)
(It is a lot cleaner tho but again not my first priority).
>
> 2. Flash-friendly swap integration (for my use case)
>
> I've been thinking about the flash-friendly swap concept that
> I mentioned before and recently proposed:
> (https://lore.kernel.org/linux-mm/aZW0voL4MmnMQlaR@yjaykim-PowerEdge-T330/)
>
> One of its core functions requires buffering RAM-swapped pages
> and writing them sequentially at an appropriate time -- not
> immediately, but in proper block-sized units, sequentially.
>
> This means allocated offsets must essentially be virtual, and
> physical offsets need to be managed separately at the actual
> write time.
>
> If we integrate this into the current vswap, we would either
> need vswap itself to handle the sequential writes (bypassing
> the physical device and receiving pages directly), or swapon
> a swap device and have vswap obtain physical offsets from it.
> But since those offsets cannot be used directly (due to
> buffering and sequential write requirements), they become
> virtual too, resulting in:
>
> virtual -> virtual -> physical
>
> This triple indirection is not ideal.
>
> However, if the modularization from point 1 is achieved and
> vswap acts as a swap device itself, then we can cleanly
> establish a:
>
> virtual -> physical
I read that thread some time ago. Some remarks:
1. I think Christoph has a point. Seems like some of your ideas are
broadly applicable to swap in general. Maybe fixing swap infra
generally would make a lot of sense?
2. Why do we need to do two virtual layers here? For example, if you
want to buffer multiple swap outs and turn them into a sequential
request, you can:
a. Allocate virtual swap space for them as you wish. They don't even
need to be sequential.
b. At swap_writeout() time, don't allocate physical swap space for
them right away. Instead, accumulate them into a buffer. You can add a
new virtual swap entry type to flag it if necessary.
c. Once that buffer reaches a certain size, you can now allocate
contiguous physical swap space for them. Then flush etc. You can flush
at swap_writeout() time, or use a dedicated thread etc.
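Steps a-c can be sketched as a toy model. This is a Python illustration
under stated assumptions, not kernel code: `VSwap`, `BATCH_SIZE`, `flush()`
and `swap_read()` are hypothetical names, and only swap_writeout() is taken
from the discussion above. It shows a single virtual -> physical mapping
where the physical side is simply filled in late.

```python
from dataclasses import dataclass, field
from itertools import count

BATCH_SIZE = 4  # hypothetical flush threshold, in pages


@dataclass
class VSwap:
    """Toy model: virtual slots are handed out at once, physical slots lazily."""
    backing: dict = field(default_factory=dict)  # virtual slot -> physical slot
    pending: list = field(default_factory=list)  # (virtual slot, page) not yet written
    device: dict = field(default_factory=dict)   # physical slot -> page data
    _next_virtual: count = field(default_factory=count)
    _next_physical: int = 0

    def swap_writeout(self, page: bytes) -> int:
        # (a) Allocate a virtual slot; it does not need to be sequential.
        vslot = next(self._next_virtual)
        # (b) Do not allocate physical space yet, just buffer the page.
        self.pending.append((vslot, page))
        # (c) Once the buffer is large enough, write it out in one go.
        if len(self.pending) >= BATCH_SIZE:
            self.flush()
        return vslot

    def flush(self) -> None:
        # Allocate one contiguous physical range for the whole buffer.
        base = self._next_physical
        for offset, (vslot, page) in enumerate(self.pending):
            self.device[base + offset] = page
            self.backing[vslot] = base + offset
        self._next_physical = base + len(self.pending)
        self.pending.clear()

    def swap_read(self, vslot: int) -> bytes:
        # A buffered page is still served from RAM; a flushed one from the device.
        if vslot in self.backing:
            return self.device[self.backing[vslot]]
        for pending_slot, page in self.pending:
            if pending_slot == vslot:
                return page
        raise KeyError(vslot)
```

In this model the flush could just as well be driven by a separate
thread; calling it from swap_writeout() only keeps the sketch short.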
Deduplication sounds like something that should live at a lower layer
- I was thinking about it for zswap/zsmalloc back then. I mean, I
assume you don't want content sharing across different swap media? :)
Something along the lines of:
1. Maintain a content index for swapped out pages.
2. For the swap media that support deduplication, you'll need to add
some sort of reference count (more overhead ew).
3. Each time we swap out, we can content-check to see if the same
piece of content has been swapped out before. If so, set the vswap
backend to the physical location of the data, increment some sort of
reference count (perhaps we can use swap count) of the older entry,
and have the swap type point to it.
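A minimal sketch of steps 1-3, again as a Python toy model rather than
kernel code. `DedupBackend`, `swap_out()` and `swap_free()` are
hypothetical names. The index is keyed on the full page content for
simplicity; a real index would presumably use a hash plus a byte
comparison. Cgroup charging is deliberately not modeled here.

```python
class DedupBackend:
    """Toy content-indexed backend with per-slot reference counts."""

    def __init__(self):
        self.index = {}     # step 1: page content -> physical slot
        self.slots = {}     # physical slot -> [page content, refcount]
        self.next_slot = 0

    def swap_out(self, page: bytes) -> int:
        # Step 3: has identical content been swapped out before?
        slot = self.index.get(page)
        if slot is not None:
            # Reuse the older entry and bump its reference count (step 2).
            self.slots[slot][1] += 1
            return slot
        # First time this content is seen: take a fresh slot.
        slot = self.next_slot
        self.next_slot += 1
        self.slots[slot] = [page, 1]
        self.index[page] = slot
        return slot

    def swap_free(self, slot: int) -> None:
        # Drop one reference; release the slot when nobody points to it.
        entry = self.slots[slot]
        entry[1] -= 1
        if entry[1] == 0:
            del self.index[entry[0]]
            del self.slots[slot]
```

Two swap-outs of identical pages return the same slot, and the slot is
only released after both references are freed.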
But have you considered the implications of sharing swap data like
this? I need to read the paper you cite - seems like a potential fun
read. But what happens when these two pages that share the content
belong to two different cgroups? How does the
charging/uncharging/charge transferring story work? That's one of the
things that made me pause when I wanted to implement deduplication for
zswap/zsmalloc. Zram does not charge memory towards cgroup, but zswap
does, so we'll need to handle this somehow, and at that point all the
complexity might no longer be worth it.
>
> relationship within it.
>
> I noticed you seem to be exploring collaboration with Kairui
> as well. I'm curious whether you have a compromise direction
> in mind, or if you plan to stick with the current approach.
I do have some ideas while discussing with Kairui. I'm still figuring
that part out though.
What I'm working on right now is tracing all the inherent overhead of
swap virtualization, regardless of the method we use.
>
> P.S. I definitely want to review the vswap code in detail
> when I get the time. great work and code.
>
> Thanks,
> Youngjun Park
>
^ permalink raw reply
* Re: [PATCH v5 00/21] Virtual Swap Space
From: Nhat Pham @ 2026-04-12 1:03 UTC (permalink / raw)
To: YoungJun Park
Cc: Kairui Song, Liam.Howlett, akpm, apopple, axelrasmussen, baohua,
baolin.wang, bhe, byungchul, cgroups, chengming.zhou, chrisl,
corbet, david, dev.jain, gourry, hannes, hughd, jannh,
joshua.hahnjy, lance.yang, lenb, linux-doc, linux-kernel,
linux-mm, linux-pm, lorenzo.stoakes, matthew.brost, mhocko,
muchun.song, npache, pavel, peterx, peterz, pfalcato, rafael,
rakie.kim, roman.gushchin, rppt, ryan.roberts, shakeel.butt,
shikemeng, surenb, tglx, vbabka, weixugc, ying.huang, yosry.ahmed,
yuanchu, zhengqi.arch, ziy, kernel-team, riel
In-Reply-To: <acQvNRLpHwnHt7i+@yjaykim-PowerEdge-T330>
On Wed, Mar 25, 2026 at 11:53 AM YoungJun Park <youngjun.park@lge.com> wrote:
>
> On Mon, Mar 23, 2026 at 11:32:57AM -0400, Nhat Pham wrote:
>
> > Interesting. Normally "lots of zero-filled page" is a very beneficial
> > case for vswap. You don't need a swapfile, or any zram/zswap metadata
> > overhead - it's a native swap backend. If production workload has this
> > many zero-filled pages, I think the numbers of vswap would be much
> > less alarming - perhaps even matching memory overhead because you
> > don't need to maintain a zram entry metadata (it's at least 2 words
> > per zram entry right?), while there's no reverse map overhead induced
> > (so it's 24 bytes on both side), and no need to do zram-side locking
> > :)
> >
> > So I was surprised to see that it's not working out very well here. I
> > checked the implementation of memhog - let me know if this is wrong
> > place to look:
> >
> > https://man7.org/linux/man-pages/man8/memhog.8.html
> > https://github.com/numactl/numactl/blob/master/memhog.c#L52
> >
> > I think this is what happened here: memhog was populating the memory
> > 0xff, which triggers the full overhead of a swapfile-backed swap entry
> > because even though it's "same-filled" it's not zero-filled! I was
> > following Usama's observation - "less than 1% of the same-filled pages
> > were non-zero" - and so I only handled the zero-filled case here:
> >
> > https://lore.kernel.org/all/20240530102126.357438-1-usamaarif642@gmail.com/
> >
> > This sounds a bit artificial IMHO - as Usama pointed out above, I
> > think most samefilled pages are zero pages, in real production
> > workloads. However, if you think there are real use cases with a lot
> > of non-zero samefilled pages, please let me know I can fix this real
> > quick. We can support this in vswap with zero extra metadata overhead
> > - change the VSWAP_ZERO swap entry type to VSWAP_SAME_FILLED, then use
> > the backend field to store that value. I can send you a patch if
> > you're interested.
>
> This brings back memories -- I'm pretty sure we talked about
> exactly this at LPC. Our custom swap device already handles both
> zero-filled and same-filled pages on its own, so what we really
> wanted was a way to tell the swap layer "just skip the detection
> and let it through."
>
> I looked at two approaches back then but never submitted either:
>
> - A per-swap_info flag to opt out of zero/same-filled handling.
> But this felt wrong from vswap's perspective -- if even one
> device opts out of the zeromap, the model gets messy.
>
> - Revisiting Usama's patch 2 approach.
> Sounded good in theory, but as you said,
> it's not as simple to verify in practice. And it is more clean design
> swapout time zero check as I see. So, I gave up on it.
>
> Seeing this come up again is actually kind of nice :)
>
> One thought -- maybe a compile-time CONFIG or a boot param to
> control the scope? e.g. zero-only, same-filled, or disabled.
> That way vendors like us just turn it off, and setups like
> Kairui's can opt into broader detection. Just an idea though --
> open to other approaches if you have something in mind.
Yeah for vswap it's probably going to be a CONFIG or boot param.
But in the status quo, we can always add a swapfile flag. That one
should work already, right?
Thanks for thinking about it :) FWIW I think zero check is really
cheap, but yeah it's just wasted work.
(ZRAM folks - do you feel the overhead here?)
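For reference, the zero / same-filled distinction and the proposed scope
knob can be modeled as below. This is a Python sketch under assumptions:
`same_filled_value()`, `classify()` and the scope strings are hypothetical
names chosen for illustration, and the page length is assumed to be a
multiple of the word size.

```python
PAGE_SIZE = 4096
WORD = 8  # bytes per machine word in this model


def same_filled_value(page: bytes):
    """Return the repeated word if every word in the page is identical, else None."""
    first = page[:WORD]
    if page == first * (len(page) // WORD):
        return int.from_bytes(first, "little")
    return None


def classify(page: bytes, scope: str = "zero-only") -> str:
    """Decide how a page would be stored under a given detection scope."""
    if scope == "disabled":
        return "backed"          # skip detection, let the device handle it
    value = same_filled_value(page)
    if value == 0:
        return "zero"            # handled in both remaining scopes
    if value is not None and scope == "same-filled":
        return "same-filled"     # the fill word would live in the backend field
    return "backed"
```

With the "zero-only" scope, a page filled with 0xff is classified as
"backed", which is the memhog case described in the quoted text.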
>
> Thanks,
> Youngjun Park
>
^ permalink raw reply
* Re: [PATCH v2] Documentation: Refactored watchdog old doc
From: Guenter Roeck @ 2026-04-12 0:09 UTC (permalink / raw)
To: Randy Dunlap, Sunny Patel, Jonathan Corbet
Cc: Wim Van Sebroeck, Shuah Khan, linux-watchdog, linux-doc,
linux-kernel
In-Reply-To: <33f18499-96d8-4b17-badf-4de957a29a20@roeck-us.net>
On 4/11/26 12:07, Guenter Roeck wrote:
> On 4/11/26 10:22, Randy Dunlap wrote:
>>
>>
>> On 4/11/26 8:09 AM, Sunny Patel wrote:
>>> Mark WDIOC_GETTEMP and WDIOS_TEMPPANIC as deprecated since
>>> neither is implemented by the watchdog core and both are only
>>> present in a small number of legacy drivers.
>>>
>>> Add documentation for previously undocumented status bits
>>> WDIOF_MAGICCLOSE and WDIOF_ALARMONLY in the options field.
>>>
>>> Add documentation for WDIOF_PRETIMEOUT and WDIOF_SETTIMEOUT
>>> status bits describing their respective ioctls.
>>>
>>> Fix the following issues in existing documentation:
>>> - Remove version-specific reference to Linux 2.4.18 from
>>> the GETTIMEOUT ioctl description
>>> - Fix duplicate "was is" in printf format strings
>>> - Replace [FIXME] placeholder with proper descriptions for
>>> WDIOS_DISABLECARD, WDIOS_ENABLECARD and WDIOS_TEMPPANIC
>>>
>>> Signed-off-by: Sunny Patel <nueralspacetech@gmail.com>
>>> ---
>>>
>>> Changes in v2:
>>> - Fixed typos: "tiemout" -> "timeout", "characted" -> "character"
>>> - Fixed "small number if legacy" -> "of legacy"
>>> - Fixed capitalization: "New Drivers" -> "New drivers", "USE" -> "Use"
>>> - Fixed spacing: "WDIOS_DISABLECARD,this" -> "WDIOS_DISABLECARD, this"
>>> - Fixed double spaces in two places
>>> - Added missing newline at end of file
>>> - Rewrote commit message
>>
>> However, you failed to fix a malformed table warning that I reported here:
>> https://lore.kernel.org/linux-doc/9e3403a0-4ec2-4fbe-a50f-53f939c1d841@infradead.org/
>>
>
> On top of that, it should have been v3, not v2.
>
On top of that, again, it was sent as a response to the previous patch.
There is also some Sashiko feedback:
https://sashiko.dev/#/patchset/20260411150922.20536-1-nueralspacetech%40gmail.com
Guenter
^ permalink raw reply
* Re: maintainer profiles
From: Randy Dunlap @ 2026-04-12 0:02 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Linux Documentation, Linux Kernel Mailing List, Jonathan Corbet,
Linux Kernel Workflows
In-Reply-To: <d8804a85-dd2b-481e-903f-c6fea5d24c97@infradead.org>
On 4/11/26 4:54 PM, Randy Dunlap wrote:
> Hi,
>
> On 4/10/26 1:12 AM, Mauro Carvalho Chehab wrote:
>> On Thu, 9 Apr 2026 17:18:39 -0700
>> Randy Dunlap <rdunlap@infradead.org> wrote:
>>
>>> Hi,
>>>
>>> Is there supposed to be a difference (or distinction) in the contents of
>>>
>>> Documentation/process/maintainer-handbooks.rst
>>> and
>>> Documentation/maintainer/maintainer-entry-profile.rst
>>> ?
>>>
>>> Can they be combined into one location?
>>
>> Heh, from the 5 entries at maintainer-handbooks.rst:
>>
>> maintainer-netdev
>> maintainer-soc
>> maintainer-soc-clean-dts
>> maintainer-tip
>> maintainer-kvm-x86
>>
>> we have 3 of them already there at maintainer-entry-profile.rst:
>>
>> $ grep process/ Documentation/maintainer/maintainer-entry-profile.rst
>> ../process/maintainer-soc
>> ../process/maintainer-soc-clean-dts
>> ../process/maintainer-netdev
>>
>> It sounds to me that moving maintainer-tip and maintainer-kvm-x86
>> to maintainer-entry-profile.rst would be enough to drop
>> maintainer-handbooks.rst, keeping them consolidated on a single
>> place.
>
> Yes, maybe. How about in the other direction:
> move them all to maintainer-handbooks.rst?
>
> After all, maintainer-entry-profile.rst says:
> For now, existing maintainer profiles are listed here; we will likely want
> to do something different in the near future.
>
> Also, does anyone know why some of these profiles are numbered and some
> are not? See
> https://docs.kernel.org/maintainer/maintainer-entry-profile.html#existing-profiles
> for odd numbering.
Because they are numbered in their own respective documentation areas...
--
~Randy
^ permalink raw reply
* Re: maintainer profiles
From: Randy Dunlap @ 2026-04-11 23:54 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Linux Documentation, Linux Kernel Mailing List, Jonathan Corbet,
Linux Kernel Workflows
In-Reply-To: <20260410101239.04c87f26@foz.lan>
Hi,
On 4/10/26 1:12 AM, Mauro Carvalho Chehab wrote:
> On Thu, 9 Apr 2026 17:18:39 -0700
> Randy Dunlap <rdunlap@infradead.org> wrote:
>
>> Hi,
>>
>> Is there supposed to be a difference (or distinction) in the contents of
>>
>> Documentation/process/maintainer-handbooks.rst
>> and
>> Documentation/maintainer/maintainer-entry-profile.rst
>> ?
>>
>> Can they be combined into one location?
>
> Heh, from the 5 entries at maintainer-handbooks.rst:
>
> maintainer-netdev
> maintainer-soc
> maintainer-soc-clean-dts
> maintainer-tip
> maintainer-kvm-x86
>
> we have 3 of them already there at maintainer-entry-profile.rst:
>
> $ grep process/ Documentation/maintainer/maintainer-entry-profile.rst
> ../process/maintainer-soc
> ../process/maintainer-soc-clean-dts
> ../process/maintainer-netdev
>
> It sounds to me that moving maintainer-tip and maintainer-kvm-x86
> to maintainer-entry-profile.rst would be enough to drop
> maintainer-handbooks.rst, keeping them consolidated on a single
> place.
Yes, maybe. How about in the other direction:
move them all to maintainer-handbooks.rst?
After all, maintainer-entry-profile.rst says:
For now, existing maintainer profiles are listed here; we will likely want
to do something different in the near future.
Also, does anyone know why some of these profiles are numbered and some
are not? See
https://docs.kernel.org/maintainer/maintainer-entry-profile.html#existing-profiles
for odd numbering.
thanks.
--
~Randy
^ permalink raw reply
* Re: [bvanassche:thread-safety 95/95] htmldocs: Documentation/mm/highmem:211: ./include/linux/highmem.h:222: WARNING: Error in declarator or parameters
From: Randy Dunlap @ 2026-04-11 23:37 UTC (permalink / raw)
To: kernel test robot, Bart Van Assche; +Cc: oe-kbuild-all, linux-doc
In-Reply-To: <202604120025.jtlnpWff-lkp@intel.com>
On 4/11/26 3:24 PM, kernel test robot wrote:
> tree: https://github.com/bvanassche/linux thread-safety
> head: 834588da5a3bc2696586cdc98024dcebec97aeed
> commit: 834588da5a3bc2696586cdc98024dcebec97aeed [95/95] treewide: Build fixes
> compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> docutils: docutils (Docutils 0.21.2, Python 3.13.5, on linux)
> reproduce: (https://download.01.org/0day-ci/archive/20260412/202604120025.jtlnpWff-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202604120025.jtlnpWff-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
> Runtime Survivability
> ===================== [docutils]
> WARNING: ./include/linux/highmem.h:235 function parameter '__maybe_unused' not described in 'clear_user_pages'
> WARNING: ./include/linux/highmem.h:235 function parameter '__maybe_unused' not described in 'clear_user_pages'
>>> Documentation/mm/highmem:211: ./include/linux/highmem.h:222: WARNING: Error in declarator or parameters
> Invalid C declaration: Expecting "," or ")" in parameters, got "_". [error at 55]
> void clear_user_pages (void *addr, unsigned long vaddr __maybe_unused, struct page *page, unsigned int npages)
> -------------------------------------------------------^
> Documentation/mm/memfd_preservation:7: ./mm/memfd_luo.c:13: ERROR: Unexpected section title.
>
Patch for allowing __maybe_unused is here:
https://lore.kernel.org/linux-doc/20260411233526.3909303-1-rdunlap@infradead.org/T/#u
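The general idea, dropping the attribute before the prototype is parsed,
can be sketched with plain `re`. This is only an illustration: it does not
use the kdoc CMatch class, and `ATTRIBUTES_TO_DROP` and
`strip_attributes()` are hypothetical names.

```python
import re

# Attributes that carry no meaning for the documentation output.
ATTRIBUTES_TO_DROP = ["__always_unused", "__maybe_unused"]


def strip_attributes(prototype: str) -> str:
    """Remove known attribute tokens (and the whitespace before them)."""
    for attr in ATTRIBUTES_TO_DROP:
        prototype = re.sub(r"\s*\b" + re.escape(attr) + r"\b", "", prototype)
    return prototype


proto = ("void clear_user_pages(void *addr, unsigned long vaddr __maybe_unused, "
         "struct page *page, unsigned int npages)")
print(strip_attributes(proto))
```

After the substitution the parameter list contains only types and names.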
--
~Randy
^ permalink raw reply
* [PATCH] docs: xforms_lists: allow __maybe_unused in func parameters
From: Randy Dunlap @ 2026-04-11 23:35 UTC (permalink / raw)
To: linux-kernel
Cc: Randy Dunlap, kernel test robot, Bart Van Assche, Jonathan Corbet,
Shuah Khan, linux-doc, Mauro Carvalho Chehab
Bart has a patch (not yet merged) that causes kernel-doc warnings:
WARNING: ./include/linux/highmem.h:235 function parameter '__maybe_unused' not described in 'clear_user_pages'
Documentation/mm/highmem:211: ./include/linux/highmem.h:222: WARNING: Error in declarator or parameters
Handle this by adding "__maybe_unused" to the list of known function
parameter modifiers.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202604120025.jtlnpWff-lkp@intel.com/
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Bart Van Assche <bvanassche@acm.org>
---
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-doc@vger.kernel.org
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
tools/lib/python/kdoc/xforms_lists.py | 1 +
1 file changed, 1 insertion(+)
--- linext-2026-0410.orig/tools/lib/python/kdoc/xforms_lists.py
+++ linext-2026-0410/tools/lib/python/kdoc/xforms_lists.py
@@ -93,6 +93,7 @@ class CTransforms:
(CMatch("__weak"), ""),
(CMatch("__sched"), ""),
(CMatch("__always_unused"), ""),
+ (CMatch("__maybe_unused"), ""),
(CMatch("__printf"), ""),
(CMatch("__(?:re)?alloc_size"), ""),
(CMatch("__diagnose_as"), ""),
^ permalink raw reply
* [bvanassche:thread-safety 95/95] htmldocs: Documentation/mm/highmem:211: ./include/linux/highmem.h:222: WARNING: Error in declarator or parameters
From: kernel test robot @ 2026-04-11 22:24 UTC (permalink / raw)
To: Bart Van Assche; +Cc: oe-kbuild-all, linux-doc
tree: https://github.com/bvanassche/linux thread-safety
head: 834588da5a3bc2696586cdc98024dcebec97aeed
commit: 834588da5a3bc2696586cdc98024dcebec97aeed [95/95] treewide: Build fixes
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
docutils: docutils (Docutils 0.21.2, Python 3.13.5, on linux)
reproduce: (https://download.01.org/0day-ci/archive/20260412/202604120025.jtlnpWff-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604120025.jtlnpWff-lkp@intel.com/
All warnings (new ones prefixed by >>):
Runtime Survivability
===================== [docutils]
WARNING: ./include/linux/highmem.h:235 function parameter '__maybe_unused' not described in 'clear_user_pages'
WARNING: ./include/linux/highmem.h:235 function parameter '__maybe_unused' not described in 'clear_user_pages'
>> Documentation/mm/highmem:211: ./include/linux/highmem.h:222: WARNING: Error in declarator or parameters
Invalid C declaration: Expecting "," or ")" in parameters, got "_". [error at 55]
void clear_user_pages (void *addr, unsigned long vaddr __maybe_unused, struct page *page, unsigned int npages)
-------------------------------------------------------^
Documentation/mm/memfd_preservation:7: ./mm/memfd_luo.c:13: ERROR: Unexpected section title.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v3 0/2] docs: advanced search with benchmark harness
From: Randy Dunlap @ 2026-04-11 19:52 UTC (permalink / raw)
To: Rito Rhymes; +Cc: linux-doc, linux-kernel
In-Reply-To: <DHOI5DYYBXEN.3MWP1RMB6ECES@ritovision.com>
On 4/9/26 2:04 AM, Rito Rhymes wrote:
>> I like it. I think it's useful -- the old search could give a bit too
>> much output. The search result tabs (groups) are helpful.
>
> Thanks for taking the time to test it out again and give feedback.
> I'm glad you see potential utility for it.
>
>> I mostly use 'grep' for searching Documentation/ and I expect lots
>> of other developers also do that (if they bother to look).
>
> That's definitely what I expect kernel hackers to default to.
>
> I'd like to get a clearer sense of your perspective, it may represent
> others too, and I can weigh it against my own assumptions here.
>
> So my question framed for you is:
>
> You know a particular concept you want to look up, but you do not know
> the exact file, and related words repeat a lot across the source.
>
> Could you imagine yourself going through grep results, not quickly
> finding what you need, burning mental bandwidth and then deciding:
> "let me just go on docs.kernel.org real quick, hit the advanced search,
> and see what I find"?
>
> Is that something you could actually see ever happening?

It would take a while for me to give up on grep, but eventually
I could see using Advanced Search - but not quickly.

> Maybe even, in that type of situation, eventually defaulting to that
> mode first to avoid spending time scanning through noisy grep results.

That would only come after several cases where Adv. Search proved to
be more useful than grep.

> Or is grep and staying in the terminal a comfortable enough place to
> remain even when the results are not very fruitful and the time spent
> there is not especially efficient?

Nice characterization :)
But yes, grep + terminal will remain the default for now.

> Or does that situation just not come up often enough to justify a
> separate mental workflow for it outside the grep norm?

For me it probably does not.

>> I do notice under the Pages tab that all of the pages listed say
>> "Summary unavailable." I don't know what should be there instead of
>> that message.
>
> It's supposed to be populated with an excerpt from the page related
> to the search criterion; 2-3 lines of text or so.
>
> I encountered that same issue after an incremental rebuild, doing a
> full rebuild fixed it.
>
> Could you please confirm if it works after a full rebuild?

Same result for me - Summary unavailable.

--
~Randy
* Re: [PATCH v2] Documentation: Refactored watchdog old doc
From: Guenter Roeck @ 2026-04-11 19:07 UTC (permalink / raw)
To: Randy Dunlap, Sunny Patel, Jonathan Corbet
Cc: Wim Van Sebroeck, Shuah Khan, linux-watchdog, linux-doc,
linux-kernel
In-Reply-To: <303dcd9e-ca40-48b7-851e-6cd283cb96ad@infradead.org>
On 4/11/26 10:22, Randy Dunlap wrote:
>
>
> On 4/11/26 8:09 AM, Sunny Patel wrote:
>> Mark WDIOC_GETTEMP and WDIOS_TEMPPANIC as deprecated since
>> neither is implemented by the watchdog core and both are only
>> present in a small number of legacy drivers.
>>
>> Add documentation for previously undocumented status bits
>> WDIOF_MAGICCLOSE and WDIOF_ALARMONLY in the options field.
>>
>> Add documentation for WDIOF_PRETIMEOUT and WDIOF_SETTIMEOUT
>> status bits describing their respective ioctls.
>>
>> Fix the following issues in existing documentation:
>> - Remove version-specific reference to Linux 2.4.18 from
>> the GETTIMEOUT ioctl description
>> - Fix duplicate "was is" in printf format strings
>> - Replace [FIXME] placeholder with proper descriptions for
>> WDIOS_DISABLECARD, WDIOS_ENABLECARD and WDIOS_TEMPPANIC
>>
>> Signed-off-by: Sunny Patel <nueralspacetech@gmail.com>
>> ---
>>
>> Changes in v2:
>> - Fixed typos: "tiemout" -> "timeout", "characted" -> "character"
>> - Fixed "small number if legacy" -> "of legacy"
>> - Fixed capitalization: "New Drivers" -> "New drivers", "USE" -> "Use"
>> - Fixed spacing: "WDIOS_DISABLECARD,this" -> "WDIOS_DISABLECARD, this"
>> - Fixed double spaces in two places
>> - Added missing newline at end of file
>> - Rewrote commit message
>
> However, you failed to fix a malformed table warning that I reported here:
> https://lore.kernel.org/linux-doc/9e3403a0-4ec2-4fbe-a50f-53f939c1d841@infradead.org/
>

On top of that, it should have been v3, not v2.

Guenter

> Documentation/watchdog/watchdog-api.rst:250: ERROR: Malformed table.
> Text in column margin in table line 2.
>
> ================ ================================
> WDIOF_ALARMONLY Not a reboot watchdog
> ================ ================================
>
>
> So I repeat, please test your patches.
>
>>
>> Documentation/watchdog/watchdog-api.rst | 59 +++++++++++++++++++++----
>> 1 file changed, 51 insertions(+), 8 deletions(-)