* [PATCH 0/2] IPVS: Add Generic Netlink configuration interface
@ 2008-07-09 15:11 Julius Volz
  2008-07-09 15:11 ` [PATCH 1/2] IPVS: Add genetlink interface definitions to ip_vs.h Julius Volz
  2008-07-09 15:11 ` [PATCH 2/2] IPVS: Add genetlink interface implementation Julius Volz
  0 siblings, 2 replies; 19+ messages in thread
From: Julius Volz @ 2008-07-09 15:11 UTC
  To: netdev, lvs-devel; +Cc: vbusam, horms, kaber, davem

These two patches add a Generic Netlink interface to IPVS while keeping
the old get/setsockopt interface for userspace backwards compatibility.
The motivation for this is to have a more extensible interface for
future changes, such as the planned IPv6 support. ipvsadm is currently
being extended to support the new interface and features.

The ip_vs.h header change depends on this patch I sent yesterday:
"IPVS: Move userspace definitions to include/linux/ip_vs.h"

Below is an overview of the attribute types and message formats used
in this interface. The Netlink interface follows the old interface
closely, so the commands and the received / returned data are almost
the same:

======================================
|    IPVS NETLINK ATTRIBUTE TYPES    |
|   (enums grouped by empty lines)   |
======================================

IPVS_ENTRY_ATTR_SERVICE       - NLA_NESTED
IPVS_ENTRY_ATTR_DEST          - NLA_NESTED
IPVS_ENTRY_ATTR_DAEMON        - NLA_NESTED

IPVS_SVC_ATTR_AF              - NLA_U16
IPVS_SVC_ATTR_PROTOCOL        - NLA_U16
IPVS_SVC_ATTR_ADDR            - NLA_BINARY
IPVS_SVC_ATTR_PORT            - NLA_U16
IPVS_SVC_ATTR_FWMARK          - NLA_U32
IPVS_SVC_ATTR_SCHED_NAME      - NLA_NUL_STRING
IPVS_SVC_ATTR_FLAGS           - NLA_BINARY
IPVS_SVC_ATTR_TIMEOUT         - NLA_U32
IPVS_SVC_ATTR_NETMASK         - NLA_U32
IPVS_SVC_ATTR_STATS           - NLA_NESTED

IPVS_DEST_ATTR_ADDR           - NLA_BINARY
IPVS_DEST_ATTR_PORT           - NLA_U16
IPVS_DEST_ATTR_FWD_METHOD     - NLA_U32
IPVS_DEST_ATTR_WEIGHT         - NLA_U32
IPVS_DEST_ATTR_U_THRESH       - NLA_U32
IPVS_DEST_ATTR_L_THRESH       - NLA_U32
IPVS_DEST_ATTR_ACTIVE_CONNS   - NLA_U32
IPVS_DEST_ATTR_INACT_CONNS    - NLA_U32
IPVS_DEST_ATTR_PERSIST_CONNS  - NLA_U32
IPVS_DEST_ATTR_STATS          - NLA_NESTED

IPVS_STATS_ATTR_CONNS         - NLA_U32
IPVS_STATS_ATTR_INPKTS        - NLA_U32
IPVS_STATS_ATTR_OUTPKTS       - NLA_U32
IPVS_STATS_ATTR_INBYTES       - NLA_U64
IPVS_STATS_ATTR_OUTBYTES      - NLA_U64
IPVS_STATS_ATTR_CPS           - NLA_U32
IPVS_STATS_ATTR_INPPS         - NLA_U32
IPVS_STATS_ATTR_OUTPPS        - NLA_U32
IPVS_STATS_ATTR_INBPS         - NLA_U32
IPVS_STATS_ATTR_OUTBPS        - NLA_U32

IPVS_TIMEOUT_ATTR_TCP         - NLA_U32
IPVS_TIMEOUT_ATTR_TCP_FIN     - NLA_U32
IPVS_TIMEOUT_ATTR_UDP         - NLA_U32

IPVS_DAEMON_ATTR_STATE        - NLA_U32
IPVS_DAEMON_ATTR_MCAST_IFN    - NLA_NUL_STRING
IPVS_DAEMON_ATTR_SYNC_ID      - NLA_U32

IPVS_INFO_ATTR_VERSION        - NLA_U32
IPVS_INFO_ATTR_CONN_TAB_SIZE  - NLA_U32

==========================
|    COMMAND MESSAGES    |
==========================

IPVS_CMD_ADD_SERVICE
    IPVS_ENTRY_ATTR_SERVICE
        IPVS_SVC_ATTR_AF
        (IPVS_SVC_ATTR_PROTOCOL
         IPVS_SVC_ATTR_ADDR
         IPVS_SVC_ATTR_PORT)
        or
        IPVS_SVC_ATTR_FWMARK
        IPVS_SVC_ATTR_SCHED_NAME
        IPVS_SVC_ATTR_FLAGS
        IPVS_SVC_ATTR_TIMEOUT
        IPVS_SVC_ATTR_NETMASK

IPVS_CMD_EDIT_SERVICE
    IPVS_ENTRY_ATTR_SERVICE
        IPVS_SVC_ATTR_AF
        (IPVS_SVC_ATTR_PROTOCOL
         IPVS_SVC_ATTR_ADDR
         IPVS_SVC_ATTR_PORT)
        or
        IPVS_SVC_ATTR_FWMARK
        IPVS_SVC_ATTR_SCHED_NAME
        IPVS_SVC_ATTR_FLAGS
        IPVS_SVC_ATTR_TIMEOUT
        IPVS_SVC_ATTR_NETMASK

IPVS_CMD_DEL_SERVICE
    IPVS_ENTRY_ATTR_SERVICE
        IPVS_SVC_ATTR_AF
        (IPVS_SVC_ATTR_PROTOCOL
         IPVS_SVC_ATTR_ADDR
         IPVS_SVC_ATTR_PORT)
        or
        IPVS_SVC_ATTR_FWMARK

IPVS_CMD_ADD_DEST
    IPVS_ENTRY_ATTR_SERVICE
        IPVS_SVC_ATTR_AF
        (IPVS_SVC_ATTR_PROTOCOL
         IPVS_SVC_ATTR_ADDR
         IPVS_SVC_ATTR_PORT)
        or
        IPVS_SVC_ATTR_FWMARK
    IPVS_ENTRY_ATTR_DEST
        IPVS_DEST_ATTR_ADDR
        IPVS_DEST_ATTR_PORT
        IPVS_DEST_ATTR_FWD_METHOD
        IPVS_DEST_ATTR_WEIGHT
        IPVS_DEST_ATTR_U_THRESH
        IPVS_DEST_ATTR_L_THRESH

IPVS_CMD_EDIT_DEST
    IPVS_ENTRY_ATTR_SERVICE
        IPVS_SVC_ATTR_AF
        (IPVS_SVC_ATTR_PROTOCOL
         IPVS_SVC_ATTR_ADDR
         IPVS_SVC_ATTR_PORT)
        or
        IPVS_SVC_ATTR_FWMARK
    IPVS_ENTRY_ATTR_DEST
        IPVS_DEST_ATTR_ADDR
        IPVS_DEST_ATTR_PORT
        IPVS_DEST_ATTR_FWD_METHOD
        IPVS_DEST_ATTR_WEIGHT
        IPVS_DEST_ATTR_U_THRESH
        IPVS_DEST_ATTR_L_THRESH

IPVS_CMD_DEL_DEST
    IPVS_ENTRY_ATTR_SERVICE
        IPVS_SVC_ATTR_AF
        (IPVS_SVC_ATTR_PROTOCOL
         IPVS_SVC_ATTR_ADDR
         IPVS_SVC_ATTR_PORT)
        or
        IPVS_SVC_ATTR_FWMARK
    IPVS_ENTRY_ATTR_DEST
        IPVS_DEST_ATTR_ADDR
        IPVS_DEST_ATTR_PORT

IPVS_CMD_FLUSH
    (no arguments)

IPVS_CMD_SET_TIMEOUT
    IPVS_TIMEOUT_ATTR_TCP
    IPVS_TIMEOUT_ATTR_TCP_FIN
    IPVS_TIMEOUT_ATTR_UDP

IPVS_CMD_START_DAEMON
    IPVS_ENTRY_ATTR_DAEMON
        IPVS_DAEMON_ATTR_STATE
        IPVS_DAEMON_ATTR_MCAST_IFN
        IPVS_DAEMON_ATTR_SYNC_ID

IPVS_CMD_STOP_DAEMON
    IPVS_ENTRY_ATTR_DAEMON
        IPVS_DAEMON_ATTR_STATE

IPVS_CMD_ZERO
    IPVS_ENTRY_ATTR_SERVICE
        IPVS_SVC_ATTR_AF
        (IPVS_SVC_ATTR_PROTOCOL
         IPVS_SVC_ATTR_ADDR
         IPVS_SVC_ATTR_PORT)
        or
        IPVS_SVC_ATTR_FWMARK

IPVS_CMD_GET_INFO
    (no arguments)

IPVS_CMD_GET_SERVICES
    (no arguments)

IPVS_CMD_GET_SERVICE
    IPVS_ENTRY_ATTR_SERVICE
        IPVS_SVC_ATTR_AF
        (IPVS_SVC_ATTR_PROTOCOL
         IPVS_SVC_ATTR_ADDR
         IPVS_SVC_ATTR_PORT)
        or
        IPVS_SVC_ATTR_FWMARK

IPVS_CMD_GET_DESTS
    IPVS_ENTRY_ATTR_SERVICE
        IPVS_SVC_ATTR_AF
        (IPVS_SVC_ATTR_PROTOCOL
         IPVS_SVC_ATTR_ADDR
         IPVS_SVC_ATTR_PORT)
        or
        IPVS_SVC_ATTR_FWMARK

IPVS_CMD_GET_TIMEOUT
    (no arguments)

IPVS_CMD_GET_DAEMONS
    (no arguments)

=========================
|    COMMAND REPLIES    |
=========================

IPVS_CMD_ADD_SERVICE
    (only return code)

IPVS_CMD_DEL_SERVICE
    (only return code)

IPVS_CMD_ADD_DEST
    (only return code)

IPVS_CMD_DEL_DEST
    (only return code)

IPVS_CMD_FLUSH
    (only return code)

IPVS_CMD_SET_TIMEOUT
    (only return code)

IPVS_CMD_START_DAEMON
    (only return code)

IPVS_CMD_STOP_DAEMON
    (only return code)

IPVS_CMD_ZERO
    (only return code)

IPVS_CMD_GET_INFO
    IPVS_INFO_ATTR_VERSION
    IPVS_INFO_ATTR_CONN_TAB_SIZE

IPVS_CMD_GET_SERVICES
    IPVS_ENTRY_ATTR_SERVICE (one entry per multipart message)
        IPVS_SVC_ATTR_AF
        (IPVS_SVC_ATTR_PROTOCOL
         IPVS_SVC_ATTR_ADDR
         IPVS_SVC_ATTR_PORT)
        or
        IPVS_SVC_ATTR_FWMARK
        IPVS_SVC_ATTR_SCHED_NAME
        IPVS_SVC_ATTR_FLAGS
        IPVS_SVC_ATTR_TIMEOUT
        IPVS_SVC_ATTR_NETMASK
        IPVS_SVC_ATTR_STATS

IPVS_CMD_GET_SERVICE
    IPVS_ENTRY_ATTR_SERVICE
        IPVS_SVC_ATTR_AF
        (IPVS_SVC_ATTR_PROTOCOL
         IPVS_SVC_ATTR_ADDR
         IPVS_SVC_ATTR_PORT)
        or
        IPVS_SVC_ATTR_FWMARK
        IPVS_SVC_ATTR_SCHED_NAME
        IPVS_SVC_ATTR_FLAGS
        IPVS_SVC_ATTR_TIMEOUT
        IPVS_SVC_ATTR_NETMASK
        IPVS_SVC_ATTR_STATS

IPVS_CMD_GET_DESTS
    IPVS_ENTRY_ATTR_DEST (one entry per multipart message)
        IPVS_DEST_ATTR_ADDR
        IPVS_DEST_ATTR_PORT
        IPVS_DEST_ATTR_FWD_METHOD
        IPVS_DEST_ATTR_WEIGHT
        IPVS_DEST_ATTR_U_THRESH
        IPVS_DEST_ATTR_L_THRESH
        IPVS_DEST_ATTR_ACTIVE_CONNS
        IPVS_DEST_ATTR_INACT_CONNS
        IPVS_DEST_ATTR_PERSIST_CONNS
        IPVS_DEST_ATTR_STATS

IPVS_CMD_GET_TIMEOUT
    IPVS_TIMEOUT_ATTR_TCP
    IPVS_TIMEOUT_ATTR_TCP_FIN
    IPVS_TIMEOUT_ATTR_UDP

IPVS_CMD_GET_DAEMONS
    IPVS_ENTRY_ATTR_DAEMON (one entry per multipart message)
        IPVS_DAEMON_ATTR_STATE
        IPVS_DAEMON_ATTR_MCAST_IFN
        IPVS_DAEMON_ATTR_SYNC_ID
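
For readers new to Netlink attribute nesting, here is a minimal userspace
sketch of how the attribute payload of an IPVS_CMD_DEL_SERVICE request for
a firewall-mark service could be laid out. It is not taken from the patches
or from ipvsadm. The helper names (attr_put, attr_nest_start, attr_nest_end,
build_fwmark_service) and struct attr_buf are hypothetical, the attribute
numbers are copied from the enums in patch 1/2, and the Netlink and Generic
Netlink message headers that would precede this payload are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Attribute numbers as defined by the enums in patch 1/2. */
enum { IPVS_ENTRY_ATTR_UNSPEC, IPVS_ENTRY_ATTR_SERVICE };
enum {
	IPVS_SVC_ATTR_UNSPEC,
	IPVS_SVC_ATTR_AF,
	IPVS_SVC_ATTR_PROTOCOL,
	IPVS_SVC_ATTR_ADDR,
	IPVS_SVC_ATTR_PORT,
	IPVS_SVC_ATTR_FWMARK
};

/* Same layout rules as struct nlattr: 4-byte header, 4-byte alignment. */
#define ATTR_HDRLEN	4u
#define ATTR_ALIGN(n)	(((n) + 3u) & ~3u)

/* Hypothetical fixed-size buffer for the attribute stream. */
struct attr_buf {
	uint8_t data[256];
	size_t len;
};

/* Append one attribute; returns the offset of its header, or -1 if full. */
static int attr_put(struct attr_buf *b, uint16_t type,
		    const void *payload, uint16_t plen)
{
	uint16_t nla_len = (uint16_t)(ATTR_HDRLEN + plen);
	size_t total = ATTR_ALIGN(ATTR_HDRLEN + plen);
	size_t off = b->len;

	if (off + total > sizeof(b->data))
		return -1;

	memset(b->data + off, 0, total);	/* zero the padding bytes */
	memcpy(b->data + off, &nla_len, 2);	/* length without padding */
	memcpy(b->data + off + 2, &type, 2);
	if (plen)
		memcpy(b->data + off + ATTR_HDRLEN, payload, plen);
	b->len = off + total;
	return (int)off;
}

/* A nested attribute starts as an empty attribute ... */
static int attr_nest_start(struct attr_buf *b, uint16_t type)
{
	return attr_put(b, type, NULL, 0);
}

/* ... whose length is patched once all children have been appended. */
static void attr_nest_end(struct attr_buf *b, int start)
{
	uint16_t nla_len;

	assert(start >= 0 && (size_t)start + ATTR_HDRLEN <= b->len);
	nla_len = (uint16_t)(b->len - (size_t)start);
	memcpy(b->data + start, &nla_len, 2);
}

/* Identify a service by address family and firewall mark only. */
static int build_fwmark_service(struct attr_buf *b, uint32_t fwmark)
{
	uint16_t af = 2;	/* assumption: AF_INET as defined on Linux */
	int nest = attr_nest_start(b, IPVS_ENTRY_ATTR_SERVICE);

	if (nest < 0)
		return -1;
	if (attr_put(b, IPVS_SVC_ATTR_AF, &af, sizeof(af)) < 0)
		return -1;
	if (attr_put(b, IPVS_SVC_ATTR_FWMARK, &fwmark, sizeof(fwmark)) < 0)
		return -1;
	attr_nest_end(b, nest);
	return 0;
}
```

Starting from an empty struct attr_buf, build_fwmark_service(&b, 7) should
produce a 20-byte payload: a 4-byte nest header, an 8-byte AF attribute
(the 2-byte value is padded to 4 bytes) and an 8-byte FWMARK attribute.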
* [PATCH 1/2] IPVS: Add genetlink interface definitions to ip_vs.h
  2008-07-09 15:11 [PATCH 0/2] IPVS: Add Generic Netlink configuration interface Julius Volz
@ 2008-07-09 15:11 ` Julius Volz
  2008-07-09 15:11 ` [PATCH 2/2] IPVS: Add genetlink interface implementation Julius Volz
  1 sibling, 0 replies; 19+ messages in thread
From: Julius Volz @ 2008-07-09 15:11 UTC
  To: netdev, lvs-devel; +Cc: vbusam, horms, kaber, davem, Julius Volz

Add IPVS Generic Netlink interface definitions to include/linux/ip_vs.h.

This depends on this patch I sent yesterday:
"IPVS: Move userspace definitions to include/linux/ip_vs.h"

Signed-off-by: Julius Volz <juliusv@google.com>

 1 files changed, 168 insertions(+), 0 deletions(-)

diff --git a/include/linux/ip_vs.h b/include/linux/ip_vs.h
index 2d4eb68..83ed646 100644
--- a/include/linux/ip_vs.h
+++ b/include/linux/ip_vs.h
@@ -245,4 +245,172 @@ struct ip_vs_daemon_user {
 	int syncid;
 };
 
+/*
+ *
+ * IPVS Generic Netlink interface definitions
+ *
+ */
+
+/* Generic Netlink family info */
+
+#define IPVS_GENL_NAME "IPVS"
+#define IPVS_GENL_VERSION 0x1
+
+struct ip_vs_flags {
+	__be32 flags;
+	__be32 mask;
+};
+
+/* Generic Netlink command attributes */
+enum {
+	IPVS_CMD_UNSPEC = 0,
+
+	/* SET commands */
+	IPVS_CMD_ADD_SERVICE,		/* add or modify service */
+	IPVS_CMD_EDIT_SERVICE,		/* add or modify service */
+	IPVS_CMD_DEL_SERVICE,		/* delete service */
+	IPVS_CMD_ADD_DEST,		/* add or modify destination */
+	IPVS_CMD_EDIT_DEST,		/* add or modify destination */
+	IPVS_CMD_DEL_DEST,		/* delete destination */
+	IPVS_CMD_FLUSH,			/* flush all services and dests */
+	IPVS_CMD_SET_TIMEOUT,		/* set TCP and UDP timeouts */
+	IPVS_CMD_START_DAEMON,		/* start sync daemon */
+	IPVS_CMD_STOP_DAEMON,		/* stop sync daemon */
+	IPVS_CMD_ZERO,			/* zero all counters and stats */
+
+	/* GET commands */
+	IPVS_CMD_GET_INFO,		/* get general IPVS info */
+	IPVS_CMD_GET_SERVICES,		/* get list of all services */
+	IPVS_CMD_GET_SERVICE,		/* get info about specific service */
+	IPVS_CMD_GET_DESTS,		/* get list of all service dests */
+	IPVS_CMD_GET_TIMEOUT,		/* get TCP and UDP timeouts */
+	IPVS_CMD_GET_DAEMONS,		/* get sync daemon status */
+	__IPVS_CMD_MAX,
+};
+
+#define IPVS_CMD_MAX (__IPVS_CMD_MAX - 1)
+
+/*
+ * Attributes used in the first level of commands that maintain multiple entries
+ * of the same element type (services, destinations, sync daemons)
+ */
+enum {
+	IPVS_ENTRY_ATTR_UNSPEC = 0,
+	IPVS_ENTRY_ATTR_SERVICE,	/* nested service attribute */
+	IPVS_ENTRY_ATTR_DEST,		/* nested destination attribute */
+	IPVS_ENTRY_ATTR_DAEMON,		/* nested sync daemon attribute */
+	__IPVS_ENTRY_ATTR_MAX,
+};
+
+#define IPVS_ENTRY_ATTR_MAX (__IPVS_ENTRY_ATTR_MAX - 1)
+
+/*
+ * Attributes used to describe a service
+ *
+ * Used inside nested attribute IPVS_ENTRY_ATTR_SERVICE
+ */
+enum {
+	IPVS_SVC_ATTR_UNSPEC = 0,
+	IPVS_SVC_ATTR_AF,		/* address family */
+	IPVS_SVC_ATTR_PROTOCOL,		/* virtual service protocol */
+	IPVS_SVC_ATTR_ADDR,		/* virtual service address */
+	IPVS_SVC_ATTR_PORT,		/* virtual service port */
+	IPVS_SVC_ATTR_FWMARK,		/* firewall mark of service */
+
+	IPVS_SVC_ATTR_SCHED_NAME,	/* name of scheduler */
+	IPVS_SVC_ATTR_FLAGS,		/* virtual service flags */
+	IPVS_SVC_ATTR_TIMEOUT,		/* persistent timeout */
+	IPVS_SVC_ATTR_NETMASK,		/* persistent netmask */
+
+	IPVS_SVC_ATTR_STATS,		/* nested attribute for service stats */
+	__IPVS_SVC_ATTR_MAX,
+};
+
+#define IPVS_SVC_ATTR_MAX (__IPVS_SVC_ATTR_MAX - 1)
+
+/*
+ * Attributes used to describe a destination (real server)
+ *
+ * Used inside nested attribute IPVS_ENTRY_ATTR_DEST
+ */
+enum {
+	IPVS_DEST_ATTR_UNSPEC = 0,
+	IPVS_DEST_ATTR_ADDR,		/* real server address */
+	IPVS_DEST_ATTR_PORT,		/* real server port */
+
+	IPVS_DEST_ATTR_FWD_METHOD,	/* forwarding method */
+	IPVS_DEST_ATTR_WEIGHT,		/* destination weight */
+
+	IPVS_DEST_ATTR_U_THRESH,	/* upper threshold */
+	IPVS_DEST_ATTR_L_THRESH,	/* lower threshold */
+
+	IPVS_DEST_ATTR_ACTIVE_CONNS,	/* active connections */
+	IPVS_DEST_ATTR_INACT_CONNS,	/* inactive connections */
+	IPVS_DEST_ATTR_PERSIST_CONNS,	/* persistent connections */
+
+	IPVS_DEST_ATTR_STATS,		/* nested attribute for dest stats */
+	__IPVS_DEST_ATTR_MAX,
+};
+
+#define IPVS_DEST_ATTR_MAX (__IPVS_DEST_ATTR_MAX - 1)
+
+/*
+ * Attributes describing a sync daemon
+ *
+ * Used inside nested attribute IPVS_ENTRY_ATTR_DAEMON
+ */
+enum {
+	IPVS_DAEMON_ATTR_UNSPEC = 0,
+	IPVS_DAEMON_ATTR_STATE,		/* sync daemon state (master/backup) */
+	IPVS_DAEMON_ATTR_MCAST_IFN,	/* multicast interface name */
+	IPVS_DAEMON_ATTR_SYNC_ID,	/* SyncID we belong to */
+	__IPVS_DAEMON_ATTR_MAX,
+};
+
+#define IPVS_DAEMON_ATTR_MAX (__IPVS_DAEMON_ATTR_MAX - 1)
+
+/*
+ * Attributes used to describe service or destination entry statistics
+ *
+ * Used inside nested attributes IPVS_SVC_ATTR_STATS and IPVS_DEST_ATTR_STATS
+ */
+enum {
+	IPVS_STATS_ATTR_UNSPEC = 0,
+	IPVS_STATS_ATTR_CONNS,		/* connections scheduled */
+	IPVS_STATS_ATTR_INPKTS,		/* incoming packets */
+	IPVS_STATS_ATTR_OUTPKTS,	/* outgoing packets */
+	IPVS_STATS_ATTR_INBYTES,	/* incoming bytes */
+	IPVS_STATS_ATTR_OUTBYTES,	/* outgoing bytes */
+
+	IPVS_STATS_ATTR_CPS,		/* current connection rate */
+	IPVS_STATS_ATTR_INPPS,		/* current in packet rate */
+	IPVS_STATS_ATTR_OUTPPS,		/* current out packet rate */
+	IPVS_STATS_ATTR_INBPS,		/* current in byte rate */
+	IPVS_STATS_ATTR_OUTBPS,		/* current out byte rate */
+	__IPVS_STATS_ATTR_MAX,
+};
+
+#define IPVS_STATS_ATTR_MAX (__IPVS_STATS_ATTR_MAX - 1)
+
+/* Attributes used in IPVS_CMD_SET_TIMEOUT and IPVS_CMD_GET_TIMEOUT commands */
+enum {
+	IPVS_TIMEOUT_ATTR_UNSPEC = 0,
+	IPVS_TIMEOUT_ATTR_TCP,		/* TCP connection timeout */
+	IPVS_TIMEOUT_ATTR_TCP_FIN,	/* TCP FIN wait timeout */
+	IPVS_TIMEOUT_ATTR_UDP,		/* UDP timeout */
+	__IPVS_TIMEOUT_ATTR_MAX,
+};
+
+#define IPVS_TIMEOUT_ATTR_MAX (__IPVS_TIMEOUT_ATTR_MAX - 1)
+
+/* Attributes used in response to IPVS_CMD_GET_INFO command */
+enum {
+	IPVS_INFO_ATTR_UNSPEC = 0,
+	IPVS_INFO_ATTR_VERSION,		/* IPVS version number */
+	IPVS_INFO_ATTR_CONN_TAB_SIZE,	/* size of connection hash table */
+	__IPVS_INFO_ATTR_MAX,
+};
+
+#define IPVS_INFO_ATTR_MAX (__IPVS_INFO_ATTR_MAX - 1)
+
 #endif /* _IP_VS_H */
-- 
1.5.4.5
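
As a small illustration of why struct ip_vs_flags carries a mask next to
the flag word: the service parser in patch 2/2 combines the two as
(old & ~mask) | (flags & mask), so only the bits selected by the mask are
changed. The sketch below restates that expression in plain userspace C.
struct flags_update and apply_flags are hypothetical names, and uint32_t
stands in for the __be32 fields of the kernel struct.

```c
#include <stdint.h>

/* Userspace stand-in for struct ip_vs_flags (kernel fields are __be32). */
struct flags_update {
	uint32_t flags;	/* new values for the selected bits */
	uint32_t mask;	/* which bits of the old word may change */
};

/* Keep the unmasked bits of old, take the masked bits from the update. */
static uint32_t apply_flags(uint32_t old, struct flags_update u)
{
	return (old & ~u.mask) | (u.flags & u.mask);
}
```

With a zero mask the old flag word is returned unchanged, no matter what
the flags field contains.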
* [PATCH 2/2] IPVS: Add genetlink interface implementation
  2008-07-09 15:11 [PATCH 0/2] IPVS: Add Generic Netlink configuration interface Julius Volz
  2008-07-09 15:11 ` [PATCH 1/2] IPVS: Add genetlink interface definitions to ip_vs.h Julius Volz
@ 2008-07-09 15:11 ` Julius Volz
  2008-07-09 15:17   ` YOSHIFUJI Hideaki / 吉藤英明
  2008-07-09 16:43   ` Patrick McHardy
  1 sibling, 2 replies; 19+ messages in thread
From: Julius Volz @ 2008-07-09 15:11 UTC
  To: netdev, lvs-devel; +Cc: vbusam, horms, kaber, davem, Julius Volz

Add the implementation of the new Generic Netlink interface to IPVS and
keep the old set/getsockopt interface for userspace backwards
compatibility.

Signed-off-by: Julius Volz <juliusv@google.com>

 1 files changed, 873 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/ipvs/ip_vs_ctl.c b/net/ipv4/ipvs/ip_vs_ctl.c
index 94c5767..e0ad6ed 100644
--- a/net/ipv4/ipvs/ip_vs_ctl.c
+++ b/net/ipv4/ipvs/ip_vs_ctl.c
@@ -39,6 +39,7 @@
 #include <net/ip.h>
 #include <net/route.h>
 #include <net/sock.h>
+#include <net/genetlink.h>
 
 #include <asm/uaccess.h>
 
@@ -2307,6 +2308,870 @@ static struct nf_sockopt_ops ip_vs_sockopts = {
 	.owner = THIS_MODULE,
 };
 
+/*
+ * Generic Netlink interface
+ */
+
+/* IPVS genetlink family */
+static struct genl_family ip_vs_genl_family = {
+	.id = GENL_ID_GENERATE,
+	.hdrsize = 0,
+	.name = IPVS_GENL_NAME,
+	.version = IPVS_GENL_VERSION,
+	.maxattr = IPVS_CMD_MAX
+};
+
+/*
+ * Policy used for commands that operate on service, destination
+ * or daemon entries
+ */
+static struct nla_policy ip_vs_entries_policy[IPVS_ENTRY_ATTR_MAX + 1]
+__read_mostly = {
+	[IPVS_ENTRY_ATTR_SERVICE] = { .type = NLA_NESTED },
+	[IPVS_ENTRY_ATTR_DEST] = { .type = NLA_NESTED },
+	[IPVS_ENTRY_ATTR_DAEMON] = { .type = NLA_NESTED },
+};
+
+/* Policy used for IPVS_CMD_SET_TIMEOUT command attributes */
+static struct nla_policy ip_vs_timeout_policy[IPVS_TIMEOUT_ATTR_MAX + 1]
+__read_mostly = {
+	[IPVS_TIMEOUT_ATTR_TCP] = { .type = NLA_U32 },
+	[IPVS_TIMEOUT_ATTR_TCP_FIN] = { .type = NLA_U32 },
+	[IPVS_TIMEOUT_ATTR_UDP] = { .type = NLA_U32 },
+};
+
+/* Policy used for attributes in nested attribute IPVS_ENTRY_ATTR_DAEMON */
+static struct nla_policy ip_vs_daemon_policy[IPVS_DAEMON_ATTR_MAX + 1]
+__read_mostly = {
+	[IPVS_DAEMON_ATTR_STATE] = { .type = NLA_U32 },
+	[IPVS_DAEMON_ATTR_MCAST_IFN] = { .type = NLA_STRING,
+					 .len = IP_VS_IFNAME_MAXLEN },
+	[IPVS_DAEMON_ATTR_SYNC_ID] = { .type = NLA_U32 },
+};
+
+/* Policy used for attributes in nested attribute IPVS_ENTRY_ATTR_SERVICE */
+static struct nla_policy ip_vs_svc_policy[IPVS_SVC_ATTR_MAX + 1]
+__read_mostly = {
+	[IPVS_SVC_ATTR_AF] = { .type = NLA_U16 },
+	[IPVS_SVC_ATTR_PROTOCOL] = { .type = NLA_U16 },
+	[IPVS_SVC_ATTR_ADDR] = { .type = NLA_BINARY,
+				 .len = sizeof(union nf_inet_addr) },
+	[IPVS_SVC_ATTR_PORT] = { .type = NLA_U16 },
+	[IPVS_SVC_ATTR_FWMARK] = { .type = NLA_U32 },
+	[IPVS_SVC_ATTR_SCHED_NAME] = { .type = NLA_STRING,
+				       .len = IP_VS_SCHEDNAME_MAXLEN },
+	[IPVS_SVC_ATTR_FLAGS] = { .type = NLA_BINARY,
+				  .len = sizeof(struct ip_vs_flags) },
+	[IPVS_SVC_ATTR_TIMEOUT] = { .type = NLA_U32 },
+	[IPVS_SVC_ATTR_NETMASK] = { .type = NLA_U32 },
+	[IPVS_SVC_ATTR_STATS] = { .type = NLA_NESTED },
+};
+
+/* Policy used for attributes in nested attribute IPVS_ENTRY_ATTR_DEST */
+static struct nla_policy ip_vs_dest_policy[IPVS_DEST_ATTR_MAX + 1]
+__read_mostly = {
+	[IPVS_DEST_ATTR_ADDR] = { .type = NLA_BINARY,
+				  .len = sizeof(union nf_inet_addr) },
+	[IPVS_DEST_ATTR_PORT] = { .type = NLA_U16 },
+	[IPVS_DEST_ATTR_FWD_METHOD] = { .type = NLA_U32 },
+	[IPVS_DEST_ATTR_WEIGHT] = { .type = NLA_U32 },
+	[IPVS_DEST_ATTR_U_THRESH] = { .type = NLA_U32 },
+	[IPVS_DEST_ATTR_L_THRESH] = { .type = NLA_U32 },
+	[IPVS_DEST_ATTR_ACTIVE_CONNS] = { .type = NLA_U32 },
+	[IPVS_DEST_ATTR_INACT_CONNS] = { .type = NLA_U32 },
+	[IPVS_DEST_ATTR_PERSIST_CONNS] = { .type = NLA_U32 },
+	[IPVS_DEST_ATTR_STATS] = { .type = NLA_NESTED },
+};
+
+static int ip_vs_genl_fill_stats(struct sk_buff *skb, int container_type,
+				 struct ip_vs_stats *stats)
+{
+	struct nlattr *nl_stats = nla_nest_start(skb, container_type);
+	if (!nl_stats)
+		return -EMSGSIZE;
+
+	spin_lock_bh(&stats->lock);
+
+	NLA_PUT_U32(skb, IPVS_STATS_ATTR_CONNS, stats->conns);
+	NLA_PUT_U32(skb, IPVS_STATS_ATTR_INPKTS, stats->inpkts);
+	NLA_PUT_U32(skb, IPVS_STATS_ATTR_OUTPKTS, stats->outpkts);
+	NLA_PUT_U64(skb, IPVS_STATS_ATTR_INBYTES, stats->inbytes);
+	NLA_PUT_U64(skb, IPVS_STATS_ATTR_OUTBYTES, stats->outbytes);
+	NLA_PUT_U32(skb, IPVS_STATS_ATTR_CPS, stats->cps);
+	NLA_PUT_U32(skb, IPVS_STATS_ATTR_INPPS, stats->inpps);
+	NLA_PUT_U32(skb, IPVS_STATS_ATTR_OUTPPS, stats->outpps);
+	NLA_PUT_U32(skb, IPVS_STATS_ATTR_INBPS, stats->inbps);
+	NLA_PUT_U32(skb, IPVS_STATS_ATTR_OUTBPS, stats->outbps);
+
+	spin_unlock_bh(&stats->lock);
+
+	nla_nest_end(skb, nl_stats);
+
+	return 0;
+
+nla_put_failure:
+	spin_unlock_bh(&stats->lock);
+
+	nla_nest_cancel(skb, nl_stats);
+	return -EMSGSIZE;
+}
+
+static int ip_vs_genl_fill_service(struct sk_buff *skb, struct ip_vs_service *svc)
+{
+	struct nlattr *nl_service;
+	struct ip_vs_flags flags = { .flags = svc->flags,
+				     .mask = 0 };
+
+	nl_service = nla_nest_start(skb, IPVS_ENTRY_ATTR_SERVICE);
+	if (!nl_service)
+		return -EMSGSIZE;
+
+	NLA_PUT_U16(skb, IPVS_SVC_ATTR_AF, AF_INET);
+
+	if (svc->fwmark) {
+		NLA_PUT_U32(skb, IPVS_SVC_ATTR_FWMARK, svc->fwmark);
+	} else {
+		NLA_PUT_U16(skb, IPVS_SVC_ATTR_PROTOCOL, svc->protocol);
+		NLA_PUT(skb, IPVS_SVC_ATTR_ADDR, sizeof(svc->addr), &svc->addr);
+		NLA_PUT_U16(skb, IPVS_SVC_ATTR_PORT, svc->port);
+	}
+
+	NLA_PUT_STRING(skb, IPVS_SVC_ATTR_SCHED_NAME, svc->scheduler->name);
+	NLA_PUT(skb, IPVS_SVC_ATTR_FLAGS, sizeof(flags), &flags);
+	NLA_PUT_U32(skb, IPVS_SVC_ATTR_TIMEOUT, svc->timeout / HZ);
+	NLA_PUT_U32(skb, IPVS_SVC_ATTR_NETMASK, svc->netmask);
+
+	if (ip_vs_genl_fill_stats(skb, IPVS_SVC_ATTR_STATS, &svc->stats))
+		goto nla_put_failure;
+
+	nla_nest_end(skb, nl_service);
+
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(skb, nl_service);
+	return -EMSGSIZE;
+}
+
+static int ip_vs_genl_dump_service(struct sk_buff *skb, struct ip_vs_service *svc,
+				   struct netlink_callback *cb)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq,
+			  &ip_vs_genl_family, NLM_F_MULTI,
+			  IPVS_CMD_GET_SERVICES);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (ip_vs_genl_fill_service(skb, svc) < 0)
+		goto nla_put_failure;
+
+	return genlmsg_end(skb, hdr);
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int ip_vs_genl_dump_services(struct sk_buff *skb,
+				    struct netlink_callback *cb)
+{
+	int idx = 0, i;
+	int start = cb->args[0];
+	struct ip_vs_service *svc;
+
+	mutex_lock(&__ip_vs_mutex);
+	for (i = 0; i < IP_VS_SVC_TAB_SIZE; i++) {
+		list_for_each_entry(svc, &ip_vs_svc_table[i], s_list) {
+			if (++idx <= start)
+				continue;
+			if (ip_vs_genl_dump_service(skb, svc, cb) < 0) {
+				idx--;
+				goto nla_put_failure;
+			}
+		}
+	}
+
+	for (i = 0; i < IP_VS_SVC_TAB_SIZE; i++) {
+		list_for_each_entry(svc, &ip_vs_svc_fwm_table[i], f_list) {
+			if (++idx <= start)
+				continue;
+			if (ip_vs_genl_dump_service(skb, svc, cb) < 0) {
+				idx--;
+				goto nla_put_failure;
+			}
+		}
+	}
+
+nla_put_failure:
+	mutex_unlock(&__ip_vs_mutex);
+	cb->args[0] = idx;
+
+	return skb->len;
+}
+
+static int ip_vs_genl_parse_service(struct ip_vs_service_user *usvc,
+				    struct nlattr *nla, int full_entry)
+{
+	struct nlattr *attrs[IPVS_SVC_ATTR_MAX + 1];
+	struct nlattr *nla_af, *nla_port, *nla_fwmark, *nla_protocol, *nla_addr;
+
+	/* Parse mandatory identifying service fields first */
+	if (nla == NULL ||
+	    nla_parse_nested(attrs, IPVS_SVC_ATTR_MAX, nla, ip_vs_svc_policy))
+		return -EINVAL;
+
+	nla_af = attrs[IPVS_SVC_ATTR_AF];
+	nla_protocol = attrs[IPVS_SVC_ATTR_PROTOCOL];
+	nla_addr = attrs[IPVS_SVC_ATTR_ADDR];
+	nla_port = attrs[IPVS_SVC_ATTR_PORT];
+	nla_fwmark = attrs[IPVS_SVC_ATTR_FWMARK];
+
+	if (!(nla_af && (nla_fwmark || (nla_port && nla_protocol && nla_addr))))
+		return -EINVAL;
+
+	/* For now, only support IPv4 */
+	if (nla_get_u16(nla_af) != AF_INET)
+		return -EAFNOSUPPORT;
+
+	if (nla_fwmark) {
+		usvc->protocol = IPPROTO_TCP;
+		usvc->fwmark = nla_get_u32(nla_fwmark);
+	} else {
+		usvc->protocol = nla_get_u16(nla_protocol);
+		nla_memcpy(&usvc->addr, nla_addr, sizeof(usvc->addr));
+		usvc->port = nla_get_u16(nla_port);
+		usvc->fwmark = 0;
+	}
+
+	/* If a full entry was requested, check for the additional fields */
+	if (full_entry) {
+		struct nlattr *nla_sched, *nla_flags, *nla_timeout,
+			      *nla_netmask;
+		struct ip_vs_flags flags;
+
+		nla_sched = attrs[IPVS_SVC_ATTR_SCHED_NAME];
+		nla_flags = attrs[IPVS_SVC_ATTR_FLAGS];
+		nla_timeout = attrs[IPVS_SVC_ATTR_TIMEOUT];
+		nla_netmask = attrs[IPVS_SVC_ATTR_NETMASK];
+
+		if (!(nla_sched && nla_flags && nla_timeout && nla_netmask))
+			return -EINVAL;
+
+		nla_memcpy(&flags, nla_flags, sizeof(flags));
+		usvc->flags = (usvc->flags & ~flags.mask) |
+			      (flags.flags & flags.mask);
+		strlcpy(usvc->sched_name, nla_data(nla_sched),
+			sizeof(usvc->sched_name));
+		usvc->timeout = nla_get_u32(nla_timeout);
+		usvc->netmask = nla_get_u32(nla_netmask);
+	}
+
+	return 0;
+}
+
+static struct ip_vs_service *ip_vs_genl_find_service(struct nlattr *nla)
+{
+	struct ip_vs_service_user usvc;
+	int ret;
+
+	ret = ip_vs_genl_parse_service(&usvc, nla, 0);
+	if (ret)
+		return ERR_PTR(ret);
+
+	if (usvc.fwmark)
+		return __ip_vs_svc_fwm_get(usvc.fwmark);
+	else
+		return __ip_vs_service_get(usvc.protocol, usvc.addr,
+					   usvc.port);
+}
+
+static int ip_vs_genl_parse_dest(struct ip_vs_dest_user *udest,
+				 struct nlattr *nla, int full_entry)
+{
+	struct nlattr *attrs[IPVS_DEST_ATTR_MAX + 1];
+	struct nlattr *nla_addr, *nla_port;
+
+	/* Parse mandatory identifying destination fields first */
+	if (nla == NULL ||
+	    nla_parse_nested(attrs, IPVS_DEST_ATTR_MAX, nla, ip_vs_dest_policy))
+		return -EINVAL;
+
+	nla_addr = attrs[IPVS_DEST_ATTR_ADDR];
+	nla_port = attrs[IPVS_DEST_ATTR_PORT];
+
+	if (!(nla_addr && nla_port))
+		return -EINVAL;
+
+	nla_memcpy(&udest->addr, nla_addr, sizeof(udest->addr));
+	udest->port = nla_get_u16(nla_port);
+
+	/* If a full entry was requested, check for the additional fields */
+	if (full_entry) {
+		struct nlattr *nla_fwd, *nla_weight, *nla_u_thresh,
+			      *nla_l_thresh;
+
+		nla_fwd = attrs[IPVS_DEST_ATTR_FWD_METHOD];
+		nla_weight = attrs[IPVS_DEST_ATTR_WEIGHT];
+		nla_u_thresh = attrs[IPVS_DEST_ATTR_U_THRESH];
+		nla_l_thresh = attrs[IPVS_DEST_ATTR_L_THRESH];
+
+		if (!(nla_fwd && nla_weight && nla_u_thresh && nla_l_thresh))
+			return -EINVAL;
+
+		udest->conn_flags = nla_get_u32(nla_fwd) & IP_VS_CONN_F_FWD_MASK;
+		udest->weight = nla_get_u32(nla_weight);
+		udest->u_threshold = nla_get_u32(nla_u_thresh);
+		udest->l_threshold = nla_get_u32(nla_l_thresh);
+	}
+
+	return 0;
+}
+
+static int ip_vs_genl_fill_dest(struct sk_buff *skb, struct ip_vs_dest *dest)
+{
+	struct nlattr *nl_dest;
+
+	nl_dest = nla_nest_start(skb, IPVS_ENTRY_ATTR_DEST);
+	if (!nl_dest)
+		return -EMSGSIZE;
+
+	NLA_PUT(skb, IPVS_DEST_ATTR_ADDR, sizeof(dest->addr), &dest->addr);
+	NLA_PUT_U16(skb, IPVS_DEST_ATTR_PORT, dest->port);
+
+	NLA_PUT_U32(skb, IPVS_DEST_ATTR_FWD_METHOD,
+		    atomic_read(&dest->conn_flags) & IP_VS_CONN_F_FWD_MASK);
+	NLA_PUT_U32(skb, IPVS_DEST_ATTR_WEIGHT, atomic_read(&dest->weight));
+	NLA_PUT_U32(skb, IPVS_DEST_ATTR_U_THRESH, dest->u_threshold);
+	NLA_PUT_U32(skb, IPVS_DEST_ATTR_L_THRESH, dest->l_threshold);
+	NLA_PUT_U32(skb, IPVS_DEST_ATTR_ACTIVE_CONNS,
+		    atomic_read(&dest->activeconns));
+	NLA_PUT_U32(skb, IPVS_DEST_ATTR_INACT_CONNS,
+		    atomic_read(&dest->inactconns));
+	NLA_PUT_U32(skb, IPVS_DEST_ATTR_PERSIST_CONNS,
+		    atomic_read(&dest->persistconns));
+
+	if (ip_vs_genl_fill_stats(skb, IPVS_DEST_ATTR_STATS, &dest->stats))
+		goto nla_put_failure;
+
+	nla_nest_end(skb, nl_dest);
+
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(skb, nl_dest);
+	return -EMSGSIZE;
+}
+
+static int ip_vs_genl_dump_dest(struct sk_buff *skb, struct ip_vs_dest *dest,
+				struct netlink_callback *cb)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq,
+			  &ip_vs_genl_family, NLM_F_MULTI,
+			  IPVS_CMD_GET_DESTS);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (ip_vs_genl_fill_dest(skb, dest) < 0)
+		goto nla_put_failure;
+
+	return genlmsg_end(skb, hdr);
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+
+static int ip_vs_genl_dump_dests(struct sk_buff *skb,
+				 struct netlink_callback *cb)
+{
+	int idx = 0;
+	int start = cb->args[0];
+	struct ip_vs_service *svc;
+	struct ip_vs_dest *dest;
+	struct nlattr *attrs[IPVS_ENTRY_ATTR_MAX + 1];
+
+	mutex_lock(&__ip_vs_mutex);
+
+	/* Try to find the service for which to dump destinations */
+	if (nlmsg_parse(cb->nlh, GENL_HDRLEN, attrs,
+			IPVS_ENTRY_ATTR_MAX, ip_vs_entries_policy))
+		goto out_err;
+
+	svc = ip_vs_genl_find_service(attrs[IPVS_ENTRY_ATTR_SERVICE]);
+	if (IS_ERR(svc) || svc == NULL)
+		goto out_err;
+
+	/* Dump the destinations */
+	list_for_each_entry(dest, &svc->destinations, n_list) {
+		if (++idx <= start)
+			continue;
+		if (ip_vs_genl_dump_dest(skb, dest, cb) < 0) {
+			idx--;
+			goto nla_put_failure;
+		}
+	}
+
+nla_put_failure:
+	cb->args[0] = idx;
+	ip_vs_service_put(svc);
+
+out_err:
+	mutex_unlock(&__ip_vs_mutex);
+
+	return skb->len;
+}
+
+static int ip_vs_genl_fill_daemon(struct sk_buff *skb, __be32 state,
+				  const char *mcast_ifn, __be32 syncid)
+{
+	struct nlattr *nl_daemon;
+
+	nl_daemon = nla_nest_start(skb, IPVS_ENTRY_ATTR_DAEMON);
+	if (!nl_daemon)
+		return -EMSGSIZE;
+
+	NLA_PUT_U32(skb, IPVS_DAEMON_ATTR_STATE, state);
+	NLA_PUT_STRING(skb, IPVS_DAEMON_ATTR_MCAST_IFN, mcast_ifn);
+	NLA_PUT_U32(skb, IPVS_DAEMON_ATTR_SYNC_ID, syncid);
+
+	nla_nest_end(skb, nl_daemon);
+
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(skb, nl_daemon);
+	return -EMSGSIZE;
+}
+
+static int ip_vs_genl_dump_daemon(struct sk_buff *skb, __be32 state,
+				  const char *mcast_ifn, __be32 syncid,
+				  struct netlink_callback *cb)
+{
+	void *hdr;
+	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq,
+			  &ip_vs_genl_family, NLM_F_MULTI,
+			  IPVS_CMD_GET_DAEMONS);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (ip_vs_genl_fill_daemon(skb, state, mcast_ifn, syncid))
+		goto nla_put_failure;
+
+	return genlmsg_end(skb, hdr);
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int ip_vs_genl_dump_daemons(struct sk_buff *skb,
+				   struct netlink_callback *cb)
+{
+	mutex_lock(&__ip_vs_mutex);
+	if ((ip_vs_sync_state & IP_VS_STATE_MASTER) && !cb->args[0]) {
+		if (ip_vs_genl_dump_daemon(skb, IP_VS_STATE_MASTER,
+					   ip_vs_master_mcast_ifn,
+					   ip_vs_master_syncid, cb) < 0)
+			goto nla_put_failure;
+
+		cb->args[0] = 1;
+	}
+
+	if ((ip_vs_sync_state & IP_VS_STATE_BACKUP) && !cb->args[1]) {
+		if (ip_vs_genl_dump_daemon(skb, IP_VS_STATE_BACKUP,
+					   ip_vs_backup_mcast_ifn,
+					   ip_vs_backup_syncid, cb) < 0)
+			goto nla_put_failure;
+
+		cb->args[1] = 1;
+	}
+
+nla_put_failure:
+	mutex_unlock(&__ip_vs_mutex);
+
+	return skb->len;
+}
+
+static int ip_vs_genl_set_timeout(struct nlattr **attrs)
+{
+	struct ip_vs_timeout_user t;
+
+	if (attrs[IPVS_TIMEOUT_ATTR_TCP])
+		t.tcp_timeout = nla_get_u32(attrs[IPVS_TIMEOUT_ATTR_TCP]);
+
+	if (attrs[IPVS_TIMEOUT_ATTR_TCP_FIN])
+		t.tcp_fin_timeout =
+			nla_get_u32(attrs[IPVS_TIMEOUT_ATTR_TCP_FIN]);
+
+	if (attrs[IPVS_TIMEOUT_ATTR_UDP])
+		t.udp_timeout = nla_get_u32(attrs[IPVS_TIMEOUT_ATTR_UDP]);
+
+	return ip_vs_set_timeout(&t);
+}
+
+static int ip_vs_genl_start_daemon(struct nlattr **attrs)
+{
+	if (!(attrs[IPVS_DAEMON_ATTR_STATE] &&
+	      attrs[IPVS_DAEMON_ATTR_MCAST_IFN] &&
+	      attrs[IPVS_DAEMON_ATTR_SYNC_ID]))
+		return -EINVAL;
+
+	return start_sync_thread(nla_get_u32(attrs[IPVS_DAEMON_ATTR_STATE]),
+				 nla_data(attrs[IPVS_DAEMON_ATTR_MCAST_IFN]),
+				 nla_get_u32(attrs[IPVS_DAEMON_ATTR_SYNC_ID]));
+}
+
+static int ip_vs_genl_stop_daemon(struct nlattr **attrs)
+{
+	if (!attrs[IPVS_DAEMON_ATTR_STATE])
+		return -EINVAL;
+
+	return stop_sync_thread(nla_get_u32(attrs[IPVS_DAEMON_ATTR_STATE]));
+}
+
+static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info)
+{
+	struct ip_vs_service *svc = NULL;
+	struct ip_vs_service_user usvc;
+	struct ip_vs_dest_user udest;
+	int ret = 0, cmd, flags;
+	int need_full_svc = 0, need_full_dest = 0;
+
+	cmd = info->genlhdr->cmd;
+	flags = info->nlhdr->nlmsg_flags;
+
+	/* increase the module use count */
+	ip_vs_use_count_inc();
+	mutex_lock(&__ip_vs_mutex);
+
+	if (cmd == IPVS_CMD_FLUSH) {
+		/* Flush the virtual service */
+		ret = ip_vs_flush();
+		goto out;
+	} else if (cmd == IPVS_CMD_SET_TIMEOUT) {
+		/* Set timeout values for (tcp tcpfin udp) */
+		ret = ip_vs_genl_set_timeout(info->attrs);
+		goto out;
+	} else if (cmd == IPVS_CMD_START_DAEMON ||
+		   cmd == IPVS_CMD_STOP_DAEMON) {
+
+		struct nlattr *daemon_attrs[IPVS_DAEMON_ATTR_MAX + 1];
+
+		if (!info->attrs[IPVS_ENTRY_ATTR_DAEMON] ||
+		    nla_parse_nested(daemon_attrs, IPVS_DAEMON_ATTR_MAX,
+				     info->attrs[IPVS_ENTRY_ATTR_DAEMON],
+				     ip_vs_daemon_policy)) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		if (cmd == IPVS_CMD_START_DAEMON)
+			ret = ip_vs_genl_start_daemon(daemon_attrs);
+		else
+			ret = ip_vs_genl_stop_daemon(daemon_attrs);
+		goto out;
+	} else if (cmd == IPVS_CMD_ZERO &&
+		   !info->attrs[IPVS_ENTRY_ATTR_SERVICE]) {
+		ret = ip_vs_zero_all();
+		goto out;
+	}
+
+	/* All following commands require a service argument, so check if we
+	 * received a valid one. We need a full service specification when
+	 * adding / editing a service. Only identifying members otherwise. */
+	if (cmd == IPVS_CMD_ADD_SERVICE || cmd == IPVS_CMD_EDIT_SERVICE)
+		need_full_svc = 1;
+
+	ret = ip_vs_genl_parse_service(&usvc,
+				       info->attrs[IPVS_ENTRY_ATTR_SERVICE],
+				       need_full_svc);
+	if (ret)
+		goto out;
+
+	/* Lookup the exact service by <protocol, addr, port> or fwmark */
+	if (usvc.fwmark == 0)
+		svc = __ip_vs_service_get(usvc.protocol, usvc.addr, usvc.port);
+	else
+		svc = __ip_vs_svc_fwm_get(usvc.fwmark);
+
+	/* Unless we're adding a new service, the service must already exist */
+	if ((cmd != IPVS_CMD_ADD_SERVICE) && (svc == NULL)) {
+		ret = -ESRCH;
+		goto out;
+	}
+
+	/* Destination commands require a valid destination argument. For
+	 * adding / editing a destination, we need a full destination
+	 * specification. */
+	if (cmd == IPVS_CMD_ADD_DEST || cmd == IPVS_CMD_EDIT_DEST ||
+	    cmd == IPVS_CMD_DEL_DEST) {
+		if (cmd != IPVS_CMD_DEL_DEST)
+			need_full_dest = 1;
+
+		ret = ip_vs_genl_parse_dest(&udest,
+					    info->attrs[IPVS_ENTRY_ATTR_DEST],
+					    need_full_dest);
+		if (ret)
+			goto out;
+	}
+
+	switch (cmd) {
+	case IPVS_CMD_ADD_SERVICE:
+		if (svc == NULL)
+			ret = ip_vs_add_service(&usvc, &svc);
+		else
+			ret = -EEXIST;
+		break;
+	case IPVS_CMD_EDIT_SERVICE:
+		ret = ip_vs_edit_service(svc, &usvc);
+		break;
+	case IPVS_CMD_DEL_SERVICE:
+		ret = ip_vs_del_service(svc);
+		break;
+	case IPVS_CMD_ZERO:
+		ret = ip_vs_zero_service(svc);
+		break;
+	case IPVS_CMD_ADD_DEST:
+		ret = ip_vs_add_dest(svc, &udest);
+		break;
+	case IPVS_CMD_EDIT_DEST:
+		ret = ip_vs_edit_dest(svc, &udest);
+		break;
+	case IPVS_CMD_DEL_DEST:
+		ret = ip_vs_del_dest(svc, &udest);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+out:
+	if (svc)
+		ip_vs_service_put(svc);
+	mutex_unlock(&__ip_vs_mutex);
+	/* decrease the module use count */
+	ip_vs_use_count_dec();
+
+	return ret;
+}
+
+static int ip_vs_genl_get_cmd(struct sk_buff *skb, struct genl_info *info)
+{
+	struct sk_buff *msg;
+	void *reply;
+	int ret, cmd;
+
+	mutex_lock(&__ip_vs_mutex);
+
+	cmd = info->genlhdr->cmd;
+
+	msg = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!msg) {
+		ret = -ENOMEM;
+		goto out_err;
+	}
+
+	reply = genlmsg_put_reply(msg, info, &ip_vs_genl_family, 0, cmd);
+	if (reply == NULL)
+		goto nla_put_failure;
+
+	switch (cmd) {
+	case IPVS_CMD_GET_INFO:
+		NLA_PUT_U32(msg, IPVS_INFO_ATTR_VERSION, IP_VS_VERSION_CODE);
+		NLA_PUT_U32(msg, IPVS_INFO_ATTR_CONN_TAB_SIZE, IP_VS_CONN_TAB_SIZE);
+		break;
+
+	case IPVS_CMD_GET_SERVICE:
+	{
+		struct ip_vs_service *svc;
+
+		svc = ip_vs_genl_find_service(info->attrs[IPVS_ENTRY_ATTR_SERVICE]);
+		if (IS_ERR(svc)) {
+			ret = PTR_ERR(svc);
+			goto out_err;
+		} else if (svc) {
+			ret = ip_vs_genl_fill_service(msg, svc);
+			ip_vs_service_put(svc);
+			if (ret)
+				goto nla_put_failure;
+		} else {
+			ret = -ESRCH;
+			goto out_err;
+		}
+
+		break;
+	}
+
+	case IPVS_CMD_GET_TIMEOUT:
+	{
+		struct ip_vs_timeout_user t;
+
+		__ip_vs_get_timeouts(&t);
+		NLA_PUT_U32(msg, IPVS_TIMEOUT_ATTR_TCP, t.tcp_timeout);
+		NLA_PUT_U32(msg, IPVS_TIMEOUT_ATTR_TCP_FIN, t.tcp_fin_timeout);
+		NLA_PUT_U32(msg, IPVS_TIMEOUT_ATTR_UDP, t.udp_timeout);
+
+		break;
+	}
+
+	default:
+		IP_VS_ERR("unknown Generic Netlink command\n");
+		ret = -EINVAL;
+		goto out_err;
+	}
+
+	genlmsg_end(msg, reply);
+	ret = genlmsg_unicast(msg, info->snd_pid);
+	goto out;
+
+nla_put_failure:
+	IP_VS_ERR("not enough space in Netlink message\n");
+	ret = -EMSGSIZE;
+
+out_err:
+	if (msg)
+		nlmsg_free(msg);
+out:
+	mutex_unlock(&__ip_vs_mutex);
+
+	return ret;
+}
+
+
+static struct genl_ops ip_vs_genl_ops[] __read_mostly = {
+	/* SET commands */
+	{
+		.cmd = IPVS_CMD_ADD_SERVICE,
+		.flags = GENL_ADMIN_PERM,
+		.policy = ip_vs_entries_policy,
+		.doit = ip_vs_genl_set_cmd
+	},
+	{
+		.cmd = IPVS_CMD_EDIT_SERVICE,
+		.flags = GENL_ADMIN_PERM,
+		.policy = ip_vs_entries_policy,
+		.doit = ip_vs_genl_set_cmd
+	},
+	{
+		.cmd = IPVS_CMD_DEL_SERVICE,
+		.flags = GENL_ADMIN_PERM,
+		.policy = ip_vs_entries_policy,
+		.doit = ip_vs_genl_set_cmd
+	},
+	{
+		.cmd = IPVS_CMD_ADD_DEST,
+		.flags = GENL_ADMIN_PERM,
+		.policy = ip_vs_entries_policy,
+		.doit = ip_vs_genl_set_cmd
+	},
+	{
+		.cmd = IPVS_CMD_EDIT_DEST,
+		.flags = GENL_ADMIN_PERM,
+		.policy = ip_vs_entries_policy,
+		.doit = ip_vs_genl_set_cmd
+	},
+	{
+		.cmd = IPVS_CMD_DEL_DEST,
+		.flags = GENL_ADMIN_PERM,
+		.policy = ip_vs_entries_policy,
+		.doit = ip_vs_genl_set_cmd
+	},
+	{
+		.cmd = IPVS_CMD_FLUSH,
+		.flags = GENL_ADMIN_PERM,
+		.doit = ip_vs_genl_set_cmd
+	},
+	{
+		.cmd = IPVS_CMD_SET_TIMEOUT,
+		.flags = GENL_ADMIN_PERM,
+		.policy = ip_vs_timeout_policy,
+		.doit = ip_vs_genl_set_cmd
+	},
+	{
+		.cmd = IPVS_CMD_START_DAEMON,
+		.flags = GENL_ADMIN_PERM,
+		.policy = ip_vs_daemon_policy,
+		.doit = ip_vs_genl_set_cmd
+	},
+	{
+		.cmd = IPVS_CMD_STOP_DAEMON,
+		.flags = GENL_ADMIN_PERM,
+		.policy = ip_vs_daemon_policy,
+		.doit = ip_vs_genl_set_cmd
+	},
+	{
+		.cmd = IPVS_CMD_ZERO,
+		.flags = GENL_ADMIN_PERM,
+		.policy = ip_vs_entries_policy,
+		.doit = ip_vs_genl_set_cmd
+	},
+
+	/* GET commands */
+	{
+		.cmd = IPVS_CMD_GET_INFO,
+		.flags = GENL_ADMIN_PERM,
+		.doit = ip_vs_genl_get_cmd
+	},
+	{
+		.cmd = IPVS_CMD_GET_SERVICES,
+		.flags = GENL_ADMIN_PERM,
+		.dumpit = ip_vs_genl_dump_services
+	},
+	{
+		.cmd = IPVS_CMD_GET_SERVICE,
+		.flags = GENL_ADMIN_PERM,
+		.policy = ip_vs_entries_policy,
+		.doit = ip_vs_genl_get_cmd
+	},
+	{
+		.cmd = IPVS_CMD_GET_DESTS,
+		.flags = GENL_ADMIN_PERM,
+		.policy = ip_vs_entries_policy,
+		.dumpit = ip_vs_genl_dump_dests
+	},
+	{
+		.cmd = IPVS_CMD_GET_TIMEOUT,
+		.flags = GENL_ADMIN_PERM,
+		.doit = ip_vs_genl_get_cmd
+	},
+	{
+		.cmd = IPVS_CMD_GET_DAEMONS,
+		.flags = GENL_ADMIN_PERM,
+		.dumpit = ip_vs_genl_dump_daemons
+	},
+};
+
+int ip_vs_genl_register(void)
+{
+	int ret, i;
+
+	ret = genl_register_family(&ip_vs_genl_family);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < ARRAY_SIZE(ip_vs_genl_ops); i++) {
+		ret = genl_register_ops(&ip_vs_genl_family, &ip_vs_genl_ops[i]);
+		if (ret)
+			goto err_out;
+	}
+	return 0;
+
+err_out:
+	genl_unregister_family(&ip_vs_genl_family);
+	return ret;
+}
+
+void
ip_vs_genl_unregister(void) +{ + genl_unregister_family(&ip_vs_genl_family); +} + +/* End of Generic Netlink interface definitions */ + int ip_vs_control_init(void) { @@ -2321,6 +3186,13 @@ int ip_vs_control_init(void) return ret; } + ret = ip_vs_genl_register(); + if (ret) { + IP_VS_ERR("cannot register Generic Netlink interface.\n"); + nf_unregister_sockopt(&ip_vs_sockopts); + return ret; + } + proc_net_fops_create(&init_net, "ip_vs", 0, &ip_vs_info_fops); proc_net_fops_create(&init_net, "ip_vs_stats",0, &ip_vs_stats_fops); @@ -2357,6 +3229,7 @@ void ip_vs_control_cleanup(void) unregister_sysctl_table(sysctl_header); proc_net_remove(&init_net, "ip_vs_stats"); proc_net_remove(&init_net, "ip_vs"); + ip_vs_genl_unregister(); nf_unregister_sockopt(&ip_vs_sockopts); LeaveFunction(2); } -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 19+ messages in thread
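Note on the registration path in the patch above: ip_vs_genl_register() registers the family first, then each op in turn, and unregisters the family again if any op fails. A minimal userspace sketch of that unwind pattern follows. The fake_* names are hypothetical stand-ins, not genetlink API, and the sketch assumes that unregistering the family also drops the ops attached to it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a genetlink family; not kernel API. */
struct fake_family {
	int registered;	/* 1 while the family is registered */
	int nops;	/* number of ops currently attached */
};

static int fake_register_family(struct fake_family *f)
{
	f->registered = 1;
	f->nops = 0;
	return 0;
}

/* Assumption: dropping the family also drops every op attached to it. */
static void fake_unregister_family(struct fake_family *f)
{
	f->registered = 0;
	f->nops = 0;
}

/* Rejects negative command numbers so the error path can be exercised. */
static int fake_register_op(struct fake_family *f, int cmd)
{
	assert(f->registered);
	if (cmd < 0)
		return -EINVAL;
	f->nops++;
	return 0;
}

/* Register the family, then each op; undo everything on the first error. */
static int register_all(struct fake_family *f, const int *cmds, size_t n)
{
	size_t i;
	int ret;

	ret = fake_register_family(f);
	if (ret)
		return ret;

	for (i = 0; i < n; i++) {
		ret = fake_register_op(f, cmds[i]);
		if (ret)
			goto err_out;
	}
	return 0;

err_out:
	fake_unregister_family(f);
	return ret;
}
```

With this shape, a caller only has to check one return value: on failure nothing is left registered, so there is no partial state to clean up.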
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation 2008-07-09 15:11 ` [PATCH 2/2] IPVS: Add genetlink interface implementation Julius Volz @ 2008-07-09 15:17 ` YOSHIFUJI Hideaki / 吉藤英明 2008-07-09 15:24 ` Julius Volz 2008-07-09 16:43 ` Patrick McHardy 1 sibling, 1 reply; 19+ messages in thread From: YOSHIFUJI Hideaki / 吉藤英明 @ 2008-07-09 15:17 UTC (permalink / raw) To: juliusv; +Cc: netdev, lvs-devel, vbusam, horms, kaber, davem, yoshfuji In article <1215616317-11386-3-git-send-email-juliusv@google.com> (at Wed, 9 Jul 2008 17:11:57 +0200), Julius Volz <juliusv@google.com> says: : > +static struct genl_ops ip_vs_genl_ops[] __read_mostly = { > + /* SET commands */ > + { > + .cmd = IPVS_CMD_ADD_SERVICE, > + .flags = GENL_ADMIN_PERM, > + .policy = ip_vs_entries_policy, > + .doit = ip_vs_genl_set_cmd please add ",". --yoshfuji ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation 2008-07-09 15:17 ` YOSHIFUJI Hideaki / 吉藤英明 @ 2008-07-09 15:24 ` Julius Volz 0 siblings, 0 replies; 19+ messages in thread From: Julius Volz @ 2008-07-09 15:24 UTC (permalink / raw) To: YOSHIFUJI Hideaki / 吉藤英明 Cc: netdev, lvs-devel, vbusam, horms, kaber, davem On Wed, Jul 9, 2008 at 5:17 PM, YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> wrote: > In article <1215616317-11386-3-git-send-email-juliusv@google.com> (at Wed, 9 Jul 2008 17:11:57 +0200), Julius Volz <juliusv@google.com> says: > > : >> +static struct genl_ops ip_vs_genl_ops[] __read_mostly = { >> + /* SET commands */ >> + { >> + .cmd = IPVS_CMD_ADD_SERVICE, >> + .flags = GENL_ADMIN_PERM, >> + .policy = ip_vs_entries_policy, >> + .doit = ip_vs_genl_set_cmd > > please add ",". Thanks, did that! Julius -- Google Switzerland GmbH ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation 2008-07-09 15:11 ` [PATCH 2/2] IPVS: Add genetlink interface implementation Julius Volz 2008-07-09 15:17 ` YOSHIFUJI Hideaki / 吉藤英明 @ 2008-07-09 16:43 ` Patrick McHardy 2008-07-09 18:16 ` Julius Volz 2008-07-10 11:20 ` Julius Volz 1 sibling, 2 replies; 19+ messages in thread From: Patrick McHardy @ 2008-07-09 16:43 UTC (permalink / raw) To: Julius Volz; +Cc: netdev, lvs-devel, vbusam, horms, davem Julius Volz wrote: > Add the implementation of the new Generic Netlink interface to IPVS and keep > the old set/getsockopt interface for userspace backwards compatibility. > > Signed-off-by: Julius Volz <juliusv@google.com> Just a few quick comments, will try to do a full review later: > + * Policy used for commands that operate on service, destination > + * or daemon entries > + */ > +static struct nla_policy ip_vs_entries_policy[IPVS_ENTRY_ATTR_MAX + 1] > +__read_mostly = { These can all be const (and have the __read_mostly annotation removed). > +static int ip_vs_genl_fill_stats(struct sk_buff *skb, int container_type, > + struct ip_vs_stats *stats) > +{ > + struct nlattr *nl_stats = nla_nest_start(skb, container_type); > + if (!nl_stats) > + goto nla_put_failure; Unbalanced locking. 
> + > + spin_lock_bh(&stats->lock); > + > + NLA_PUT_U32(skb, IPVS_STATS_ATTR_CONNS, stats->conns); > + NLA_PUT_U32(skb, IPVS_STATS_ATTR_INPKTS, stats->inpkts); > + NLA_PUT_U32(skb, IPVS_STATS_ATTR_OUTPKTS, stats->outpkts); > + NLA_PUT_U64(skb, IPVS_STATS_ATTR_INBYTES, stats->inbytes); > + NLA_PUT_U64(skb, IPVS_STATS_ATTR_OUTBYTES, stats->outbytes); > + NLA_PUT_U32(skb, IPVS_STATS_ATTR_CPS, stats->cps); > + NLA_PUT_U32(skb, IPVS_STATS_ATTR_INPPS, stats->inpps); > + NLA_PUT_U32(skb, IPVS_STATS_ATTR_OUTPPS, stats->outpps); > + NLA_PUT_U32(skb, IPVS_STATS_ATTR_INBPS, stats->inbps); > + NLA_PUT_U32(skb, IPVS_STATS_ATTR_OUTBPS, stats->outbps); > + > + spin_unlock_bh(&stats->lock); > + > + nla_nest_end(skb, nl_stats); > + > + return 0; > + > +nla_put_failure: > + spin_unlock_bh(&stats->lock); > + > + nla_nest_cancel(skb, nl_stats); > + return -EMSGSIZE; > +} > + > +static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info) > +{ > + struct ip_vs_service *svc; > + struct ip_vs_service_user usvc; > + struct ip_vs_dest_user udest; > + int ret = 0, cmd, flags; > + int need_full_svc = 0, need_full_dest = 0; > + > + cmd = info->genlhdr->cmd; > + flags = info->nlhdr->nlmsg_flags; > + > + /* increase the module use count */ > + ip_vs_use_count_inc(); This looks fishy - the reference probably must be taken by genetlink before calling the command handler. > +int ip_vs_genl_register(void) > +{ > + int ret, i; > + > + ret = genl_register_family(&ip_vs_genl_family); > + if (ret) > + return ret; > + > + for (i = 0; i < ARRAY_SIZE(ip_vs_genl_ops); i++) { > + ret = genl_register_ops(&ip_vs_genl_family, &ip_vs_genl_ops[i]); > + if (ret) > + goto err_out; > + } > + return 0; > + > +err_out: > + genl_unregister_family(&ip_vs_genl_family); > + return ret; > +} > + > +void ip_vs_genl_unregister(void) > +{ > + genl_unregister_family(&ip_vs_genl_family); Doesn't it also has to unregister the ops? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation 2008-07-09 16:43 ` Patrick McHardy @ 2008-07-09 18:16 ` Julius Volz 2008-07-10 12:15 ` Patrick McHardy 2008-07-10 11:20 ` Julius Volz 1 sibling, 1 reply; 19+ messages in thread From: Julius Volz @ 2008-07-09 18:16 UTC (permalink / raw) To: Patrick McHardy; +Cc: netdev, lvs-devel, vbusam, horms, davem On Wed, Jul 9, 2008, Patrick McHardy wrote: > Just a few quick comments, will try to do a full review later: Thanks! >> + * Policy used for commands that operate on service, destination >> + * or daemon entries >> + */ >> +static struct nla_policy ip_vs_entries_policy[IPVS_ENTRY_ATTR_MAX + 1] >> +__read_mostly = { > > These can all be const (and have the __read_mostly annotation removed). Ok! >> +static int ip_vs_genl_fill_stats(struct sk_buff *skb, int container_type, >> + struct ip_vs_stats *stats) >> +{ >> + struct nlattr *nl_stats = nla_nest_start(skb, container_type); >> + if (!nl_stats) >> + goto nla_put_failure; > > Unbalanced locking. Whoa, right! Will correct that. >> +static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info >> *info) >> +{ >> + struct ip_vs_service *svc; >> + struct ip_vs_service_user usvc; >> + struct ip_vs_dest_user udest; >> + int ret = 0, cmd, flags; >> + int need_full_svc = 0, need_full_dest = 0; >> + >> + cmd = info->genlhdr->cmd; >> + flags = info->nlhdr->nlmsg_flags; >> + >> + /* increase the module use count */ >> + ip_vs_use_count_inc(); > > This looks fishy - the reference probably must be taken by > genetlink before calling the command handler. That would seem better, but is that possible? I took this from the sockopt interface. What would you generally want to do in this situation? >> +void ip_vs_genl_unregister(void) >> +{ >> + genl_unregister_family(&ip_vs_genl_family); > > Doesn't it also has to unregister the ops? No, that happens automatically when you unregister the family. 
Julius -- Google Switzerland GmbH ^ permalink raw reply [flat|nested] 19+ messages in thread
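Note on the "unbalanced locking" remark above: in the first version of ip_vs_genl_fill_stats(), a failing nla_nest_start() jumps to a label that unlocks stats->lock although the lock was never taken. The usual fix is to return directly on failures that happen before the lock, and to use the unlocking label only afterwards. A minimal userspace sketch of that shape, with a counter standing in for the spinlock and a byte budget standing in for the skb (all names here are hypothetical):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical lock model: depth must be back at 0 after every call. */
struct fake_lock {
	int depth;
};

static void lock_take(struct fake_lock *l)    { l->depth++; }
static void lock_release(struct fake_lock *l) { l->depth--; }

/* Hypothetical message buffer with a fixed byte budget. */
struct fake_msg {
	int used;
	int cap;
};

static int put_u32(struct fake_msg *m)
{
	if (m->used + 4 > m->cap)
		return -EMSGSIZE;
	m->used += 4;
	return 0;
}

/* Writes one nest header plus nfields values while holding the lock. */
static int fill_stats(struct fake_msg *m, struct fake_lock *l, int nfields)
{
	int start = m->used;
	int i;

	assert(m != NULL && l != NULL);

	/* Failure before the lock is taken: return, do not jump to unlock. */
	if (put_u32(m))
		return -EMSGSIZE;

	lock_take(l);
	for (i = 0; i < nfields; i++) {
		if (put_u32(m))
			goto put_failure;
	}
	lock_release(l);
	return 0;

put_failure:
	lock_release(l);
	m->used = start;	/* roll back, in the spirit of nla_nest_cancel() */
	return -EMSGSIZE;
}
```

Every exit path leaves the lock depth at zero, which is the property the review comment asks for.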
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation 2008-07-09 18:16 ` Julius Volz @ 2008-07-10 12:15 ` Patrick McHardy 2008-07-10 13:58 ` Julius Volz 0 siblings, 1 reply; 19+ messages in thread From: Patrick McHardy @ 2008-07-10 12:15 UTC (permalink / raw) To: Julius Volz; +Cc: netdev, lvs-devel, vbusam, horms, davem Julius Volz wrote: > On Wed, Jul 9, 2008, Patrick McHardy wrote: >>> +static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info >>> *info) >>> +{ >>> + struct ip_vs_service *svc; >>> + struct ip_vs_service_user usvc; >>> + struct ip_vs_dest_user udest; >>> + int ret = 0, cmd, flags; >>> + int need_full_svc = 0, need_full_dest = 0; >>> + >>> + cmd = info->genlhdr->cmd; >>> + flags = info->nlhdr->nlmsg_flags; >>> + >>> + /* increase the module use count */ >>> + ip_vs_use_count_inc(); >> This looks fishy - the reference probably must be taken by >> genetlink before calling the command handler. > > That would seem better, but is that possible? I took this from the > sockopt interface. What would you generally want to do in this > situation? There probably should be a struct module *owner in the ops registered with genetlink. This is necessary at least to make sure that modules don't disappear during dumps. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation 2008-07-10 12:15 ` Patrick McHardy @ 2008-07-10 13:58 ` Julius Volz 2008-07-10 14:43 ` Thomas Graf 0 siblings, 1 reply; 19+ messages in thread From: Julius Volz @ 2008-07-10 13:58 UTC (permalink / raw) To: Patrick McHardy; +Cc: netdev, lvs-devel, vbusam, horms, davem On Thu, Jul 10, 2008 at 2:15 PM, Patrick McHardy <kaber@trash.net> wrote: > There probably should be a struct module *owner in the > ops registered with genetlink. This is necessary at > least to make sure that modules don't disappear during > dumps. There seems to be no such thing in genetlink. af_netlink.c tracks the owner of a netlink socket, but that would increase the use count of the genetlink module. First I would have suspected the genl_mutex to be held while dumping, so that at least unregistering the family at module unload would block. But the mutex is explicitly unlocked for the duration of the netlink dump: net/netlink/genetlink.c: genl_unlock(); err = netlink_dump_start(genl_sock, skb, nlh, ops->dumpit, ops->done); genl_lock(); Julius -- Google Switzerland GmbH ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation 2008-07-10 13:58 ` Julius Volz @ 2008-07-10 14:43 ` Thomas Graf 0 siblings, 0 replies; 19+ messages in thread From: Thomas Graf @ 2008-07-10 14:43 UTC (permalink / raw) To: Julius Volz; +Cc: Patrick McHardy, netdev, lvs-devel, vbusam, horms, davem * Julius Volz <juliusv@google.com> 2008-07-10 15:58 > On Thu, Jul 10, 2008 at 2:15 PM, Patrick McHardy <kaber@trash.net> wrote: > > There probably should be a struct module *owner in the > > ops registered with genetlink. This is necessary at > > least to make sure that modules don't disappear during > > dumps. > > There seems to be no such thing in genetlink. af_netlink.c tracks the > owner of a netlink socket, but that would increase the use count of > the genetlink module. > > First I would have suspected the genl_mutex to be held while dumping, > so that at least unregistering the family at module unload would > block. But the mutex is explicitly unlocked for the duration of the > netlink dump: It used to be like that before the locking during dumps was revised. I promised to redo the locking and module owner tracking but haven't gotten around to it. Patrick's suggestion certainly makes sense. > net/netlink/genetlink.c: > > genl_unlock(); > err = netlink_dump_start(genl_sock, skb, nlh, > ops->dumpit, ops->done); > genl_lock(); > > Julius ^ permalink raw reply [flat|nested] 19+ messages in thread
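Note on the owner-tracking idea discussed above: the suggestion is that each set of ops carries a reference to the module that provides it, so the module cannot go away while a dump is still running. genetlink had no such field at the time of this thread, so the following is only a userspace sketch of the idea. Every name in it is hypothetical; the fake_module_* helpers loosely model the try-get / put style of module reference counting and are not the kernel functions themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a module with a use count. */
struct fake_module {
	int refcnt;
	int unloading;
};

/* Take a reference unless the module is already going away. */
static int fake_module_try_get(struct fake_module *m)
{
	if (m->unloading)
		return 0;
	m->refcnt++;
	return 1;
}

static void fake_module_put(struct fake_module *m)
{
	m->refcnt--;
}

/* Unloading is refused while anybody still holds a reference. */
static int fake_module_unload(struct fake_module *m)
{
	if (m->refcnt > 0)
		return -EBUSY;
	m->unloading = 1;
	return 0;
}

/* Hypothetical ops entry that records which module owns the callback. */
struct fake_ops {
	struct fake_module *owner;
	int (*dumpit)(struct fake_module *owner, int *cursor);
};

/* Result of the unload attempt made in the middle of sample_dumpit(). */
static int unload_seen_mid_dump;

/* Emits three chunks; tries to unload its own module after the first. */
static int sample_dumpit(struct fake_module *owner, int *cursor)
{
	if (*cursor >= 3)
		return 0;
	if (*cursor == 1)
		unload_seen_mid_dump = fake_module_unload(owner);
	(*cursor)++;
	return 1;
}

/* Pin the owner for the whole dump, not just for a single callback. */
static int run_dump(const struct fake_ops *ops)
{
	int cursor = 0;

	assert(ops != NULL && ops->owner != NULL);
	if (!fake_module_try_get(ops->owner))
		return -ENODEV;
	while (ops->dumpit(ops->owner, &cursor) > 0)
		;
	fake_module_put(ops->owner);
	return cursor;
}
```

The point of the sketch is where the reference is taken: in the dispatcher, before the callback runs, rather than inside the handler as ip_vs_use_count_inc() does in the patch.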
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation 2008-07-09 16:43 ` Patrick McHardy 2008-07-09 18:16 ` Julius Volz @ 2008-07-10 11:20 ` Julius Volz 2008-07-10 11:36 ` Thomas Graf 1 sibling, 1 reply; 19+ messages in thread From: Julius Volz @ 2008-07-10 11:20 UTC (permalink / raw) To: Patrick McHardy; +Cc: netdev, lvs-devel, vbusam, horms, davem On Wed, Jul 09, 2008 at 06:43:40PM +0200, Patrick McHardy wrote: > Julius Volz wrote: > >Add the implementation of the new Generic Netlink interface to IPVS and > >keep > >the old set/getsockopt interface for userspace backwards compatibility. > > > >Signed-off-by: Julius Volz <juliusv@google.com> > > Just a few quick comments, will try to do a full review later: Ok, I made the changes mentioned here and also went through it again myself, fixing/cleaning some other small things: - change NLA_STRING to NLA_NUL_STRING in the policy definitions - prefill flags from an existing service (when editing a service), so that the selective setting of flags actually makes sense (although it isn't used now) - preset timeouts in ip_vs_genl_set_timeout(), in case userspace doesn't send all timeout fields - only send TCP/UDP timeout values to userspace if TCP/UDP is configured in - remove unused flags variable in ip_vs_genl_set_cmd() So, here is the new version of this patch: ---------------------- Add the implementation of the new Generic Netlink interface to IPVS and keep the old set/getsockopt interface for userspace backwards compatibility. 
Signed-off-by: Julius Volz <juliusv@google.com> 1 files changed, 888 insertions(+), 0 deletions(-) diff --git a/net/ipv4/ipvs/ip_vs_ctl.c b/net/ipv4/ipvs/ip_vs_ctl.c index 94c5767..856675d 100644 --- a/net/ipv4/ipvs/ip_vs_ctl.c +++ b/net/ipv4/ipvs/ip_vs_ctl.c @@ -39,6 +39,7 @@ #include <net/ip.h> #include <net/route.h> #include <net/sock.h> +#include <net/genetlink.h> #include <asm/uaccess.h> @@ -2307,6 +2308,885 @@ static struct nf_sockopt_ops ip_vs_sockopts = { .owner = THIS_MODULE, }; +/* + * Generic Netlink interface + */ + +/* IPVS genetlink family */ +static struct genl_family ip_vs_genl_family = { + .id = GENL_ID_GENERATE, + .hdrsize = 0, + .name = IPVS_GENL_NAME, + .version = IPVS_GENL_VERSION, + .maxattr = IPVS_CMD_MAX +}; + +/* + * Policy used for commands that operate on service, destination + * or daemon entries + */ +static const struct nla_policy ip_vs_entries_policy[IPVS_ENTRY_ATTR_MAX + 1] = { + [IPVS_ENTRY_ATTR_SERVICE] = { .type = NLA_NESTED }, + [IPVS_ENTRY_ATTR_DEST] = { .type = NLA_NESTED }, + [IPVS_ENTRY_ATTR_DAEMON] = { .type = NLA_NESTED }, +}; + +/* Policy used for IPVS_CMD_SET_TIMEOUT command attributes */ +static const struct nla_policy ip_vs_timeout_policy[IPVS_TIMEOUT_ATTR_MAX + 1] = { + [IPVS_TIMEOUT_ATTR_TCP] = { .type = NLA_U32 }, + [IPVS_TIMEOUT_ATTR_TCP_FIN] = { .type = NLA_U32 }, + [IPVS_TIMEOUT_ATTR_UDP] = { .type = NLA_U32 }, +}; + +/* Policy used for attributes in nested attribute IPVS_ENTRY_ATTR_DAEMON */ +static const struct nla_policy ip_vs_daemon_policy[IPVS_DAEMON_ATTR_MAX + 1] = { + [IPVS_DAEMON_ATTR_STATE] = { .type = NLA_U32 }, + [IPVS_DAEMON_ATTR_MCAST_IFN] = { .type = NLA_NUL_STRING, + .len = IP_VS_IFNAME_MAXLEN }, + [IPVS_DAEMON_ATTR_SYNC_ID] = { .type = NLA_U32 }, +}; + +/* Policy used for attributes in nested attribute IPVS_ENTRY_ATTR_SERVICE */ +static const struct nla_policy ip_vs_svc_policy[IPVS_SVC_ATTR_MAX + 1] = { + [IPVS_SVC_ATTR_AF] = { .type = NLA_U16 }, + [IPVS_SVC_ATTR_PROTOCOL] = { .type = NLA_U16 }, +
[IPVS_SVC_ATTR_ADDR] = { .type = NLA_BINARY, + .len = sizeof(union nf_inet_addr) }, + [IPVS_SVC_ATTR_PORT] = { .type = NLA_U16 }, + [IPVS_SVC_ATTR_FWMARK] = { .type = NLA_U32 }, + [IPVS_SVC_ATTR_SCHED_NAME] = { .type = NLA_NUL_STRING, + .len = IP_VS_SCHEDNAME_MAXLEN }, + [IPVS_SVC_ATTR_FLAGS] = { .type = NLA_BINARY, + .len = sizeof(struct ip_vs_flags) }, + [IPVS_SVC_ATTR_TIMEOUT] = { .type = NLA_U32 }, + [IPVS_SVC_ATTR_NETMASK] = { .type = NLA_U32 }, + [IPVS_SVC_ATTR_STATS] = { .type = NLA_NESTED }, +}; + +/* Policy used for attributes in nested attribute IPVS_ENTRY_ATTR_DEST */ +static const struct nla_policy ip_vs_dest_policy[IPVS_DEST_ATTR_MAX + 1] = { + [IPVS_DEST_ATTR_ADDR] = { .type = NLA_BINARY, + .len = sizeof(union nf_inet_addr) }, + [IPVS_DEST_ATTR_PORT] = { .type = NLA_U16 }, + [IPVS_DEST_ATTR_FWD_METHOD] = { .type = NLA_U32 }, + [IPVS_DEST_ATTR_WEIGHT] = { .type = NLA_U32 }, + [IPVS_DEST_ATTR_U_THRESH] = { .type = NLA_U32 }, + [IPVS_DEST_ATTR_L_THRESH] = { .type = NLA_U32 }, + [IPVS_DEST_ATTR_ACTIVE_CONNS] = { .type = NLA_U32 }, + [IPVS_DEST_ATTR_INACT_CONNS] = { .type = NLA_U32 }, + [IPVS_DEST_ATTR_PERSIST_CONNS] = { .type = NLA_U32 }, + [IPVS_DEST_ATTR_STATS] = { .type = NLA_NESTED }, +}; + +static int ip_vs_genl_fill_stats(struct sk_buff *skb, int container_type, + struct ip_vs_stats *stats) +{ + struct nlattr *nl_stats = nla_nest_start(skb, container_type); + if (!nl_stats) + return -EMSGSIZE; + + spin_lock_bh(&stats->lock); + + NLA_PUT_U32(skb, IPVS_STATS_ATTR_CONNS, stats->conns); + NLA_PUT_U32(skb, IPVS_STATS_ATTR_INPKTS, stats->inpkts); + NLA_PUT_U32(skb, IPVS_STATS_ATTR_OUTPKTS, stats->outpkts); + NLA_PUT_U64(skb, IPVS_STATS_ATTR_INBYTES, stats->inbytes); + NLA_PUT_U64(skb, IPVS_STATS_ATTR_OUTBYTES, stats->outbytes); + NLA_PUT_U32(skb, IPVS_STATS_ATTR_CPS, stats->cps); + NLA_PUT_U32(skb, IPVS_STATS_ATTR_INPPS, stats->inpps); + NLA_PUT_U32(skb, IPVS_STATS_ATTR_OUTPPS, stats->outpps); + NLA_PUT_U32(skb, IPVS_STATS_ATTR_INBPS, stats->inbps); +
NLA_PUT_U32(skb, IPVS_STATS_ATTR_OUTBPS, stats->outbps); + + spin_unlock_bh(&stats->lock); + + nla_nest_end(skb, nl_stats); + + return 0; + +nla_put_failure: + spin_unlock_bh(&stats->lock); + nla_nest_cancel(skb, nl_stats); + return -EMSGSIZE; +} + +static int ip_vs_genl_fill_service(struct sk_buff *skb, struct ip_vs_service *svc) +{ + struct nlattr *nl_service; + struct ip_vs_flags flags = { .flags = svc->flags, + .mask = 0 }; + + nl_service = nla_nest_start(skb, IPVS_ENTRY_ATTR_SERVICE); + if (!nl_service) + return -EMSGSIZE; + + NLA_PUT_U16(skb, IPVS_SVC_ATTR_AF, AF_INET); + + if (svc->fwmark) { + NLA_PUT_U32(skb, IPVS_SVC_ATTR_FWMARK, svc->fwmark); + } else { + NLA_PUT_U16(skb, IPVS_SVC_ATTR_PROTOCOL, svc->protocol); + NLA_PUT(skb, IPVS_SVC_ATTR_ADDR, sizeof(svc->addr), &svc->addr); + NLA_PUT_U16(skb, IPVS_SVC_ATTR_PORT, svc->port); + } + + NLA_PUT_STRING(skb, IPVS_SVC_ATTR_SCHED_NAME, svc->scheduler->name); + NLA_PUT(skb, IPVS_SVC_ATTR_FLAGS, sizeof(flags), &flags); + NLA_PUT_U32(skb, IPVS_SVC_ATTR_TIMEOUT, svc->timeout / HZ); + NLA_PUT_U32(skb, IPVS_SVC_ATTR_NETMASK, svc->netmask); + + if (ip_vs_genl_fill_stats(skb, IPVS_SVC_ATTR_STATS, &svc->stats)) + goto nla_put_failure; + + nla_nest_end(skb, nl_service); + + return 0; + +nla_put_failure: + nla_nest_cancel(skb, nl_service); + return -EMSGSIZE; +} + +static int ip_vs_genl_dump_service(struct sk_buff *skb, struct ip_vs_service *svc, + struct netlink_callback *cb) +{ + void *hdr; + + hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, + &ip_vs_genl_family, NLM_F_MULTI, + IPVS_CMD_GET_SERVICES); + if (!hdr) + return -EMSGSIZE; + + if (ip_vs_genl_fill_service(skb, svc) < 0) + goto nla_put_failure; + + return genlmsg_end(skb, hdr); + +nla_put_failure: + genlmsg_cancel(skb, hdr); + return -EMSGSIZE; +} + +static int ip_vs_genl_dump_services(struct sk_buff *skb, + struct netlink_callback *cb) +{ + int idx = 0, i; + int start = cb->args[0]; + struct ip_vs_service *svc; + + 
mutex_lock(&__ip_vs_mutex); + for (i = 0; i < IP_VS_SVC_TAB_SIZE; i++) { + list_for_each_entry(svc, &ip_vs_svc_table[i], s_list) { + if (++idx <= start) + continue; + if (ip_vs_genl_dump_service(skb, svc, cb) < 0) { + idx--; + goto nla_put_failure; + } + } + } + + for (i = 0; i < IP_VS_SVC_TAB_SIZE; i++) { + list_for_each_entry(svc, &ip_vs_svc_fwm_table[i], f_list) { + if (++idx <= start) + continue; + if (ip_vs_genl_dump_service(skb, svc, cb) < 0 ) { + idx--; + goto nla_put_failure; + } + } + } + +nla_put_failure: + mutex_unlock(&__ip_vs_mutex); + cb->args[0] = idx; + + return skb->len; +} + +static int ip_vs_genl_parse_service(struct ip_vs_service_user *usvc, + struct nlattr *nla, int full_entry) +{ + struct nlattr *attrs[IPVS_SVC_ATTR_MAX + 1]; + struct nlattr *nla_af, *nla_port, *nla_fwmark, *nla_protocol, *nla_addr; + + /* Parse mandatory identifying service fields first */ + if (nla == NULL || + nla_parse_nested(attrs, IPVS_SVC_ATTR_MAX, nla, ip_vs_svc_policy)) + return -EINVAL; + + nla_af = attrs[IPVS_SVC_ATTR_AF]; + nla_protocol = attrs[IPVS_SVC_ATTR_PROTOCOL]; + nla_addr = attrs[IPVS_SVC_ATTR_ADDR]; + nla_port = attrs[IPVS_SVC_ATTR_PORT]; + nla_fwmark = attrs[IPVS_SVC_ATTR_FWMARK]; + + if (!(nla_af && (nla_fwmark || (nla_port && nla_protocol && nla_addr)))) + return -EINVAL; + + /* For now, only support IPv4 */ + if (nla_get_u16(nla_af) != AF_INET) + return -EAFNOSUPPORT; + + if (nla_fwmark) { + usvc->protocol = IPPROTO_TCP; + usvc->fwmark = nla_get_u32(nla_fwmark); + } else { + usvc->protocol = nla_get_u16(nla_protocol); + nla_memcpy(&usvc->addr, nla_addr, sizeof(usvc->addr)); + usvc->port = nla_get_u16(nla_port); + usvc->fwmark = 0; + } + + /* If a full entry was requested, check for the additional fields */ + if (full_entry) { + struct nlattr *nla_sched, *nla_flags, *nla_timeout, + *nla_netmask; + struct ip_vs_flags flags; + struct ip_vs_service *svc; + + nla_sched = attrs[IPVS_SVC_ATTR_SCHED_NAME]; + nla_flags = attrs[IPVS_SVC_ATTR_FLAGS]; + 
nla_timeout = attrs[IPVS_SVC_ATTR_TIMEOUT]; + nla_netmask = attrs[IPVS_SVC_ATTR_NETMASK]; + + if (!(nla_sched && nla_flags && nla_timeout && nla_netmask)) + return -EINVAL; + + nla_memcpy(&flags, nla_flags, sizeof(flags)); + + /* prefill flags from service if it already exists */ + if (usvc->fwmark) + svc = __ip_vs_svc_fwm_get(usvc->fwmark); + else + svc = __ip_vs_service_get(usvc->protocol, usvc->addr, + usvc->port); + if (svc) { + usvc->flags = svc->flags; + ip_vs_service_put(svc); + } else + usvc->flags = 0; + + /* set new flags from userland */ + usvc->flags = (usvc->flags & ~flags.mask) | + (flags.flags & flags.mask); + + strlcpy(usvc->sched_name, nla_data(nla_sched), + sizeof(usvc->sched_name)); + usvc->timeout = nla_get_u32(nla_timeout); + usvc->netmask = nla_get_u32(nla_netmask); + } + + return 0; +} + +static struct ip_vs_service *ip_vs_genl_find_service(struct nlattr *nla) +{ + struct ip_vs_service_user usvc; + int ret; + + ret = ip_vs_genl_parse_service(&usvc, nla, 0); + if (ret) + return ERR_PTR(ret); + + if (usvc.fwmark) + return __ip_vs_svc_fwm_get(usvc.fwmark); + else + return __ip_vs_service_get(usvc.protocol, usvc.addr, + usvc.port); +} + +static int ip_vs_genl_parse_dest(struct ip_vs_dest_user *udest, + struct nlattr *nla, int full_entry) +{ + struct nlattr *attrs[IPVS_DEST_ATTR_MAX + 1]; + struct nlattr *nla_addr, *nla_port; + + /* Parse mandatory identifying destination fields first */ + if (nla == NULL || + nla_parse_nested(attrs, IPVS_DEST_ATTR_MAX, nla, ip_vs_dest_policy)) + return -EINVAL; + + nla_addr = attrs[IPVS_DEST_ATTR_ADDR]; + nla_port = attrs[IPVS_DEST_ATTR_PORT]; + + if (!(nla_addr && nla_port)) + return -EINVAL; + + nla_memcpy(&udest->addr, nla_addr, sizeof(udest->addr)); + udest->port = nla_get_u16(nla_port); + + /* If a full entry was requested, check for the additional fields */ + if (full_entry) { + struct nlattr *nla_fwd, *nla_weight, *nla_u_thresh, + *nla_l_thresh; + + nla_fwd = attrs[IPVS_DEST_ATTR_FWD_METHOD]; + nla_weight =
attrs[IPVS_DEST_ATTR_WEIGHT]; + nla_u_thresh = attrs[IPVS_DEST_ATTR_U_THRESH]; + nla_l_thresh = attrs[IPVS_DEST_ATTR_L_THRESH]; + + if (!(nla_fwd && nla_weight && nla_u_thresh && nla_l_thresh)) + return -EINVAL; + + udest->conn_flags = nla_get_u32(nla_fwd) & IP_VS_CONN_F_FWD_MASK; + udest->weight = nla_get_u32(nla_weight); + udest->u_threshold = nla_get_u32(nla_u_thresh); + udest->l_threshold = nla_get_u32(nla_l_thresh); + } + + return 0; +} + +static int ip_vs_genl_fill_dest(struct sk_buff *skb, struct ip_vs_dest *dest) +{ + struct nlattr *nl_dest; + + nl_dest = nla_nest_start(skb, IPVS_ENTRY_ATTR_DEST); + if (!nl_dest) + return -EMSGSIZE; + + NLA_PUT(skb, IPVS_DEST_ATTR_ADDR, sizeof(dest->addr), &dest->addr); + NLA_PUT_U16(skb, IPVS_DEST_ATTR_PORT, dest->port); + + NLA_PUT_U32(skb, IPVS_DEST_ATTR_FWD_METHOD, + atomic_read(&dest->conn_flags) & IP_VS_CONN_F_FWD_MASK); + NLA_PUT_U32(skb, IPVS_DEST_ATTR_WEIGHT, atomic_read(&dest->weight)); + NLA_PUT_U32(skb, IPVS_DEST_ATTR_U_THRESH, dest->u_threshold); + NLA_PUT_U32(skb, IPVS_DEST_ATTR_L_THRESH, dest->l_threshold); + NLA_PUT_U32(skb, IPVS_DEST_ATTR_ACTIVE_CONNS, + atomic_read(&dest->activeconns)); + NLA_PUT_U32(skb, IPVS_DEST_ATTR_INACT_CONNS, + atomic_read(&dest->inactconns)); + NLA_PUT_U32(skb, IPVS_DEST_ATTR_PERSIST_CONNS, + atomic_read(&dest->persistconns)); + + if (ip_vs_genl_fill_stats(skb, IPVS_DEST_ATTR_STATS, &dest->stats)) + goto nla_put_failure; + + nla_nest_end(skb, nl_dest); + + return 0; + +nla_put_failure: + nla_nest_cancel(skb, nl_dest); + return -EMSGSIZE; +} + +static int ip_vs_genl_dump_dest(struct sk_buff *skb, struct ip_vs_dest *dest, + struct netlink_callback *cb) +{ + void *hdr; + + hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, + &ip_vs_genl_family, NLM_F_MULTI, + IPVS_CMD_GET_DESTS); + if (!hdr) + return -EMSGSIZE; + + if (ip_vs_genl_fill_dest(skb, dest) < 0) + goto nla_put_failure; + + return genlmsg_end(skb, hdr); + +nla_put_failure: + genlmsg_cancel(skb, hdr); + return 
-EMSGSIZE; +} + + +static int ip_vs_genl_dump_dests(struct sk_buff *skb, + struct netlink_callback *cb) +{ + int idx = 0; + int start = cb->args[0]; + struct ip_vs_service *svc; + struct ip_vs_dest *dest; + struct nlattr *attrs[IPVS_ENTRY_ATTR_MAX + 1]; + + mutex_lock(&__ip_vs_mutex); + + /* Try to find the service for which to dump destinations */ + if (nlmsg_parse(cb->nlh, GENL_HDRLEN, attrs, + IPVS_ENTRY_ATTR_MAX, ip_vs_entries_policy)) + goto out_err; + + svc = ip_vs_genl_find_service(attrs[IPVS_ENTRY_ATTR_SERVICE]); + if (IS_ERR(svc) || svc == NULL) + goto out_err; + + /* Dump the destinations */ + list_for_each_entry(dest, &svc->destinations, n_list) { + if (++idx <= start) + continue; + if (ip_vs_genl_dump_dest(skb, dest, cb) < 0) { + idx--; + goto nla_put_failure; + } + } + +nla_put_failure: + cb->args[0] = idx; + ip_vs_service_put(svc); + +out_err: + mutex_unlock(&__ip_vs_mutex); + + return skb->len; +} + +static int ip_vs_genl_fill_daemon(struct sk_buff *skb, __be32 state, + const char *mcast_ifn, __be32 syncid) +{ + struct nlattr *nl_daemon; + + nl_daemon = nla_nest_start(skb, IPVS_ENTRY_ATTR_DAEMON); + if (!nl_daemon) + return -EMSGSIZE; + + NLA_PUT_U32(skb, IPVS_DAEMON_ATTR_STATE, state); + NLA_PUT_STRING(skb, IPVS_DAEMON_ATTR_MCAST_IFN, mcast_ifn); + NLA_PUT_U32(skb, IPVS_DAEMON_ATTR_SYNC_ID, syncid); + + nla_nest_end(skb, nl_daemon); + + return 0; + +nla_put_failure: + nla_nest_cancel(skb, nl_daemon); + return -EMSGSIZE; +} + +static int ip_vs_genl_dump_daemon(struct sk_buff *skb, __be32 state, + const char *mcast_ifn, __be32 syncid, + struct netlink_callback *cb) +{ + void *hdr; + hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, + &ip_vs_genl_family, NLM_F_MULTI, + IPVS_CMD_GET_DAEMONS); + if (!hdr) + return -EMSGSIZE; + + if (ip_vs_genl_fill_daemon(skb, state, mcast_ifn, syncid)) + goto nla_put_failure; + + return genlmsg_end(skb, hdr); + +nla_put_failure: + genlmsg_cancel(skb, hdr); + return -EMSGSIZE; +} + +static int 
ip_vs_genl_dump_daemons(struct sk_buff *skb, + struct netlink_callback *cb) +{ + mutex_lock(&__ip_vs_mutex); + if ((ip_vs_sync_state & IP_VS_STATE_MASTER) && !cb->args[0]) { + if (ip_vs_genl_dump_daemon(skb, IP_VS_STATE_MASTER, + ip_vs_master_mcast_ifn, + ip_vs_master_syncid, cb) < 0) + goto nla_put_failure; + + cb->args[0] = 1; + } + + if ((ip_vs_sync_state & IP_VS_STATE_BACKUP) && !cb->args[1]) { + if (ip_vs_genl_dump_daemon(skb, IP_VS_STATE_BACKUP, + ip_vs_backup_mcast_ifn, + ip_vs_backup_syncid, cb) < 0) + goto nla_put_failure; + + cb->args[1] = 1; + } + +nla_put_failure: + mutex_unlock(&__ip_vs_mutex); + + return skb->len; +} + +static int ip_vs_genl_set_timeout(struct nlattr **attrs) +{ + struct ip_vs_timeout_user t; + + __ip_vs_get_timeouts(&t); + + if (attrs[IPVS_TIMEOUT_ATTR_TCP]) + t.tcp_timeout = nla_get_u32(attrs[IPVS_TIMEOUT_ATTR_TCP]); + + if (attrs[IPVS_TIMEOUT_ATTR_TCP_FIN]) + t.tcp_fin_timeout = + nla_get_u32(attrs[IPVS_TIMEOUT_ATTR_TCP_FIN]); + + if (attrs[IPVS_TIMEOUT_ATTR_UDP]) + t.udp_timeout = nla_get_u32(attrs[IPVS_TIMEOUT_ATTR_UDP]); + + return ip_vs_set_timeout(&t); +} + +static int ip_vs_genl_start_daemon(struct nlattr **attrs) +{ + if (!(attrs[IPVS_DAEMON_ATTR_STATE] && + attrs[IPVS_DAEMON_ATTR_MCAST_IFN] && + attrs[IPVS_DAEMON_ATTR_SYNC_ID])) + return -EINVAL; + + return start_sync_thread(nla_get_u32(attrs[IPVS_DAEMON_ATTR_STATE]), + nla_data(attrs[IPVS_DAEMON_ATTR_MCAST_IFN]), + nla_get_u32(attrs[IPVS_DAEMON_ATTR_SYNC_ID])); +} + +static int ip_vs_genl_stop_daemon(struct nlattr **attrs) +{ + if (!attrs[IPVS_DAEMON_ATTR_STATE]) + return -EINVAL; + + return stop_sync_thread(nla_get_u32(attrs[IPVS_DAEMON_ATTR_STATE])); +} + +static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info) +{ + struct ip_vs_service *svc = NULL; + struct ip_vs_service_user usvc; + struct ip_vs_dest_user udest; + int ret = 0, cmd; + int need_full_svc = 0, need_full_dest = 0; + + cmd = info->genlhdr->cmd; + + /* increase the module use count */ +
ip_vs_use_count_inc(); + mutex_lock(&__ip_vs_mutex); + + if (cmd == IPVS_CMD_FLUSH) { + /* Flush the virtual service */ + ret = ip_vs_flush(); + goto out; + } else if (cmd == IPVS_CMD_SET_TIMEOUT) { + /* Set timeout values for (tcp tcpfin udp) */ + ret = ip_vs_genl_set_timeout(info->attrs); + goto out; + } else if (cmd == IPVS_CMD_START_DAEMON || + cmd == IPVS_CMD_STOP_DAEMON) { + + struct nlattr *daemon_attrs[IPVS_DAEMON_ATTR_MAX + 1]; + + if (!info->attrs[IPVS_ENTRY_ATTR_DAEMON] || + nla_parse_nested(daemon_attrs, IPVS_DAEMON_ATTR_MAX, + info->attrs[IPVS_ENTRY_ATTR_DAEMON], + ip_vs_daemon_policy)) { + ret = -EINVAL; + goto out; + } + + if (cmd == IPVS_CMD_START_DAEMON) + ret = ip_vs_genl_start_daemon(daemon_attrs); + else + ret = ip_vs_genl_stop_daemon(daemon_attrs); + goto out; + } else if (cmd == IPVS_CMD_ZERO && + !info->attrs[IPVS_ENTRY_ATTR_SERVICE]) { + ret = ip_vs_zero_all(); + goto out; + } + + /* All following commands require a service argument, so check if we + * received a valid one. We need a full service specification when + * adding / editing a service. Only identifying members otherwise. */ + if (cmd == IPVS_CMD_ADD_SERVICE || cmd == IPVS_CMD_EDIT_SERVICE) + need_full_svc = 1; + + ret = ip_vs_genl_parse_service(&usvc, + info->attrs[IPVS_ENTRY_ATTR_SERVICE], + need_full_svc); + if (ret) + goto out; + + /* Lookup the exact service by <protocol, addr, port> or fwmark */ + if (usvc.fwmark == 0) + svc = __ip_vs_service_get(usvc.protocol, usvc.addr, usvc.port); + else + svc = __ip_vs_svc_fwm_get(usvc.fwmark); + + /* Unless we're adding a new service, the service must already exist */ + if ((cmd != IPVS_CMD_ADD_SERVICE) && (svc == NULL)) { + ret = -ESRCH; + goto out; + } + + /* Destination commands require a valid destination argument. For + * adding / editing a destination, we need a full destination + * specification. 
*/ + if (cmd == IPVS_CMD_ADD_DEST || cmd == IPVS_CMD_EDIT_DEST || + cmd == IPVS_CMD_DEL_DEST) { + if (cmd != IPVS_CMD_DEL_DEST) + need_full_dest = 1; + + ret = ip_vs_genl_parse_dest(&udest, + info->attrs[IPVS_ENTRY_ATTR_DEST], + need_full_dest); + if (ret) + goto out; + } + + switch (cmd) { + case IPVS_CMD_ADD_SERVICE: + if (svc == NULL) + ret = ip_vs_add_service(&usvc, &svc); + else + ret = -EEXIST; + break; + case IPVS_CMD_EDIT_SERVICE: + ret = ip_vs_edit_service(svc, &usvc); + break; + case IPVS_CMD_DEL_SERVICE: + ret = ip_vs_del_service(svc); + break; + case IPVS_CMD_ZERO: + ret = ip_vs_zero_service(svc); + break; + case IPVS_CMD_ADD_DEST: + ret = ip_vs_add_dest(svc, &udest); + break; + case IPVS_CMD_EDIT_DEST: + ret = ip_vs_edit_dest(svc, &udest); + break; + case IPVS_CMD_DEL_DEST: + ret = ip_vs_del_dest(svc, &udest); + break; + default: + ret = -EINVAL; + } + +out: + if (svc) + ip_vs_service_put(svc); + mutex_unlock(&__ip_vs_mutex); + /* decrease the module use count */ + ip_vs_use_count_dec(); + + return ret; +} + +static int ip_vs_genl_get_cmd(struct sk_buff *skb, struct genl_info *info) +{ + struct sk_buff *msg; + void *reply; + int ret, cmd; + + mutex_lock(&__ip_vs_mutex); + + cmd = info->genlhdr->cmd; + + msg = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL); + if (!msg) { + ret = -ENOMEM; + goto out_err; + } + + reply = genlmsg_put_reply(msg, info, &ip_vs_genl_family, 0, cmd); + if (reply == NULL) + goto nla_put_failure; + + switch (cmd) { + case IPVS_CMD_GET_INFO: + NLA_PUT_U32(msg, IPVS_INFO_ATTR_VERSION, IP_VS_VERSION_CODE); + NLA_PUT_U32(msg, IPVS_INFO_ATTR_CONN_TAB_SIZE, IP_VS_CONN_TAB_SIZE); + break; + + case IPVS_CMD_GET_SERVICE: + { + struct ip_vs_service *svc; + + svc = ip_vs_genl_find_service(info->attrs[IPVS_ENTRY_ATTR_SERVICE]); + if (IS_ERR(svc)) { + ret = PTR_ERR(svc); + goto out_err; + } else if (svc) { + ret = ip_vs_genl_fill_service(msg, svc); + ip_vs_service_put(svc); + if (ret) + goto nla_put_failure; + } else { + ret = -ESRCH; + goto out_err; 
+ } + + break; + } + + case IPVS_CMD_GET_TIMEOUT: + { + struct ip_vs_timeout_user t; + + __ip_vs_get_timeouts(&t); +#ifdef CONFIG_IP_VS_PROTO_TCP + NLA_PUT_U32(msg, IPVS_TIMEOUT_ATTR_TCP, t.tcp_timeout); + NLA_PUT_U32(msg, IPVS_TIMEOUT_ATTR_TCP_FIN, t.tcp_fin_timeout); +#endif +#ifdef CONFIG_IP_VS_PROTO_UDP + NLA_PUT_U32(msg, IPVS_TIMEOUT_ATTR_UDP, t.udp_timeout); +#endif + + break; + } + + default: + IP_VS_ERR("unknown Generic Netlink command\n"); + ret = -EINVAL; + goto out_err; + } + + genlmsg_end(msg, reply); + ret = genlmsg_unicast(msg, info->snd_pid); + goto out; + +nla_put_failure: + IP_VS_ERR("not enough space in Netlink message\n"); + ret = -EMSGSIZE; + +out_err: + if (msg) + nlmsg_free(msg); +out: + mutex_unlock(&__ip_vs_mutex); + + return ret; +} + + +static struct genl_ops ip_vs_genl_ops[] __read_mostly = { + /* SET commands */ + { + .cmd = IPVS_CMD_ADD_SERVICE, + .flags = GENL_ADMIN_PERM, + .policy = ip_vs_entries_policy, + .doit = ip_vs_genl_set_cmd, + }, + { + .cmd = IPVS_CMD_EDIT_SERVICE, + .flags = GENL_ADMIN_PERM, + .policy = ip_vs_entries_policy, + .doit = ip_vs_genl_set_cmd, + }, + { + .cmd = IPVS_CMD_DEL_SERVICE, + .flags = GENL_ADMIN_PERM, + .policy = ip_vs_entries_policy, + .doit = ip_vs_genl_set_cmd, + }, + { + .cmd = IPVS_CMD_ADD_DEST, + .flags = GENL_ADMIN_PERM, + .policy = ip_vs_entries_policy, + .doit = ip_vs_genl_set_cmd, + }, + { + .cmd = IPVS_CMD_EDIT_DEST, + .flags = GENL_ADMIN_PERM, + .policy = ip_vs_entries_policy, + .doit = ip_vs_genl_set_cmd, + }, + { + .cmd = IPVS_CMD_DEL_DEST, + .flags = GENL_ADMIN_PERM, + .policy = ip_vs_entries_policy, + .doit = ip_vs_genl_set_cmd, + }, + { + .cmd = IPVS_CMD_FLUSH, + .flags = GENL_ADMIN_PERM, + .doit = ip_vs_genl_set_cmd, + }, + { + .cmd = IPVS_CMD_SET_TIMEOUT, + .flags = GENL_ADMIN_PERM, + .policy = ip_vs_timeout_policy, + .doit = ip_vs_genl_set_cmd, + }, + { + .cmd = IPVS_CMD_START_DAEMON, + .flags = GENL_ADMIN_PERM, + .policy = ip_vs_daemon_policy, + .doit = ip_vs_genl_set_cmd, + }, + { + .cmd 
= IPVS_CMD_STOP_DAEMON, + .flags = GENL_ADMIN_PERM, + .policy = ip_vs_daemon_policy, + .doit = ip_vs_genl_set_cmd, + }, + { + .cmd = IPVS_CMD_ZERO, + .flags = GENL_ADMIN_PERM, + .policy = ip_vs_entries_policy, + .doit = ip_vs_genl_set_cmd, + }, + + /* GET commands */ + { + .cmd = IPVS_CMD_GET_INFO, + .flags = GENL_ADMIN_PERM, + .doit = ip_vs_genl_get_cmd, + }, + { + .cmd = IPVS_CMD_GET_SERVICES, + .flags = GENL_ADMIN_PERM, + .dumpit = ip_vs_genl_dump_services, + }, + { + .cmd = IPVS_CMD_GET_SERVICE, + .flags = GENL_ADMIN_PERM, + .policy = ip_vs_entries_policy, + .doit = ip_vs_genl_get_cmd, + }, + { + .cmd = IPVS_CMD_GET_DESTS, + .flags = GENL_ADMIN_PERM, + .policy = ip_vs_entries_policy, + .dumpit = ip_vs_genl_dump_dests, + }, + { + .cmd = IPVS_CMD_GET_TIMEOUT, + .flags = GENL_ADMIN_PERM, + .doit = ip_vs_genl_get_cmd, + }, + { + .cmd = IPVS_CMD_GET_DAEMONS, + .flags = GENL_ADMIN_PERM, + .dumpit = ip_vs_genl_dump_daemons, + }, +}; + +int ip_vs_genl_register(void) +{ + int ret, i; + + ret = genl_register_family(&ip_vs_genl_family); + if (ret) + return ret; + + for (i = 0; i < ARRAY_SIZE(ip_vs_genl_ops); i++) { + ret = genl_register_ops(&ip_vs_genl_family, &ip_vs_genl_ops[i]); + if (ret) + goto err_out; + } + return 0; + +err_out: + genl_unregister_family(&ip_vs_genl_family); + return ret; +} + +void ip_vs_genl_unregister(void) +{ + genl_unregister_family(&ip_vs_genl_family); +} + +/* End of Generic Netlink interface definitions */ + int ip_vs_control_init(void) { @@ -2321,6 +3201,13 @@ int ip_vs_control_init(void) return ret; } + ret = ip_vs_genl_register(); + if (ret) { + IP_VS_ERR("cannot register Generic Netlink interface.\n"); + nf_unregister_sockopt(&ip_vs_sockopts); + return ret; + } + proc_net_fops_create(&init_net, "ip_vs", 0, &ip_vs_info_fops); proc_net_fops_create(&init_net, "ip_vs_stats",0, &ip_vs_stats_fops); @@ -2357,6 +3244,7 @@ void ip_vs_control_cleanup(void) unregister_sysctl_table(sysctl_header); proc_net_remove(&init_net, "ip_vs_stats"); 
proc_net_remove(&init_net, "ip_vs"); + ip_vs_genl_unregister(); nf_unregister_sockopt(&ip_vs_sockopts); LeaveFunction(2); } -- 1.5.4.5 ^ permalink raw reply related [flat|nested] 19+ messages in thread
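The dump callbacks in the patch above (ip_vs_genl_dump_dests, for example) resume a partially sent dump by counting entries and skipping everything at or below cb->args[0]. Below is a minimal userspace sketch of that pattern. It is not kernel code: struct dump_cb and dump_entries() are hypothetical stand-ins, and a fixed-capacity output array plays the role of the skb that runs out of room.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct netlink_callback: only the resume
 * slot used by the dump loop is modeled here. */
struct dump_cb {
	long args[1];	/* args[0]: number of entries already dumped */
};

/* Copy as many entries as fit into out[] (capacity cap), skipping the
 * entries emitted by earlier calls. Returns the number of entries
 * written by this call; 0 means the dump is complete. */
static size_t dump_entries(const int *entries, size_t n,
			   int *out, size_t cap, struct dump_cb *cb)
{
	long idx = 0;
	long start;
	size_t written = 0;
	size_t i;

	assert(cb != NULL);
	start = cb->args[0];

	for (i = 0; i < n; i++) {
		if (++idx <= start)
			continue;	/* already sent in a previous pass */
		if (written == cap) {
			idx--;		/* did not fit, retry it next pass */
			break;
		}
		out[written++] = entries[i];
	}

	cb->args[0] = idx;		/* remember where to resume */
	return written;
}
```

The caller keeps invoking dump_entries() with the same cb until it returns 0, which mirrors how a dumpit callback is called again until it stops adding data to the skb.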
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation
  2008-07-10 11:20 ` Julius Volz
@ 2008-07-10 11:36 ` Thomas Graf
  2008-07-10 12:33 ` Julius Volz
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Graf @ 2008-07-10 11:36 UTC (permalink / raw)
  To: Julius Volz; +Cc: Patrick McHardy, netdev, lvs-devel, vbusam, horms, davem

* Julius Volz <juliusv@google.com> 2008-07-10 13:20
> +/* IPVS genetlink family */
> +static struct genl_family ip_vs_genl_family = {
> +	.id = GENL_ID_GENERATE,
> +	.hdrsize = 0,
> +	.name = IPVS_GENL_NAME,
> +	.version = IPVS_GENL_VERSION,
> +	.maxattr = IPVS_CMD_MAX

It's not a bug but looks like a typo, maxattr should
specify the number of first level attributes.

> +static int ip_vs_genl_dump_service(struct sk_buff *skb, struct ip_vs_service *svc,
> +				   struct netlink_callback *cb)
> +{
> +	void *hdr;
> +
> +	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq,
> +			  &ip_vs_genl_family, NLM_F_MULTI,
> +			  IPVS_CMD_GET_SERVICES);

Typically, netlink code follows the following semantics WRT to
commands/message types:
-> GET_SERVICE (NLM_F_DUMP)
<- NEW_SERVICE
<- NEW_SERVICE
<- NEW_SERVICE

If you are ever going to send notifications, you can use the very
same NEW_SERVICE and userspace can use the same parsing functions.

> +static int ip_vs_genl_get_cmd(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct sk_buff *msg;
> +	void *reply;
> +	int ret, cmd;
> +
> +	mutex_lock(&__ip_vs_mutex);
> +
> +	cmd = info->genlhdr->cmd;
> +
> +	msg = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);

This should be nlmsg_new(NLMSG_DEFAULT_SIZE, ...) , NLMSG_GOODSIZE is for use
with skb_alloc().

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation
  2008-07-10 11:36 ` Thomas Graf
@ 2008-07-10 12:33 ` Julius Volz
  2008-07-10 14:41 ` Thomas Graf
  0 siblings, 1 reply; 19+ messages in thread
From: Julius Volz @ 2008-07-10 12:33 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Patrick McHardy, netdev, lvs-devel, vbusam, horms, davem

On Thu, Jul 10, 2008 at 1:36 PM, Thomas Graf <tgraf@suug.ch> wrote:
> * Julius Volz <juliusv@google.com> 2008-07-10 13:20
>> +/* IPVS genetlink family */
>> +static struct genl_family ip_vs_genl_family = {
>> +	.id = GENL_ID_GENERATE,
>> +	.hdrsize = 0,
>> +	.name = IPVS_GENL_NAME,
>> +	.version = IPVS_GENL_VERSION,
>> +	.maxattr = IPVS_CMD_MAX
>
> It's not a bug but looks like a typo, maxattr should
> specify the number of first level attributes.

Ah, this is how the family's attrbuf size is set. Looks like a bug
actually, but it hasn't affected anything because the command enum is
bigger than any of the first-level attribute enums. I might have
gotten this from net/irda/irnetlink.c, where it's also set to the
maximum command attribute value.

Note that I use different first level attributes depending on the
command. Rather than calculating the largest needed size, it's
probably best to join all attributes that may ever occur in the first
level into one big enum, right?

>> +static int ip_vs_genl_dump_service(struct sk_buff *skb, struct ip_vs_service *svc,
>> +				   struct netlink_callback *cb)
>> +{
>> +	void *hdr;
>> +
>> +	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq,
>> +			  &ip_vs_genl_family, NLM_F_MULTI,
>> +			  IPVS_CMD_GET_SERVICES);
>
> Typically, netlink code follows the following semantics WRT to
> commands/message types:
> -> GET_SERVICE (NLM_F_DUMP)
> <- NEW_SERVICE
> <- NEW_SERVICE
> <- NEW_SERVICE

Ok, so I will set the answer message type to IPVS_CMD_NEW_SERVICE (and
accordingly in the other dump cases). For non-dump GET commands, is it
usual to have the response ID be the same as the request?

>> +static int ip_vs_genl_get_cmd(struct sk_buff *skb, struct genl_info *info)
>> +{
>> +	struct sk_buff *msg;
>> +	void *reply;
>> +	int ret, cmd;
>> +
>> +	mutex_lock(&__ip_vs_mutex);
>> +
>> +	cmd = info->genlhdr->cmd;
>> +
>> +	msg = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
>
> This should be nlmsg_new(NLMSG_DEFAULT_SIZE, ...) , NLMSG_GOODSIZE is for use
> with skb_alloc().

Ok, changed!

Thanks for the comments!

Julius

-- 
Google Switzerland GmbH

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation
  2008-07-10 12:33 ` Julius Volz
@ 2008-07-10 14:41 ` Thomas Graf
  2008-07-10 15:13 ` Julius Volz
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Graf @ 2008-07-10 14:41 UTC (permalink / raw)
  To: Julius Volz; +Cc: Patrick McHardy, netdev, lvs-devel, vbusam, horms, davem

* Julius Volz <juliusv@google.com> 2008-07-10 14:33
> Ah, this is how the family's attrbuf size is set. Looks like a bug
> actually, but it hasn't affected anything because the command enum is
> bigger than any of the first-level attribute enums. I might have
> gotten this from net/irda/irnetlink.c, where it's also set to the
> maximum command attribute value.

Thanks for the note, I will fix that.

> Note that I use different first level attributes depending on the
> command. Rather than calculating the largest needed size, it's
> probably best to join all attributes that may ever occur in the first
> level into one big enum, right?

Yes, that's the easiest solution and it doesn't really cost you
anything besides the slightly bigger allocation.

> > Typically, netlink code follows the following semantics WRT to
> > commands/message types:
> > -> GET_SERVICE (NLM_F_DUMP)
> > <- NEW_SERVICE
> > <- NEW_SERVICE
> > <- NEW_SERVICE
>
> Ok, so I will set the answer message type to IPVS_CMD_NEW_SERVICE (and
> accordingly in the other dump cases). For non-dump GET commands, is it
> usual to have the response ID be the same as the request?

It should follow the same semantics as with dumps. Netlink is typically
used in an object context, where objects are requested, added or deleted.
Basically, a dump is a request to fill the userspace listening part with
all objects of the specified type. genetlink is a bit special as it
moved away from the traditional 4 commands per family (get, new, set,
delete) but in a case like IPVS where you are in fact managing objects
it does make sense to stick to the known semantics.
^ permalink raw reply [flat|nested] 19+ messages in thread
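The "one big enum" idea discussed above can be sketched in userspace as follows. Every name here is an assumption made for illustration (the IPVS_CMD_ATTR_* enumerators, struct attr and parse_attrs() are hypothetical and not taken from the patch). The parser models the behaviour the thread relies on: the attribute table is sized from maxattr, so any attribute type above that bound is skipped rather than stored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical merged enum: every attribute that may appear at the first
 * level of any command, regardless of which command uses it. */
enum {
	IPVS_CMD_ATTR_UNSPEC = 0,
	IPVS_CMD_ATTR_SERVICE,		/* nested service description */
	IPVS_CMD_ATTR_DEST,		/* nested destination description */
	IPVS_CMD_ATTR_DAEMON,		/* nested sync daemon description */
	IPVS_CMD_ATTR_TIMEOUT_TCP,
	IPVS_CMD_ATTR_TIMEOUT_TCP_FIN,
	IPVS_CMD_ATTR_TIMEOUT_UDP,
	IPVS_CMD_ATTR_INFO_VERSION,
	IPVS_CMD_ATTR_INFO_CONN_TAB_SIZE,
	__IPVS_CMD_ATTR_MAX,
};
#define IPVS_CMD_ATTR_MAX (__IPVS_CMD_ATTR_MAX - 1)

/* Simplified attribute: a type plus a 32-bit payload. */
struct attr {
	int type;
	unsigned int value;
};

/* Fill tb[0..maxtype] with pointers to the attributes found in in[].
 * Types outside 1..maxtype are ignored. Returns the number of distinct
 * attribute types stored. */
static size_t parse_attrs(const struct attr *in, size_t n,
			  const struct attr **tb, int maxtype)
{
	size_t stored = 0;
	size_t i;
	int t;

	assert(maxtype >= 0);
	for (t = 0; t <= maxtype; t++)
		tb[t] = NULL;

	for (i = 0; i < n; i++) {
		int type = in[i].type;

		if (type > 0 && type <= maxtype) {
			if (tb[type] == NULL)
				stored++;
			tb[type] = &in[i];
		}
	}
	return stored;
}
```

With a single enum, the family's maxattr can simply be set to the enum's MAX value, and a table of MAX + 1 slots is large enough for every command, at the cost of a few unused slots per message.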
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation
  2008-07-10 14:41 ` Thomas Graf
@ 2008-07-10 15:13 ` Julius Volz
  2008-07-10 21:16 ` Thomas Graf
  0 siblings, 1 reply; 19+ messages in thread
From: Julius Volz @ 2008-07-10 15:13 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Patrick McHardy, netdev, lvs-devel, vbusam, horms, davem

On Thu, Jul 10, 2008, Thomas Graf wrote:
> * Julius Volz <juliusv@google.com> 2008-07-10 14:33
>> Note that I use different first level attributes depending on the
>> command. Rather than calculating the largest needed size, it's
>> probably best to join all attributes that may ever occur in the first
>> level into one big enum, right?
>
> Yes, that's the easiest solution and it doesn't really cost you
> anything besides the slightly bigger allocation.

Yes.

>> > Typically, netlink code follows the following semantics WRT to
>> > commands/message types:
>> > -> GET_SERVICE (NLM_F_DUMP)
>> > <- NEW_SERVICE
>> > <- NEW_SERVICE
>> > <- NEW_SERVICE
>>
>> Ok, so I will set the answer message type to IPVS_CMD_NEW_SERVICE (and
>> accordingly in the other dump cases). For non-dump GET commands, is it
>> usual to have the response ID be the same as the request?
>
> It should follow the same semantics as with dumps. Netlink is typically
> used in an object context, where objects are requested, added or deleted.
> Basically, a dump is a request to fill the userspace listening part with
> all objects of the specified type. genetlink is a bit special as it
> moved away from the traditional 4 commands per family (get, new, set,
> delete) but in a case like IPVS where you are in fact managing objects
> it does make sense to stick to the known semantics.

So, just to be sure: when I'm not returning an object (like in
IPVS_CMD_GET_INFO), I still use IPVS_CMD_GET_INFO as the command ID in
the response? This is also how net/irda/irnetlink.c does it, but maybe
I'm copying bad examples again.

But whenever a response message is about objects, be it one or
multiple entries, I use the IPVS_CMD_NEW_* response IDs.

Julius

-- 
Google Switzerland GmbH

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation
  2008-07-10 15:13 ` Julius Volz
@ 2008-07-10 21:16 ` Thomas Graf
  2008-07-10 23:16 ` Julius Volz
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Graf @ 2008-07-10 21:16 UTC (permalink / raw)
  To: Julius Volz; +Cc: Patrick McHardy, netdev, lvs-devel, vbusam, horms, davem

* Julius Volz <juliusv@google.com> 2008-07-10 17:13
> So, just to be sure: when I'm not returning an object (like in
> IPVS_CMD_GET_INFO), I still use IPVS_CMD_GET_INFO as the command ID in
> the response? This is also how net/irda/irnetlink.c does it, but maybe
> I'm copying bad examples again.
>
> But whenever a response message is about objects, be it one or
> multiple entries, I use the IPVS_CMD_NEW_* response IDs.

Personally I would never use a GET id to send any data at all. My main
focus is on trying to make message protocols self documenting. If a
certain message type has multiple meanings depending on the direction
it will make it harder to understand and harder to debug from a protocol
standpoint.

The common netlink semantics are

CMD_OBJ_NEW - create or update objects as described in the message
              content.
CMD_OBJ_SET - rarely used, update a static object which doesn't have
              to be created or added. same as OBJ_NEW otherwise.
CMD_OBJ_DEL - delete object described in the message
CMD_OBJ_GET - search for an object as described in the message and
              send a CMD_OBJ_NEW as reply including the full object.

              with NLM_F_DUMP: iterate over all objects and send a
              CMD_OBJ_NEW for each object. This request often carries
              no additional data.

It's pretty simple but covers almost every possible protocol for use in
configuration interfaces. In the case of IPVS_CMD_GET_INFO the actual
data being sent back can be regarded as an info object. Basically
netlink is mainly used as a very basic form of rpc.

^ permalink raw reply [flat|nested] 19+ messages in thread
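The GET/NEW convention described in the message above can be illustrated with a small userspace sketch. The command names, reply_cmd() and answer_get() are hypothetical and only model the message types; no netlink library is involved. A GET is answered with a NEW of the same object class, and a dump produces one NEW per object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command set for one object class plus an info object. */
enum cmd {
	CMD_UNSPEC = 0,
	CMD_NEW_SERVICE,
	CMD_SET_SERVICE,
	CMD_DEL_SERVICE,
	CMD_GET_SERVICE,
	CMD_NEW_INFO,
	CMD_GET_INFO,
};

/* Message type used to answer a request. CMD_UNSPEC means the request
 * carries no data reply and is only ACKed or rejected with an error. */
static enum cmd reply_cmd(enum cmd request)
{
	switch (request) {
	case CMD_GET_SERVICE:
		return CMD_NEW_SERVICE;
	case CMD_GET_INFO:
		return CMD_NEW_INFO;
	default:
		return CMD_UNSPEC;
	}
}

/* Fill replies[] with the message types sent back for a GET request:
 * one message for a plain GET, one per object when dump is set.
 * Returns the number of replies, or -1 if the request has no data reply
 * or the replies do not fit into cap slots. */
static int answer_get(enum cmd request, int dump, int nobjects,
		      enum cmd *replies, int cap)
{
	enum cmd type = reply_cmd(request);
	int count = dump ? nobjects : 1;
	int i;

	assert(replies != NULL);
	if (type == CMD_UNSPEC || count > cap)
		return -1;

	for (i = 0; i < count; i++)
		replies[i] = type;
	return count;
}
```

Because a reply never reuses the GET id, a receiver can tell from the message type alone that it is looking at object data, which is the self-documenting property argued for above.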
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation
  2008-07-10 21:16 ` Thomas Graf
@ 2008-07-10 23:16 ` Julius Volz
  2008-07-16 12:15 ` Thomas Graf
  0 siblings, 1 reply; 19+ messages in thread
From: Julius Volz @ 2008-07-10 23:16 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Patrick McHardy, netdev, lvs-devel, vbusam, horms, davem

On Thu, Jul 10, 2008, Thomas Graf wrote:
> Personally I would never use a GET id to send any data at all. My main
> focus is on trying to make message protocols self documenting. If a
> certain message type has multiple meanings depending on the direction
> it will make it harder to understand and harder to debug from a protocol
> standpoint.

Makes sense so far, but:

> The common netlink semantics are
>
> CMD_OBJ_NEW - create or update objects as described in the message
>               content.

If a single operation just means create _or_ update (NLM_F_EXCL, etc.
flags don't work with genetlink), then ipvsadm would have to query for
an entry first, which is racy and ugly. So I'd like to keep ADD/EDIT
in separate commands. But then I need a different response id (or just
use the ADD id?) for a GET.

> CMD_OBJ_SET - rarely used, update a static object which doesn't have
>               to be created or added. same as OBJ_NEW otherwise.

Like in the case of sending the GET_INFO or GET_TIMEOUT replies.

> CMD_OBJ_DEL - delete object described in the message
> CMD_OBJ_GET - search for an object as described in the message and
>               send a CMD_OBJ_NEW as reply including the full object.
>
>               with NLM_F_DUMP: iterate over all objects and send a
>               CMD_OBJ_NEW for each object. This request often carries
>               no additional data.

Ok, that makes sense too and is pretty much what I have.

Julius

-- 
Google Switzerland GmbH

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation
  2008-07-10 23:16 ` Julius Volz
@ 2008-07-16 12:15 ` Thomas Graf
  2008-07-16 13:03 ` Julius Volz
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Graf @ 2008-07-16 12:15 UTC (permalink / raw)
  To: Julius Volz; +Cc: Patrick McHardy, netdev, lvs-devel, vbusam, horms, davem

* Julius Volz <juliusv@google.com> 2008-07-11 01:16
> If a single operation just means create _or_ update (NLM_F_EXCL, etc.
> flags don't work with genetlink), then ipvsadm would have to query for
> an entry first, which is racy and ugly. So I'd like to keep ADD/EDIT
> in separate commands. But then I need a different response id (or just
> use the ADD id?) for a GET.

That's fine, both methods a) adding your own NLM_F_ flags to genetlink
and b) using separate commands would be straightforward and easy to
understand.

The key point is that a GET or DUMP request should be answered with
one or more NEW requests. An ADD, SET, or DEL request should be simply
ACKed or aborted with an error message. Optionally it can trigger a
notification message which should be either a NEW or DEL request.

Using SET to explicitly update an object is fine as well. The reason we
are not using it in the context of notifications is that listeners can
appear at any time so the listener may not have been around at the time
the object was created.

Anyways, it's not a requirement, just common patterns we have been
using so far.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] IPVS: Add genetlink interface implementation
  2008-07-16 12:15 ` Thomas Graf
@ 2008-07-16 13:03 ` Julius Volz
  0 siblings, 0 replies; 19+ messages in thread
From: Julius Volz @ 2008-07-16 13:03 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Patrick McHardy, netdev, lvs-devel, vbusam, horms, davem

On Wed, Jul 16, 2008, Thomas Graf wrote:
> * Julius Volz <juliusv@google.com> 2008-07-11 01:16
>> If a single operation just means create _or_ update (NLM_F_EXCL, etc.
>> flags don't work with genetlink), then ipvsadm would have to query for
>> an entry first, which is racy and ugly. So I'd like to keep ADD/EDIT
>> in separate commands. But then I need a different response id (or just
>> use the ADD id?) for a GET.
>
> That's fine, both methods a) adding your own NLM_F_ flags to genetlink
> and b) using separate commands would be straightforward and easy to
> understand.

Great.

> The key point is that a GET or DUMP request should be answered with
> one or more NEW requests. An ADD, SET, or DEL request should be simply
> ACKed or aborted with an error message. Optionally it can trigger a
> notification message which should be either a NEW or DEL request.

Makes sense.

> Using SET to explicitly update an object is fine as well. The reason we
> are not using it in the context of notifications is that listeners can
> appear at any time so the listener may not have been around at the time
> the object was created.

Hm, I like that option most. So if you think it's ok, I will use
NEW = add and SET = edit.

Julius

-- 
Google Switzerland GmbH

^ permalink raw reply [flat|nested] 19+ messages in thread
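The "NEW = add, SET = edit" split agreed on above avoids a query-then-write sequence in userspace: the existence check happens inside the handler, next to the modification. The following is a minimal userspace sketch of that contract. The table, svc_find(), svc_new() and svc_set() are hypothetical; the error codes follow the ones the patch's set handler uses (-EEXIST when adding an existing service, -ESRCH when the service is missing). In the kernel the check and the update both run under __ip_vs_mutex, which this single-threaded sketch does not model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_SERVICES 8

/* Hypothetical service record, identified by a non-zero fwmark. */
struct service {
	unsigned int fwmark;
	unsigned int timeout;
	int in_use;
};

static struct service table[MAX_SERVICES];

/* Look up a service by fwmark; returns NULL if it does not exist. */
static struct service *svc_find(unsigned int fwmark)
{
	size_t i;

	for (i = 0; i < MAX_SERVICES; i++)
		if (table[i].in_use && table[i].fwmark == fwmark)
			return &table[i];
	return NULL;
}

/* NEW: create only. Fails with -EEXIST instead of silently updating. */
static int svc_new(unsigned int fwmark, unsigned int timeout)
{
	size_t i;

	assert(fwmark != 0);	/* 0 means "no fwmark" in the patch's lookup */
	if (svc_find(fwmark))
		return -EEXIST;

	for (i = 0; i < MAX_SERVICES; i++) {
		if (!table[i].in_use) {
			table[i].fwmark = fwmark;
			table[i].timeout = timeout;
			table[i].in_use = 1;
			return 0;
		}
	}
	return -ENOMEM;		/* table full */
}

/* SET: edit only. Fails with -ESRCH instead of silently creating. */
static int svc_set(unsigned int fwmark, unsigned int timeout)
{
	struct service *svc = svc_find(fwmark);

	if (!svc)
		return -ESRCH;
	svc->timeout = timeout;
	return 0;
}
```

A caller that wants "create or update" behaviour can try svc_new() and fall back to svc_set() on -EEXIST, without ever issuing a separate lookup first.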
end of thread, other threads:[~2008-07-16 13:03 UTC | newest]

Thread overview: 19+ messages
2008-07-09 15:11 [PATCH 0/2] IPVS: Add Generic Netlink configuration interface Julius Volz
2008-07-09 15:11 ` [PATCH 1/2] IPVS: Add genetlink interface definitions to ip_vs.h Julius Volz
2008-07-09 15:11 ` [PATCH 2/2] IPVS: Add genetlink interface implementation Julius Volz
2008-07-09 15:17 ` YOSHIFUJI Hideaki / 吉藤英明
2008-07-09 15:24 ` Julius Volz
2008-07-09 16:43 ` Patrick McHardy
2008-07-09 18:16 ` Julius Volz
2008-07-10 12:15 ` Patrick McHardy
2008-07-10 13:58 ` Julius Volz
2008-07-10 14:43 ` Thomas Graf
2008-07-10 11:20 ` Julius Volz
2008-07-10 11:36 ` Thomas Graf
2008-07-10 12:33 ` Julius Volz
2008-07-10 14:41 ` Thomas Graf
2008-07-10 15:13 ` Julius Volz
2008-07-10 21:16 ` Thomas Graf
2008-07-10 23:16 ` Julius Volz
2008-07-16 12:15 ` Thomas Graf
2008-07-16 13:03 ` Julius Volz