[PATCH iproute2] iplink: work around rtattr length limits for IFLA_VFINFO_LIST

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Edwin Peer <edwin.peer@broadcom.com>
To: netdev@vger.kernel.org
Cc: Stephen Hemminger <stephen@networkplumber.org>,
	Jakub Kicinski <kuba@kernel.org>,
	Andrew Gospodarek <andrew.gospodarek@broadcom.com>,
	Michael Chan <michael.chan@broadcom.com>,
	Edwin Peer <edwin.peer@broadcom.com>
Subject: [PATCH iproute2] iplink: work around rtattr length limits for IFLA_VFINFO_LIST
Date: Fri, 15 Jan 2021 14:59:50 -0800	[thread overview]
Message-ID: <20210115225950.18762-1-edwin.peer@broadcom.com> (raw)

The maximum possible length of an RTNL attribute is 64KB, but the
nested VFINFO list exceeds this for more than about 220 VFs (each VF
consumes approximately 300 bytes, depending on alignment and optional
fields). Exceeding the limit causes IFLA_VFINFO_LIST's length to wrap
modulo 16 bits in the kernel's nla_nest_end().

This patch is a horrible hack exploiting the fact that the full set
of attributes is actually present in the netlink packet, even though
the published length of the nested rtattr may be considerably shorter.
The total number of VFs is known, however, and can instead be used as
the basis for the iteration over the VFINFO list.

As ugly as this solution is, it does appear to be a reasonable and
practical compromise selected from a number of alternate approaches
that were considered and deemed worse or otherwise unworkable:

  - Extending the apparent maximum length of rtattr:

  To do this is a way that maintains ABI compatibility is easier said
  than done. Pushing the nested contents through deflate in response to
  a special request filter flag so that the data still fits within the
  64KB limit was considered (not entirely as crazy as this first sounds
  because there is a lot of redundancy in the data that would definitely
  compress well) as well as approaches based on providing new attribute
  types to pair with ATTR_TYPE_NESTED that extend its length in various
  ways (such as a "more" attribute or an extended attribute header with
  a wider length type). Ultimately these length extension ideas were
  rejected because the client parser APIs are expressed in terms of the
  base rtattr type, which cannot be extended cleanly without tacking on
  kludgy helpers or otherwise conducting major rework of client APIs.

  - Filtering based approaches:

  An obvious idea is to reduce the amount of data actually sent using
  filters. For example, by extending RTEXT_FILTER_SKIP_STATS to the VF
  stats, which make up a large proportion of the dump. But, the problem
  arises when it is the stats that are desired. One now either has to
  filter by VF when requesting full resolution data (ie. fetch each VF
  separately) or one has to pick another subset of fields to exclude
  and stitch the results together in the client. But, the requests are
  not atomic and the VF configuration could have changed in the interim.
  This may be less of a concern when requesting a VF's entire data as
  a whole (at least the data would necessarily apply to the same VF),
  but even so there would then need to be a mechanism to select only
  the VFINFO of interest, which is particularly messy given that we're
  not requesting a top level object here and would involve extensions
  to an otherwise frozen VF query API (and still not be atomic).

  - API redesign:

  The clean solution is to decompose the API into smaller granularity
  requests and otherwise rethink the structure of netlink attributes in
  a V2 RTM_GETLINK redesign. Such ideas are all moot, however, because
  VF config has been punted to switchdev and any new work should happen
  there instead.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 ip/ipaddress.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 571346b15cc3..3be61f49204c 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1198,13 +1198,13 @@ int print_linkinfo(struct nlmsghdr *n, void *arg)
 	}

 	if ((do_link || show_details) && tb[IFLA_VFINFO_LIST] && tb[IFLA_NUM_VF]) {
-		struct rtattr *i, *vflist = tb[IFLA_VFINFO_LIST];
-		int rem = RTA_PAYLOAD(vflist);
+		struct rtattr *vf = RTA_DATA(tb[IFLA_VFINFO_LIST]);
+		int i, ignore = 0, num_vf = rta_getattr_u32(tb[IFLA_NUM_VF]);

 		open_json_array(PRINT_JSON, "vfinfo_list");
-		for (i = RTA_DATA(vflist); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
+		for (i = 0; i < num_vf; vf = RTA_NEXT(vf, ignore), i++) {
 			open_json_object(NULL);
-			print_vfinfo(fp, ifi, i);
+			print_vfinfo(fp, ifi, vf);
 			close_json_object();
 		}
 		close_json_array(PRINT_JSON, NULL);
@@ -2157,22 +2157,20 @@ out:
 static void
 ipaddr_loop_each_vf(struct rtattr *tb[], int vfnum, int *min, int *max)
 {
-	struct rtattr *vflist = tb[IFLA_VFINFO_LIST];
-	struct rtattr *i, *vf[IFLA_VF_MAX+1];
+	int i, ignore = 0, num_vf = rta_getattr_u32(tb[IFLA_NUM_VF]);
+	struct rtattr *vf = RTA_DATA(tb[IFLA_VFINFO_LIST]);
+	struct rtattr *vf_tb[IFLA_VF_MAX+1];
 	struct ifla_vf_rate *vf_rate;
-	int rem;

-	rem = RTA_PAYLOAD(vflist);
+	for (i = 0; i < num_vf; vf = RTA_NEXT(vf, ignore), i++) {
+		parse_rtattr_nested(vf_tb, IFLA_VF_MAX, vf);

-	for (i = RTA_DATA(vflist); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
-		parse_rtattr_nested(vf, IFLA_VF_MAX, i);
-
-		if (!vf[IFLA_VF_RATE]) {
+		if (!vf_tb[IFLA_VF_RATE]) {
 			fprintf(stderr, "VF min/max rate API not supported\n");
 			exit(1);
 		}

-		vf_rate = RTA_DATA(vf[IFLA_VF_RATE]);
+		vf_rate = RTA_DATA(vf_tb[IFLA_VF_RATE]);
 		if (vf_rate->vf == vfnum) {
 			*min = vf_rate->min_tx_rate;
 			*max = vf_rate->max_tx_rate;
-- 
2.30.0

next             reply	other threads:[~2021-01-15 23:00 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-15 22:59 Edwin Peer [this message]
2021-01-15 23:53 ` [PATCH iproute2] iplink: work around rtattr length limits for IFLA_VFINFO_LIST Jakub Kicinski
2021-01-16 21:12   ` Michal Kubecek
2021-01-17  1:21     ` Jakub Kicinski
2021-01-18  3:48       ` David Ahern
2021-01-18 17:34         ` Edwin Peer
2021-01-18 17:36           ` David Ahern
2021-01-18 17:42             ` Edwin Peer
2021-01-18 17:49               ` David Ahern
2021-01-18 18:20                 ` Edwin Peer
2021-01-18 18:30                   ` Michal Kubecek
2021-01-18 17:31       ` Edwin Peer
2021-01-18 17:37     ` Edwin Peer

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:571346b15cc dfblob:3be61f49204 )
 OR (
bs:"[PATCH iproute2] iplink: work around rtattr length limits for IFLA_VFINFO_LIST" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210115225950.18762-1-edwin.peer@broadcom.com \
    --to=edwin.peer@broadcom.com \
    --cc=andrew.gospodarek@broadcom.com \
    --cc=kuba@kernel.org \
    --cc=michael.chan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.