netdev.vger.kernel.org archive mirror
From: David Gibson <dgibson@redhat.com>
To: netdev@vger.kernel.org
Cc: Christian Benvenuti <benve@cisco.com>,
	Sujith Sankar <ssujith@cisco.com>,
	Govindarajulu Varadarajan <govindarajulu90@gmail.com>,
	Neel Patel <neepatel@cisco.com>,
	Nishank Trivedi <nistrive@cisco.com>
Subject: RFC: rtnetlink problems with Cisco enic and VFs
Date: Tue, 22 Apr 2014 14:14:25 +1000
Message-ID: <20140422141425.127dabd3c63482a6a655469e@redhat.com>


I believe I've found a problem with netlink handling which can be
triggered on Cisco enic devices with a large number (30-40) of virtual
functions.  I believe this is the cause of a real customer problem
we've seen.

 * When requesting a list of interfaces with RTM_GETLINK, enic devices
   (and currently, _only_ enic devices) report IFLA_VF_PORTS
   information 

 * IFLA_VF_PORTS information takes at least 90 bytes per virtual function

 * Unlike IFLA_VFINFO_LIST, the ports information is always reported,
   regardless of the setting of the IFLA_EXT_MASK parameter

 * When IFLA_EXT_MASK is not specified, the reply packets have maximum
   size NLMSG_GOODSIZE (4k - overheads)

 * If there are enough virtual functions the IFLA_VF_PORTS information
   can cause a single interface's info to exceed NLMSG_GOODSIZE

 * The number of virtual functions necessary to trigger this is reduced
   substantially if both IPv4 and IPv6 IFLA_AF_SPEC information is
   present (~972 bytes)

 * If the dump function returns -EMSGSIZE on the first message in a
   packet, netlink_dump() incorrectly assumes the listing is done,
   omitting information for that interface and any later ones.

 * This can cause getifaddrs(3) to go into an infinite loop

 * 'ip link' is not affected, because it supplies IFLA_EXT_MASK which
   triggers rtnl_calcit() to recalculate the required packet size to
   greater than NLMSG_GOODSIZE.
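
The failure mode in the list above can be illustrated with a small
userspace toy model.  This is only a sketch: none of the toy_* names are
kernel APIs, the 4096-byte capacity is an upper-bound stand-in for
NLMSG_GOODSIZE (the real value is smaller), and the "base" allowance for
other attributes is a made-up figure.  Only the 90 bytes per VF and the
~972 bytes of IFLA_AF_SPEC come from the numbers above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a reply packet; not the kernel's struct sk_buff. */
struct toy_skb {
	size_t len;	/* bytes used so far */
	size_t cap;	/* stand-in for NLMSG_GOODSIZE */
};

/* Rough size of one interface's message: ~972 bytes of AF_SPEC plus
 * 90 bytes per VF.  'base' is a hypothetical allowance for everything
 * else in the message. */
static size_t toy_ifinfo_size(size_t base, unsigned int num_vfs)
{
	return base + 972 + 90 * (size_t)num_vfs;
}

/* Stand-in for rtnl_fill_ifinfo(): append the message or fail. */
static int toy_fill(struct toy_skb *skb, size_t msg_size)
{
	if (skb->len + msg_size > skb->cap)
		return -EMSGSIZE;
	skb->len += msg_size;
	return 1;
}

/* Stand-in for the dump callback: add interfaces starting at *next
 * until one doesn't fit, then return the packet length. */
static size_t toy_dump_once(struct toy_skb *skb, const size_t *sizes,
			    size_t n, size_t *next)
{
	while (*next < n) {
		if (toy_fill(skb, sizes[*next]) <= 0)
			break;
		(*next)++;
	}
	return skb->len;
}

/* Stand-in for the caller's loop: an empty packet is taken to mean
 * "listing finished".  Returns how many interfaces were reported. */
static size_t toy_dump_all(const size_t *sizes, size_t n, size_t cap)
{
	size_t next = 0;

	assert(sizes != NULL || n == 0);
	for (;;) {
		struct toy_skb skb = { .len = 0, .cap = cap };

		if (toy_dump_once(&skb, sizes, n, &next) == 0)
			return next;	/* may be < n: silent truncation */
	}
}
```

With four interfaces of which the third has 40 VFs, this model reports
only the first two; with 30 VFs on the third, all four are reported.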


I can see several possible ways to fix this, but they all have
possible problems.  I'm hoping someone here can determine which, if
any, are real problems, and therefore what's the right approach to fix
this.

    A) Always calculate the RTM_NEWLINK packet size, rather than
       assuming NLMSG_GOODSIZE.
    Problem: The NLMSG_GOODSIZE limit was introduced to stop broken
             user tools with limited buffers encountering problems
             (see 115c9b81928360d769a76c632bae62d15206a94a). This
             approach might mean that such tools break again.

    B) Don't issue the VF port information when RTEXT_FILTER_VF isn't
       set
    Problem: Do tools using the port information already set this flag?

    C) Don't include the VF port info when listing interfaces, only
       when doing GETLINK on a specific interface.
    Problem: As (B), plus it's ugly.

    D) Detect the case when the first interface in a packet doesn't fit,
       and reallocate the packet buffer
    Problem: As (A), plus more complicated.
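
For reference when weighing (B) and (C), here is a rough sketch of how a
userspace tool might build an RTM_GETLINK dump request that carries
IFLA_EXT_MASK with RTEXT_FILTER_VF set.  The struct and function names
are hypothetical, socket setup and sending are left out, and this is not
taken from any particular tool's source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Request layout: netlink header, ifinfomsg, then room for a single
 * u32 attribute.  (Hypothetical helper type, for illustration only.) */
struct getlink_req {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
	char attrbuf[RTA_SPACE(sizeof(uint32_t))];
};

/* Fill *req with an RTM_GETLINK dump request carrying IFLA_EXT_MASK.
 * Pass RTEXT_FILTER_VF as ext_mask to ask for VF information. */
static void build_getlink_dump(struct getlink_req *req, uint32_t seq,
			       uint32_t ext_mask)
{
	struct rtattr *rta;

	assert(req != NULL);
	memset(req, 0, sizeof(*req));

	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->nlh.nlmsg_type = RTM_GETLINK;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->nlh.nlmsg_seq = seq;
	req->ifm.ifi_family = AF_UNSPEC;

	/* Append the attribute right after the aligned fixed part. */
	rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nlh.nlmsg_len));
	rta->rta_type = IFLA_EXT_MASK;
	rta->rta_len = RTA_LENGTH(sizeof(ext_mask));
	memcpy(RTA_DATA(rta), &ext_mask, sizeof(ext_mask));
	req->nlh.nlmsg_len = NLMSG_ALIGN(req->nlh.nlmsg_len) + rta->rta_len;
}
```

A tool that sends its dump request without this attribute is the case
that (B) and (C) would change behaviour for.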

As an interim band-aid, here's a patch which adds a WARN_ON() in this
situation, which will at least make the problem easier to locate for
the next person to encounter it.

From: David Gibson <david@gibson.dropbear.id.au>
Subject: [PATCH] rtnetlink: Warn when interface's information won't fit
         in our packet

Without IFLA_EXT_MASK specified, the information reported for a single
interface in response to RTM_GETLINK is expected to fit within a netlink
packet of NLMSG_GOODSIZE.

If it doesn't, however, things will go badly wrong.  When listing all
interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first
message in a packet as the end of the listing and omit information for
that interface and all subsequent ones.  This can cause getifaddrs(3) to
enter an infinite loop.

This patch won't fix the problem, but it will trigger a WARN_ON(), making
it easier to track down what's going wrong.


Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 net/core/rtnetlink.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d4ff417..5331db2 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1198,6 +1198,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 	struct hlist_head *head;
 	struct nlattr *tb[IFLA_MAX+1];
 	u32 ext_filter_mask = 0;
+	int err;
 
 	s_h = cb->args[0];
 	s_idx = cb->args[1];
@@ -1218,11 +1219,16 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 		hlist_for_each_entry_rcu(dev, head, index_hlist) {
 			if (idx < s_idx)
 				goto cont;
-			if (rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK,
-					     NETLINK_CB(cb->skb).portid,
-					     cb->nlh->nlmsg_seq, 0,
-					     NLM_F_MULTI,
-					     ext_filter_mask) <= 0)
+			err = rtnl_fill_ifinfo(skb, dev, RTM_NEWLINK,
+					       NETLINK_CB(cb->skb).portid,
+					       cb->nlh->nlmsg_seq, 0,
+					       NLM_F_MULTI,
+					       ext_filter_mask);
+			/* If we ran out of room on the first message,
+			 * we're in trouble */
+			WARN_ON((err == -EMSGSIZE) && (skb->len == 0));
+
+			if (err <= 0)
 				goto out;
 
 			nl_dump_check_consistent(cb, nlmsg_hdr(skb));
-- 
1.9.0



-- 
David Gibson <dgibson@redhat.com>

