From: Dan Rue <dan.rue@linaro.org>
To: Alexey Kodanev <alexey.kodanev@oracle.com>
Cc: Li Wang <liwang@redhat.com>,
	ltp@lists.linux.it, mmarhefk@redhat.com, netdev@vger.kernel.org
Subject: Re: [LTP] [RFC] [PATCH] netns: Fix race in virtual interface bringup
Date: Fri, 17 Nov 2017 16:29:20 -0600
Message-ID: <20171117222920.yr46xvdgtspmq6jp@xps>
In-Reply-To: <2c722239-8e24-9796-a022-03b0c423f3b8@oracle.com>

Alexey, Li, thank you for your suggestions.

On Fri, Nov 17, 2017 at 03:08:20PM +0300, Alexey Kodanev wrote:
> On 11/17/2017 09:09 AM, Li Wang wrote:
> > Hi Dan,
> >
> > On Fri, Nov 10, 2017 at 4:38 AM, Dan Rue <dan.rue@linaro.org> wrote:
> >> Symptoms (+ command, error):
> >>     netns_comm_ip_ipv6_ioctl:
> >>         + ip netns exec tst_net_ns1 ping6 -q -c2 -I veth1 fd00::2
> >>         connect: Cannot assign requested address
> >>
> >>     netns_comm_ip_ipv6_netlink:
> >>         + ip netns exec tst_net_ns0 ping6 -q -c2 -I veth0 fd00::3
> >>         connect: Cannot assign requested address
> >>
> >>     netns_comm_ns_exec_ipv6_ioctl:
> >>         + ns_exec 6689 net ping6 -q -c2 -I veth0 fd00::3
> >>         connect: Cannot assign requested address
> >>
> >>     netns_comm_ns_exec_ipv6_netlink:
> >>         + ns_exec 6891 net ping6 -q -c2 -I veth0 fd00::3
> >>         connect: Cannot assign requested address
> >>
> >> The error is coming from ping6, which is trying to get an IP address for
> >> veth0 (due to -I veth0), but cannot. Waiting for two seconds fixes the
> >> test in my testcases. 1 second is not long enough.
> >>
> >> dmesg shows the following during the test:
> >>
> >>     [Nov 7 15:39] LTP: starting netns_comm_ip_ipv6_ioctl (netns_comm.sh ip ipv6 ioctl)
> >>     [  +0.302401] IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
> >>     [  +0.048059] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> 
> It's quite strange that the veth interface needs 2 seconds to become
> operational when, according to dmesg, it is up in less than 0.3s, yet
> you say that even 1 second is not enough... Are you sure that the IPv6
> address is not in the tentative state and that DAD is actually disabled?
> I'm asking because you don't have it disabled in the script:
> https://gist.github.com/danrue/7b76bbcbc23a6296030b7295650b69f3

Investigating further, the dmesg output reports the status of the link
between veth0 and veth1, not of the veth0 interface itself. That is, the
first dmesg message comes from "ip netns exec tst_net_ns0 ifconfig veth0
up" and the second comes from "ip netns exec tst_net_ns1 ifconfig veth1
up". This explains why dmesg shows 0.3s even though a 2-second sleep is
required. Nothing in dmesg is actually helpful here.

Regarding DAD (duplicate address detection): we have seen similar issues
on low-power ARM64 boards with IPv4 as well. In any case, I tried
disabling DAD on the interface and it did not make a difference.
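
For anyone who wants to repeat the DAD check, here is a minimal sketch,
assuming iproute2's "ip" is available. The helper name has_tentative_addr
is made up for illustration and is not part of LTP; the /64 prefix length
in the comments is also an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of LTP): reads "ip -6 addr show" output
# on stdin and succeeds if any address is still flagged "tentative",
# i.e. DAD has not finished for it yet.
has_tentative_addr()
{
	grep -q tentative
}

# Example use inside the namespace (needs root, so left commented out):
#   ip netns exec tst_net_ns0 ip -6 addr show dev veth0 | has_tentative_addr \
#       && echo "veth0 still has a tentative address"
#
# DAD can be skipped per address with the "nodad" flag, or per interface
# with the accept_dad sysctl (prefix length /64 is assumed here):
#   ip netns exec tst_net_ns0 ip -6 addr add fd00::2/64 dev veth0 nodad
#   ip netns exec tst_net_ns0 sysctl -w net.ipv6.conf.veth0.accept_dad=0
```

If the helper never reports a tentative address while ping6 still fails,
that would point away from DAD as the cause.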

> 
> >>
> >> Signed-off-by: Dan Rue <dan.rue@linaro.org>
> >> ---
> >>
> >> We've periodically hit this problem across many arm64 kernels and boards, and
> >> it seems to be caused by "ping6" running before the virtual interface is
> >> actually ready. "sleep 2" works around the issue and proves that it is a race
> >> condition, but I would prefer something faster and deterministic. Please
> >> suggest a better implementation.
> > Just FYI:
> >
> > I'm not good at network things, but one method I copied from ltp/numa
> > test is to split the '2s' into many smaller pieces of time.
> >
> > which something like:
> >
> > --- a/testcases/kernel/containers/netns/netns_helper.sh
> > +++ b/testcases/kernel/containers/netns/netns_helper.sh
> > @@ -240,6 +240,22 @@ netns_ip_setup()
> >                 tst_brkm TBROK "unable to add device veth1 to the separate network namespace"
> >  }
> >
> > +wait_for_set_ip()
> > +{
> > +       local dev=$1
> > +       local retries=200
> > +
> > +       while [ $retries -gt 0 ]; do
> > +               dmesg -c | grep -q "IPv6: ADDRCONF(NETDEV_CHANGE): $dev: link becomes ready"
> 
> 
> What about "grep -q up /sys/class/net/$dev/operstate && break"?

Since dmesg will not help, I explored /sys as proposed.

operstate shows "up", and ping6 still fails.
carrier shows "1" (up), and ping6 still fails.
dormant shows "0" (interface is not dormant), and ping6 still fails.
flags shows "0x1003" both before and after a 2s sleep (it does not change).

So it seems there is nothing in dmesg or /sys that can help here.
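
The /sys checks above can be repeated with something like the following;
a rough sketch only. The function name link_state and its optional second
argument are hypothetical (the argument exists only so the function can be
pointed at a fake directory for testing), and none of this is part of LTP.

```shell
#!/bin/sh
# Hypothetical helper (not part of LTP): print the sysfs attributes
# checked above for one device on a single line.
# $1 = device name
# $2 = sysfs net directory (assumed default: /sys/class/net)
link_state()
{
	dev=$1
	dir=${2:-/sys/class/net}/$dev
	out="$dev:"
	for attr in operstate carrier dormant flags; do
		# A read can fail (for example on a device that is not up),
		# so fall back to "?" instead of aborting.
		val=$(cat "$dir/$attr" 2>/dev/null) || val="?"
		out="$out $attr=$val"
	done
	echo "$out"
}

# Example use (needs the veth pair to exist, so left commented out):
#   link_state veth0; sleep 2; link_state veth0
```

Calling it right after "ifconfig veth0 up" and again after the sleep, from
inside the namespace, makes it easy to compare the two snapshots.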

Dan

> 
> Thanks,
> Alexey
> 
> 
> > +               if [ $? -eq 0 ]; then
> > +                       break
> > +               fi
> > +
> > +               retries=$((retries-1))
> > +               tst_sleep 10ms
> > +       done
> > +}
> > +
> >  ##
> >  # Enables virtual ethernet devices and assigns IP addresses for both
> >  # of them (IPv4/IPv6 variant is decided by netns_setup() function).
> > @@ -285,6 +301,9 @@ netns_set_ip()
> >                         tst_brkm TBROK "enabling veth1 device failed"
> >                 ;;
> >         esac
> > +
> > +       wait_for_set_ip veth0
> > +       wait_for_set_ip veth1
> >  }
> >
> >  netns_ns_exec_cleanup()
> >
> >> Also, is it correct that "ifconfig veth0 up" returns before the interface is
> >> actually ready?
> >>
> >> See also this isolated test script:
> >> https://gist.github.com/danrue/7b76bbcbc23a6296030b7295650b69f3
> >>
> >>  testcases/kernel/containers/netns/netns_helper.sh | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/testcases/kernel/containers/netns/netns_helper.sh b/testcases/kernel/containers/netns/netns_helper.sh
> >> index a95cdf206..99172c0c0 100755
> >> --- a/testcases/kernel/containers/netns/netns_helper.sh
> >> +++ b/testcases/kernel/containers/netns/netns_helper.sh
> >> @@ -285,6 +285,7 @@ netns_set_ip()
> >>                         tst_brkm TBROK "enabling veth1 device failed"
> >>                 ;;
> >>         esac
> >> +       sleep 2
> >>  }
> >>
> >>  netns_ns_exec_cleanup()
> >> --
> >> 2.13.6
> >>
> >>
> >> --
> >> Mailing list info: https://lists.linux.it/listinfo/ltp
> >
> >
> 
