From: Stephen Hemminger <stephen@networkplumber.org>
To: Henning Rogge <henning.rogge@fkie.fraunhofer.de>
Cc: <netdev@vger.kernel.org>
Subject: Re: [rtnetlink] Potential bug in Linux (rt)netlink code
Date: Fri, 12 Oct 2018 11:51:59 -0700
Message-ID: <20181012115159.7ead2f97@xeon-e3>
In-Reply-To: <4d7a11b7-1f43-5669-6f19-3c746cc88306@fkie.fraunhofer.de>

On Fri, 12 Oct 2018 09:30:40 +0200
Henning Rogge <henning.rogge@fkie.fraunhofer.de> wrote:

> Hi,
> 
> I am working on a self-written routing agent 
> (https://github.com/OLSR/OONF) and am stuck on a problem with netlink 
> that I cannot explain with a userspace error.
> 
> I am using a netlink socket for setting routes 
> (RTM_NEWROUTE/RTM_DELROUTE), querying the kernel for the current routes 
> in the database (via a RTM_GETROUTE dump) and for getting multicast 
> messages for ongoing routing changes.
> 
> After a few netlink messages I get to the point where the kernel just 
> does not respond to a RTM_NEWROUTE (no error, no answer, despite the 
> NLM_F_ACK flag being set)... but sometime later, when (during shutdown 
> of the routing agent) the program sends another route command (most 
> times a RTM_DELROUTE), I get a single netlink packet with a "successful" 
> response for both the "missing" RTM_NEWROUTE and one for the new 
> RTM_DELROUTE sequence number.
> 
> I am testing two routing agents, each of them in a systemd-nspawn based 
> container connected over a bridge on the host system on a current Debian 
> Testing (kernel 4.18.0-1-amd64).
> 
> I am directly using the netlink sockets, without any other userspace 
> library in between.
> 
> I have checked the hexdumps of a couple of netlink messages (including 
> the ones just before the bug happens) by hand and they seem to be okay.
> 
> When I tried to add a "netlink listener" socket for further debugging (ip 
> link add nlmon0 type nlmon) the problem vanished until I removed the 
> listener socket again.
> 
> Any ideas how to debug this problem? Unfortunately I have no short 
> example program to trigger the bug... I have rarely seen the problem for 
> years (once every couple of months), but until a few days ago I never 
> managed to reproduce it.
> 
> Henning Rogge

Are you reading the responses to your requests?  If you don't read
the response, the socket will get flow blocked.
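
A minimal sketch of what "reading the responses" can look like, written in
Python for brevity. It assumes a non-blocking socket and the standard 16-byte
nlmsghdr layout; the names parse_acks and drain are made up for illustration
and are not taken from OONF or any library. One datagram may carry several
messages (as in the report above, where two ACKs arrived in one packet), so
the parser walks the whole buffer. Whether unread responses are actually the
cause of the reported behaviour is not established here.

```python
import struct

# nlmsghdr: nlmsg_len (u32), nlmsg_type (u16), nlmsg_flags (u16),
#           nlmsg_seq (u32), nlmsg_pid (u32)
NLMSG_HDR = struct.Struct("=IHHII")
NLMSG_ERROR = 2  # message type used for errors and for ACKs (error == 0)


def parse_acks(buf):
    """Return [(seq, error)] for every NLMSG_ERROR message in one datagram."""
    acks = []
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, msg_type, _flags, seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            break  # truncated or malformed message, stop parsing
        if msg_type == NLMSG_ERROR and length >= NLMSG_HDR.size + 4:
            # nlmsgerr starts with a signed 32-bit error: 0 for an ACK,
            # a negative errno for a failure.
            (error,) = struct.unpack_from("=i", buf, offset + NLMSG_HDR.size)
            acks.append((seq, error))
        offset += (length + 3) & ~3  # NLMSG_ALIGN: messages are 4-byte aligned
    return acks


def drain(sock, bufsize=65536):
    """Read every pending datagram from a non-blocking socket."""
    acks = []
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            return acks  # receive queue is empty
        acks.extend(parse_acks(data))
```

Calling drain() whenever the socket becomes readable, and matching the
returned sequence numbers against outstanding requests, keeps the receive
queue from filling up.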

Thread overview: 4+ messages
2018-10-12  7:30 [rtnetlink] Potential bug in Linux (rt)netlink code Henning Rogge
2018-10-12 18:51 ` Stephen Hemminger [this message]
2018-10-15  5:25   ` Henning Rogge
2018-10-22  5:22     ` Henning Rogge
