netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stephen Hemminger <shemminger@linux-foundation.org>
To: Patrick McHardy <kaber@trash.net>
Cc: David Miller <davem@davemloft.net>, netdev@vger.kernel.org
Subject: Re: [IPV4 3/5] fib_trie: dump doesnt use RCU
Date: Wed, 23 Jan 2008 22:45:02 -0800	[thread overview]
Message-ID: <479833EE.1090905@linux-foundation.org> (raw)
In-Reply-To: <47983304.3030309@trash.net>

Patrick McHardy wrote:
> David Miller wrote:
>> From: Stephen Hemminger <shemminger@linux-foundation.org>
>> Date: Wed, 23 Jan 2008 14:48:47 -0800
>>
>>  
>>> Since fib dump (via netlink) holds the RTNL mutex, it is unnecessary
>>> to use RCU, and it is impossible to get truncated (-EBUSY) result.
>>>
>>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>>>     
>>
>> You tested this patch, right? :-/
>>
>> The whole reason we need the nlk->cb[] state is to hold things across
>> multiple recvmsg() calls that might be necessary to obtain the full
>> dump.
>>
>> rtnetlink goes:
>>
>>     rtnl_lock();
>>     netlink_rcv_skb(skb, &rtnetlink_rcv_msg);
>>  ...
>> static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>> {
>>  ...
>>     if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
>>         struct sock *rtnl;
>>         rtnl_dumpit_func dumpit;
>>
>>         dumpit = rtnl_get_dumpit(family, type);
>>         if (dumpit == NULL)
>>             return -EOPNOTSUPP;
>>
>>         __rtnl_unlock();
>>         rtnl = net->rtnl;
>>         err = netlink_dump_start(rtnl, skb, nlh, dumpit, NULL);
>>         rtnl_lock();
>>         return err;
>>
>> (NOTE: Drops RTNL semaphore for netlink_dump_start() call)
>>
>>  ...
>> int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
>>                struct nlmsghdr *nlh,
>>                int (*dump)(struct sk_buff *skb,
>>                    struct netlink_callback *),
>>                int (*done)(struct netlink_callback *))
>> {
>>  ...
>>     cb->dump = dump;
>>     cb->done = done;
>>     cb->nlh = nlh;
>>     atomic_inc(&skb->users);
>>     cb->skb = skb;
>>  ...
>>     mutex_lock(nlk->cb_mutex);
>>  ...
>>     nlk->cb = cb;
>>     mutex_unlock(nlk->cb_mutex);
>>
>>     netlink_dump(sk);
>>  ...
>>
>> static int netlink_dump(struct sock *sk)
>> {
>>  ...
>>     mutex_lock(nlk->cb_mutex);
>>  ...
>>     len = cb->dump(skb, cb);
>>
>>     if (len > 0) {
>>         mutex_unlock(nlk->cb_mutex);
>>         skb_queue_tail(&sk->sk_receive_queue, skb);
>>         sk->sk_data_ready(sk, len);
>>         return 0;
>>     }
>>
>> (NOTE: Therefore cb->dump() runs without RTNL semaphore held)
>>
>>  ...
>> static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
>>                struct msghdr *msg, size_t len,
>>                int flags)
>> {
>>  ...
>>     if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2)
>>         netlink_dump(sk);
>>  ...
>>
>> Therefore, that RTNL assertion you added should have triggered on any
>> dump you may have tried since ->dump() is always invoked without the
>> RTNL semaphore since rtnetlink drops it around the ->dump() call and
>> the call chain for this fib_trie cause would be:
>>
>> inet_dump_fib()
>>  fn_trie_dump()
>>
>> and nothing in that code path retakes the RTNL semaphore.
>
> Actually we're always holding the rtnl during dumps, nlk->cb_mutex points
> to rtnl_mutex in case of rtnetlink. It used to be held only during the 
> first
> ->dump invocation and not on continuations, but I changed this a few 
> versions
> ago.
>
>
Yes, I tested, no the assert didn't hit... Actually, the reason I went 
down this path was
because I couldn't trigger -EBUSY with concurrent updates.

P.s: I checked and Quagga handles -EAGAIN, so if you need an error code that
would be the one to use.


  parent reply	other threads:[~2008-01-24  6:44 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20080123224844.610730277@linux-foundation.org>
2008-01-23 22:48 ` [IPV4 1/5] fib_trie: more whitespace cleanup Stephen Hemminger
2008-01-24  4:37   ` David Miller
2008-01-23 22:48 ` [IPV4 2/5] fib_trie: remove unneeded NULL check Stephen Hemminger
2008-01-24  4:38   ` David Miller
2008-01-23 22:48 ` [IPV4 3/5] fib_trie: dump doesnt use RCU Stephen Hemminger
2008-01-24  4:50   ` David Miller
2008-01-24  6:41     ` Patrick McHardy
2008-01-24  6:43       ` David Miller
2008-01-24  6:47         ` Patrick McHardy
2008-01-24  7:26           ` David Miller
2008-01-24  6:45       ` Stephen Hemminger [this message]
2008-01-24  7:27         ` David Miller
2008-01-24 21:51           ` [PATCH] fib_trie: rescan if key is lost during dump Stephen Hemminger
2008-01-25  8:23             ` Jarek Poplawski
2008-01-25 16:13               ` Stephen Hemminger
2008-01-25 19:01                 ` Jarek Poplawski
2008-02-01  0:45             ` David Miller
2008-01-23 22:48 ` [IPV4 4/5] fib_trie: version 0.410 Stephen Hemminger
2008-01-24  4:50   ` David Miller
2008-01-23 22:48 ` [IPV4 5/5] fib_semantics: sparse warnings Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=479833EE.1090905@linux-foundation.org \
    --to=shemminger@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=kaber@trash.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).