netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Berg <johannes@sipsolutions.net>
To: Richard Cochran <richardcochran@gmail.com>
Cc: David Miller <davem@davemloft.net>, netdev@vger.kernel.org
Subject: Re: [RFC] net: remove erroneous sk null assignment in timestamping
Date: Sat, 08 Oct 2011 10:16:48 +0200	[thread overview]
Message-ID: <1318061808.3991.12.camel@jlt3.sipsolutions.net> (raw)
In-Reply-To: <20111008075719.GA2284@netboy.at.omicron.at> (sfid-20111008_095753_882744_64F82E9F)

On Sat, 2011-10-08 at 09:57 +0200, Richard Cochran wrote:

> I don't remember why I put it that way, but I took a look at the
> problem, and I am not sure how to solve it. The other callers of
> sock_queue_err_skb all create or clone the error skb immediately
> before queueing it:
> 
>   net/core/skbuff.c:       skb_tstamp_tx
>   net/ipv4/ip_sockglue.c:  ip_icmp_error, ip_local_error
>   net/ipv6/datagram.c:     ipv6_icmp_error, ipv6_local_error

Yeah, I noticed that too. That's also the reason they pass the socket
externally I believe, since it's not a properly refcounted socket (the
reference they use is still from the original skb).

The thing that makes it work is that
 a) they don't release the original SKB before sock_queue_err_skb() and
 b) skb->sk is NULL for them

Since this is just a single function, they can guarantee that -- in the
case we found here it's scattered across the code and won't always be
guaranteed -- e.g. the kfree_skb() case in the PHY driver potentially
violates b).

> So I need to prevent the socket from disappearing between
> skb_clone_tx_timestamp and skb_complete_tx_timestamp:
> 
>   skb_clone_tx_timestamp
> 	clone = skb_clone(skb, GFP_ATOMIC);
> 	sock_hold
>   skb_complete_tx_timestamp
> 	sock_queue_err_skb(sk, skb);
> 	sock_put
> 
> What do you think?

I'm not terribly familiar with struct sock. Looking at it, I'm a bit
confused by skb_orphan() -- it doesn't put the sock reference. So are
sockets not refcounted for skbs in this way? They seem to use
sock_wfree() which does a bit more than this it seems, and I don't see
it using sk_refcnt anywhere so I'm a bit confused now.

> BTW, while looking for a good pattern to follow, I found that the can
> driver also sets skb->sk after clone with no special treatment, like
> so:
> 
>   drivers/net/can/dev.c:285
> 	can_put_echo_skb
> 		struct sock *srcsk = skb->sk;
> 		skb = skb_clone(old_skb, GFP_ATOMIC);
> 		skb->sk = srcsk;

Yeah that looks fishy too. But to me it looks a bit like it should
charge to the socket instead of refcounting it -- though of course
that's not really the correct thing to do from a socket buffer point of
view, but it seems the sk_refcnt and sk_wmem_alloc are two separate
mechanisms of refcounting the socket -- I just haven't figured out yet
how they interact.

> > The TX side of this infrastructure seems very poorly tested.
> 
> In fact, we do have the phyter driver used in an extensive automated
> test farm, but the applications just don't do the kinds of things
> suggested to trigger the problem. The normal pattern is, send event
> packet, get tx timestamp, and so we haven't seen the bug at all.

Makes sense, you never wrote an application trying to crash it :-)

> > Maybe that's how you can trigger it: have one thread turn on and off
> > timestamping all the time, and another thread send frames all the time,
> > then eventually you'll probably run into the kfree_skb() case there. If
> > you ever manage to run into that case, it'll crash either when freeing
> > this skb or when freeing the original.
> 
> Thats one weird app, but I get the point, and thanks for your
> attention to my code.

Agree, it's obviously a specifically devised app to try to make it
crash. It serves no other practical purpose.

johannes

  reply	other threads:[~2011-10-08  8:16 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-07 17:11 [RFC] net: remove erroneous sk null assignment in timestamping Johannes Berg
2011-10-07 17:33 ` David Miller
2011-10-07 17:40   ` Johannes Berg
2011-10-07 17:47     ` Johannes Berg
2011-10-07 17:53       ` Johannes Berg
2011-10-07 18:42     ` Johannes Berg
2011-10-08  7:59       ` Richard Cochran
2011-10-08  7:57   ` Richard Cochran
2011-10-08  8:16     ` Johannes Berg [this message]
2011-10-08  8:57       ` Eric Dumazet
2011-10-08 10:32         ` Johannes Berg
2011-10-11 13:34           ` Richard Cochran
2011-10-08 10:35         ` Richard Cochran
2011-10-12 18:36 ` [PATCH 1/1] net: hold sock reference while processing tx timestamps Richard Cochran
2011-10-12 19:25   ` Eric Dumazet
2011-10-12 19:27   ` Johannes Berg
2011-10-12 19:52     ` Eric Dumazet
2011-10-13  8:54       ` Johannes Berg
2011-10-13  4:51     ` Richard Cochran
2011-10-13  9:46   ` [PATCH 0/3] net: time stamping fixes Richard Cochran
2011-10-19  4:16     ` David Miller
2011-10-19  5:15       ` Johannes Berg
2011-10-19 11:50         ` Richard Cochran
2011-10-19 12:33           ` Eric Dumazet
2011-10-19 12:38           ` Eric Dumazet
2011-10-19 12:58             ` Johannes Berg
2011-10-19 13:09               ` Johannes Berg
2011-10-19 13:25                 ` Eric Dumazet
2011-10-19 13:35                   ` Johannes Berg
2011-10-19 13:44                     ` Eric Dumazet
2011-10-19 13:57                       ` Johannes Berg
2011-10-19 14:08                         ` Eric Dumazet
2011-10-19 14:24                           ` Johannes Berg
2011-10-19 14:27                             ` Richard Cochran
2011-10-19 14:33                               ` Eric Dumazet
2011-10-19 13:21               ` Eric Dumazet
2011-10-19 13:25                 ` Johannes Berg
2011-10-19 13:27                   ` Eric Dumazet
2011-10-19 13:32                     ` Johannes Berg
2011-10-19 14:25                       ` Richard Cochran
2011-10-13  9:46   ` [PATCH 1/3] net: hold sock reference while processing tx timestamps Richard Cochran
2011-10-19  4:42     ` Eric Dumazet
2011-10-13  9:46   ` [PATCH 2/3] dp83640: use proper function to free transmit time stamping packets Richard Cochran
2011-10-19  4:47     ` Eric Dumazet
2011-10-13  9:46   ` [PATCH 3/3] dp83640: free packet queues on remove Richard Cochran
2011-10-19  4:48     ` Eric Dumazet
2011-10-21 10:49   ` [PATCH v2 0/3] net: time stamping fixes Richard Cochran
2011-10-21 10:49   ` [PATCH v2 1/3] net: hold sock reference while processing tx timestamps Richard Cochran
2011-10-21 11:31     ` Eric Dumazet
2011-10-24  6:55       ` David Miller
2011-10-21 11:44     ` Johannes Berg
2011-10-21 10:49   ` [PATCH v2 2/3] dp83640: use proper function to free transmit time stamping packets Richard Cochran
2011-10-24  6:55     ` David Miller
2011-10-24 17:47       ` Richard Cochran
2011-10-24 23:16         ` David Miller
2011-10-21 10:49   ` [PATCH v2 3/3] dp83640: free packet queues on remove Richard Cochran

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1318061808.3991.12.camel@jlt3.sipsolutions.net \
    --to=johannes@sipsolutions.net \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=richardcochran@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).