From: Johannes Berg <johannes@sipsolutions.net>
To: Richard Cochran <richardcochran@gmail.com>
Cc: David Miller <davem@davemloft.net>, netdev@vger.kernel.org
Subject: Re: [RFC] net: remove erroneous sk null assignment in timestamping
Date: Sat, 08 Oct 2011 10:16:48 +0200 [thread overview]
Message-ID: <1318061808.3991.12.camel@jlt3.sipsolutions.net> (raw)
In-Reply-To: <20111008075719.GA2284@netboy.at.omicron.at> (sfid-20111008_095753_882744_64F82E9F)
On Sat, 2011-10-08 at 09:57 +0200, Richard Cochran wrote:
> I don't remember why I put it that way, but I took a look at the
> problem, and I am not sure how to solve it. The other callers of
> sock_queue_err_skb all create or clone the error skb immediately
> before queueing it:
>
> net/core/skbuff.c: skb_tstamp_tx
> net/ipv4/ip_sockglue.c: ip_icmp_error, ip_local_error
> net/ipv6/datagram.c: ipv6_icmp_error, ipv6_local_error
Yeah, I noticed that too. That's also the reason they pass the socket
externally I believe, since it's not a properly refcounted socket (the
reference they use is still from the original skb).
The thing that makes it work is that
a) they don't release the original SKB before sock_queue_err_skb() and
b) skb->sk is NULL for them
Since this is just a single function, they can guarantee that -- in the
case we found here it's scattered across the code and won't always be
guaranteed -- e.g. the kfree_skb() case in the PHY driver potentially
violates b).
> So I need to prevent the socket from disappearing between
> skb_clone_tx_timestamp and skb_complete_tx_timestamp:
>
> skb_clone_tx_timestamp
> clone = skb_clone(skb, GFP_ATOMIC);
> sock_hold
> skb_complete_tx_timestamp
> sock_queue_err_skb(sk, skb);
> sock_put
>
> What do you think?
I'm not terribly familiar with struct sock. Looking at it, I'm a bit
confused by skb_orphan() -- it doesn't put the sock reference. So are
sockets not refcounted for skbs in this way? They seem to use
sock_wfree() which does a bit more than this it seems, and I don't see
it using sk_refcnt anywhere so I'm a bit confused now.
> BTW, while looking for a good pattern to follow, I found that the can
> driver also sets skb->sk after clone with no special treatment, like
> so:
>
> drivers/net/can/dev.c:285
> can_put_echo_skb
> struct sock *srcsk = skb->sk;
> skb = skb_clone(old_skb, GFP_ATOMIC);
> skb->sk = srcsk;
Yeah that looks fishy too. But to me it looks a bit like it should
charge to the socket instead of refcounting it -- though of course
that's not really the correct thing to do from a socket buffer point of
view, but it seems the sk_refcnt and sk_wmem_alloc are two separate
mechanisms of refcounting the socket -- I just haven't figured out yet
how they interact.
> > The TX side of this infrastructure seems very poorly tested.
>
> In fact, we do have the phyter driver used in an extensive automated
> test farm, but the applications just don't do the kinds of things
> suggested to trigger the problem. The normal pattern is, send event
> packet, get tx timestamp, and so we haven't seen the bug at all.
Makes sense, you never wrote an application trying to crash it :-)
> > Maybe that's how you can trigger it: have one thread turn on and off
> > timestamping all the time, and another thread send frames all the time,
> > then eventually you'll probably run into the kfree_skb() case there. If
> > you ever manage to run into that case, it'll crash either when freeing
> > this skb or when freeing the original.
>
> Thats one weird app, but I get the point, and thanks for your
> attention to my code.
Agree, it's obviously a specifically devised app to try to make it
crash. It serves no other practical purpose.
johannes
next prev parent reply other threads:[~2011-10-08 8:16 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-10-07 17:11 [RFC] net: remove erroneous sk null assignment in timestamping Johannes Berg
2011-10-07 17:33 ` David Miller
2011-10-07 17:40 ` Johannes Berg
2011-10-07 17:47 ` Johannes Berg
2011-10-07 17:53 ` Johannes Berg
2011-10-07 18:42 ` Johannes Berg
2011-10-08 7:59 ` Richard Cochran
2011-10-08 7:57 ` Richard Cochran
2011-10-08 8:16 ` Johannes Berg [this message]
2011-10-08 8:57 ` Eric Dumazet
2011-10-08 10:32 ` Johannes Berg
2011-10-11 13:34 ` Richard Cochran
2011-10-08 10:35 ` Richard Cochran
2011-10-12 18:36 ` [PATCH 1/1] net: hold sock reference while processing tx timestamps Richard Cochran
2011-10-12 19:25 ` Eric Dumazet
2011-10-12 19:27 ` Johannes Berg
2011-10-12 19:52 ` Eric Dumazet
2011-10-13 8:54 ` Johannes Berg
2011-10-13 4:51 ` Richard Cochran
2011-10-13 9:46 ` [PATCH 0/3] net: time stamping fixes Richard Cochran
2011-10-13 9:46 ` [PATCH 1/3] net: hold sock reference while processing tx timestamps Richard Cochran
2011-10-19 4:42 ` Eric Dumazet
2011-10-13 9:46 ` [PATCH 2/3] dp83640: use proper function to free transmit time stamping packets Richard Cochran
2011-10-19 4:47 ` Eric Dumazet
2011-10-13 9:46 ` [PATCH 3/3] dp83640: free packet queues on remove Richard Cochran
2011-10-19 4:48 ` Eric Dumazet
2011-10-19 4:16 ` [PATCH 0/3] net: time stamping fixes David Miller
2011-10-19 5:15 ` Johannes Berg
2011-10-19 11:50 ` Richard Cochran
2011-10-19 12:33 ` Eric Dumazet
2011-10-19 12:38 ` Eric Dumazet
2011-10-19 12:58 ` Johannes Berg
2011-10-19 13:09 ` Johannes Berg
2011-10-19 13:25 ` Eric Dumazet
2011-10-19 13:35 ` Johannes Berg
2011-10-19 13:44 ` Eric Dumazet
2011-10-19 13:57 ` Johannes Berg
2011-10-19 14:08 ` Eric Dumazet
2011-10-19 14:24 ` Johannes Berg
2011-10-19 14:27 ` Richard Cochran
2011-10-19 14:33 ` Eric Dumazet
2011-10-19 13:21 ` Eric Dumazet
2011-10-19 13:25 ` Johannes Berg
2011-10-19 13:27 ` Eric Dumazet
2011-10-19 13:32 ` Johannes Berg
2011-10-19 14:25 ` Richard Cochran
2011-10-21 10:49 ` [PATCH v2 " Richard Cochran
2011-10-21 10:49 ` [PATCH v2 1/3] net: hold sock reference while processing tx timestamps Richard Cochran
2011-10-21 11:31 ` Eric Dumazet
2011-10-24 6:55 ` David Miller
2011-10-21 11:44 ` Johannes Berg
2011-10-21 10:49 ` [PATCH v2 2/3] dp83640: use proper function to free transmit time stamping packets Richard Cochran
2011-10-24 6:55 ` David Miller
2011-10-24 17:47 ` Richard Cochran
2011-10-24 23:16 ` David Miller
2011-10-21 10:49 ` [PATCH v2 3/3] dp83640: free packet queues on remove Richard Cochran
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1318061808.3991.12.camel@jlt3.sipsolutions.net \
--to=johannes@sipsolutions.net \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
--cc=richardcochran@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.