From: Vlad Yasevich <vyasevic@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: netdev@vger.kernel.org
Subject: Re: [RFC net-next PATCH] macvtap: Add packet capture support
Date: Wed, 20 Nov 2013 14:36:40 -0500 [thread overview]
Message-ID: <528D0F48.60304@redhat.com> (raw)
In-Reply-To: <20131120181949.GB25569@redhat.com>
On 11/20/2013 01:19 PM, Michael S. Tsirkin wrote:
> On Wed, Nov 20, 2013 at 01:04:09PM -0500, Vlad Yasevich wrote:
>> Currently it is impossible to capture traffic when using a macvtap
>> device. The reason is that all capture handling is done either in
>> dev_hard_start_xmit() or in __netif_receive_skb_core(). Macvtap
>> currenlty doesn't use dev_hard_start_xmit(), and at the time the
>> packet traverses __netif_receive_skb_core, the skb->dev is set to
>> the lower-level device that doesn't end up matching macvtap.
>>
>> To solve the issue, use dev_hard_start_xmit() on the output path.
>> On the input path, it is toughter to solve since macvtap ends up
>> consuming the skb so there is nothing more left for
>> __netif_receive_skb_core() to do. A simple solution is to
>> pull the code that delivers things to the taps/captures into
>> its own function and allow macvtap to call it from its receive
>> routine.
>>
>> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
>> ---
>> This is only an RFC. I'd like to solicit comments on this simple
>> approach.
>
> I'm kind of worried about this. What worries me is that normally
> if you have a packet socket bound to all interfaces, what it shows is
> traffic to/from the box.
>
> This might be a bug for someone, but I suspect at this point this
> is part of the ABI.
>
> But macvtap bypasses most of the host networking stack,
> So I worry that suddenly showing these packets would be confusing.
Is it really different from using bridge and tap? If someone
does 'tcpdump -i any', they will see packets sent by the guest with
bridge+tap. It makes sense to do that same thing for macvtap, no?
>
> Assuming we want to show this traffic, I think it's preferable to
> - if (!ptype->dev || ptype->dev == skb->dev) {
> + if (ptype->dev == skb->dev) {
> so you have to bind to macvtap explicitly to see the traffic.
>
>
> Of course when you start binding to specific macvtaps
> it doesn't scale well because we'll have a long list
> on ptype_all suddenly.
How likely is that really? ptype_all doesn't scale as it.
Any time you don't set the protocol field, the entry goes
into ptype_all. Macvtap doesn't change that. You can
create a long list in ptype all if you have lots of guests and
you want to snoop on each guest separately thus binding to tap
device.
-vlad
> This might mean we need to keep this list per-device ...
>
>> A more complicated solution would have been to give
>> macvtap its own rx_handler and return RX_HANDLER_ANOTHER during
>> receive operation to make the packet go through another round of
>> capturing and hit the macvtap rx_handler. I thought this would
>> hurt performance too much for no real gain.
>>
>> Thanks
>> -vlad
>>
>> drivers/net/macvtap.c | 4 +++-
>> include/linux/netdevice.h | 2 ++
>> net/core/dev.c | 33 ++++++++++++++++++++++++++-------
>> 3 files changed, 31 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index dc76670..0ed8fae 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -334,6 +334,7 @@ drop:
>> */
>> static int macvtap_receive(struct sk_buff *skb)
>> {
>> + netif_receive_skb_taps(skb, true);
>> skb_push(skb, ETH_HLEN);
>> return macvtap_forward(skb->dev, skb);
>> }
>> @@ -727,8 +728,9 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
>> skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
>> }
>> if (vlan) {
>> + skb->dev = vlan->dev;
>> local_bh_disable();
>> - macvlan_start_xmit(skb, vlan->dev);
>> + dev_hard_start_xmit(skb, vlan->dev, NULL, NULL);
>> local_bh_enable();
>> } else {
>> kfree_skb(skb);
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 8b3de7c..84880eb 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -2383,6 +2383,8 @@ void dev_kfree_skb_any(struct sk_buff *skb);
>> int netif_rx(struct sk_buff *skb);
>> int netif_rx_ni(struct sk_buff *skb);
>> int netif_receive_skb(struct sk_buff *skb);
>> +struct packet_type *netif_receive_skb_taps(struct sk_buff *skb,
>> + bool last_deliver);
>> gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb);
>> void napi_gro_flush(struct napi_struct *napi, bool flush_old);
>> struct sk_buff *napi_get_frags(struct napi_struct *napi);
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index da9c5e1..50f0ac4 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3484,6 +3484,31 @@ static bool skb_pfmemalloc_protocol(struct sk_buff *skb)
>> }
>> }
>>
>> +struct packet_type *netif_receive_skb_taps(struct sk_buff *skb,
>> + bool last_deliver)
>> +{
>> + struct packet_type *ptype, *pt_prev = NULL;
>> + struct net_device *orig_dev;
>> +
>> + if (list_empty(&ptype_all))
>> + return NULL;
>> +
>> + orig_dev = skb->dev;
>> + list_for_each_entry_rcu(ptype, &ptype_all, list) {
>> + if (!ptype->dev || ptype->dev == skb->dev) {
>> + if (pt_prev)
>> + deliver_skb(skb, pt_prev, orig_dev);
>> + pt_prev = ptype;
>> + }
>> + }
>> +
>> + if (last_deliver && pt_prev)
>> + deliver_skb(skb, pt_prev, orig_dev);
>> +
>> + return pt_prev;
>> +}
>> +EXPORT_SYMBOL_GPL(netif_receive_skb_taps);
>> +
>> static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
>> {
>> struct packet_type *ptype, *pt_prev;
>> @@ -3535,13 +3560,7 @@ another_round:
>> if (pfmemalloc)
>> goto skip_taps;
>>
>> - list_for_each_entry_rcu(ptype, &ptype_all, list) {
>> - if (!ptype->dev || ptype->dev == skb->dev) {
>> - if (pt_prev)
>> - ret = deliver_skb(skb, pt_prev, orig_dev);
>> - pt_prev = ptype;
>> - }
>> - }
>> + pt_prev = netif_receive_skb_taps(skb, false);
>>
>> skip_taps:
>> #ifdef CONFIG_NET_CLS_ACT
>> --
>> 1.8.4.2
next prev parent reply other threads:[~2013-11-20 19:36 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-11-20 18:04 [RFC net-next PATCH] macvtap: Add packet capture support Vlad Yasevich
2013-11-20 18:19 ` Michael S. Tsirkin
2013-11-20 19:36 ` Vlad Yasevich [this message]
2013-11-20 20:06 ` Michael S. Tsirkin
2013-11-20 20:19 ` Vlad Yasevich
2013-11-20 20:30 ` Michael S. Tsirkin
2013-11-20 20:35 ` Vlad Yasevich
2013-11-20 21:12 ` Michael S. Tsirkin
2013-11-20 22:01 ` Vlad Yasevich
2013-11-20 19:52 ` David Miller
2013-11-20 20:10 ` Vlad Yasevich
2013-11-20 20:20 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=528D0F48.60304@redhat.com \
--to=vyasevic@redhat.com \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).