From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [RFC net-next PATCH] macvtap: Add packet capture support
Date: Wed, 20 Nov 2013 20:19:49 +0200
Message-ID: <20131120181949.GB25569@redhat.com>
References: <1384970649-29839-1-git-send-email-vyasevic@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: netdev@vger.kernel.org
To: Vlad Yasevich <vyasevic@redhat.com>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:47434 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754545Ab3KTSQg (ORCPT <rfc822;netdev@vger.kernel.org>);
	Wed, 20 Nov 2013 13:16:36 -0500
Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id rAKIGZku028393
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for <netdev@vger.kernel.org>; Wed, 20 Nov 2013 13:16:35 -0500
Content-Disposition: inline
In-Reply-To: <1384970649-29839-1-git-send-email-vyasevic@redhat.com>
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

On Wed, Nov 20, 2013 at 01:04:09PM -0500, Vlad Yasevich wrote:
> Currently it is impossible to capture traffic when using a macvtap
> device.  The reason is that all capture handling is done either in
> dev_hard_start_xmit() or in __netif_receive_skb_core().  Macvtap
> currenlty doesn't use dev_hard_start_xmit(), and at the time the
> packet traverses __netif_receive_skb_core, the skb->dev is set to
> the lower-level device that doesn't end up matching macvtap.
> 
> To solve the issue, use dev_hard_start_xmit() on the output path.
> On the input path, it is toughter to solve since macvtap ends up
> consuming the skb so there is nothing more left for
> __netif_receive_skb_core() to do.  A simple solution is to
> pull the code that delivers things to the taps/captures into
> its own function and allow macvtap to call it from its receive
> routine.
> 
> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
> ---
> This is only an RFC.  I'd like to solicit comments on this simple
> approach.

I'm kind of worried about this. What worries me is that normally
if you have a packet socket bound to all interfaces, what it shows is
traffic to/from the box.

This might be a bug for someone, but I suspect at this point this
is part of the ABI.

But macvtap bypasses most of the host networking stack,
So I worry that suddenly showing these packets would be confusing.

Assuming we want to show this traffic, I think it's preferable to
-		if (!ptype->dev || ptype->dev == skb->dev) {
+		if (ptype->dev == skb->dev) {
so you have to bind to macvtap explicitly to see the traffic.


Of course when you start binding to specific macvtaps
it doesn't scale well because we'll have a long list
on ptype_all suddenly.
This might mean we need to keep this list per-device ...

> A more complicated solution would have been to give
> macvtap its own rx_handler and return RX_HANDLER_ANOTHER during
> receive operation to make the packet go through another round of
> capturing and hit the macvtap rx_handler.  I thought this would
> hurt performance too much for no real gain.
> 
> Thanks
> -vlad
> 
>  drivers/net/macvtap.c     |  4 +++-
>  include/linux/netdevice.h |  2 ++
>  net/core/dev.c            | 33 ++++++++++++++++++++++++++-------
>  3 files changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index dc76670..0ed8fae 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -334,6 +334,7 @@ drop:
>   */
>  static int macvtap_receive(struct sk_buff *skb)
>  {
> +	netif_receive_skb_taps(skb, true);
>  	skb_push(skb, ETH_HLEN);
>  	return macvtap_forward(skb->dev, skb);
>  }
> @@ -727,8 +728,9 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
>  		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
>  	}
>  	if (vlan) {
> +		skb->dev = vlan->dev;
>  		local_bh_disable();
> -		macvlan_start_xmit(skb, vlan->dev);
> +		dev_hard_start_xmit(skb, vlan->dev, NULL, NULL);
>  		local_bh_enable();
>  	} else {
>  		kfree_skb(skb);
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 8b3de7c..84880eb 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2383,6 +2383,8 @@ void dev_kfree_skb_any(struct sk_buff *skb);
>  int netif_rx(struct sk_buff *skb);
>  int netif_rx_ni(struct sk_buff *skb);
>  int netif_receive_skb(struct sk_buff *skb);
> +struct packet_type *netif_receive_skb_taps(struct sk_buff *skb,
> +					   bool last_deliver);
>  gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb);
>  void napi_gro_flush(struct napi_struct *napi, bool flush_old);
>  struct sk_buff *napi_get_frags(struct napi_struct *napi);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index da9c5e1..50f0ac4 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3484,6 +3484,31 @@ static bool skb_pfmemalloc_protocol(struct sk_buff *skb)
>  	}
>  }
>  
> +struct packet_type *netif_receive_skb_taps(struct sk_buff *skb,
> +					   bool last_deliver)
> +{
> +	struct packet_type *ptype, *pt_prev = NULL;
> +	struct net_device *orig_dev;
> +
> +	if (list_empty(&ptype_all))
> +		return NULL;
> +
> +	orig_dev = skb->dev;
> +	list_for_each_entry_rcu(ptype, &ptype_all, list) {
> +		if (!ptype->dev || ptype->dev == skb->dev) {
> +			if (pt_prev)
> +				deliver_skb(skb, pt_prev, orig_dev);
> +			pt_prev = ptype;
> +		}
> +	}
> +
> +	if (last_deliver && pt_prev)
> +		deliver_skb(skb, pt_prev, orig_dev);
> +
> +	return pt_prev;
> +}
> +EXPORT_SYMBOL_GPL(netif_receive_skb_taps);
> +
>  static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
>  {
>  	struct packet_type *ptype, *pt_prev;
> @@ -3535,13 +3560,7 @@ another_round:
>  	if (pfmemalloc)
>  		goto skip_taps;
>  
> -	list_for_each_entry_rcu(ptype, &ptype_all, list) {
> -		if (!ptype->dev || ptype->dev == skb->dev) {
> -			if (pt_prev)
> -				ret = deliver_skb(skb, pt_prev, orig_dev);
> -			pt_prev = ptype;
> -		}
> -	}
> +	pt_prev = netif_receive_skb_taps(skb, false);
>  
>  skip_taps:
>  #ifdef CONFIG_NET_CLS_ACT
> -- 
> 1.8.4.2