qemu-devel.nongnu.org archive mirror
From: Jason Wang <jasowang@redhat.com>
To: Anton Ivanov <anton.ivanov@cambridgegreys.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 3/3] Unified Datagram Socket Transport - raw support
Date: Mon, 24 Jul 2017 12:03:06 +0800
Message-ID: <2c69da12-2b18-2272-6eff-c6223667aad0@redhat.com>
In-Reply-To: <b1194532-a97a-33d3-92d4-8531674d358d@cambridgegreys.com>



On 2017-07-22 02:50, Anton Ivanov wrote:
>
> [snip]
>
>>> +    "-netdev raw,id=str,ifname=ifname\n"
>>> +    "                configure a network backend with ID 'str' 
>>> connected to\n"
>>> +    "                an Ethernet interface named ifname via raw 
>>> socket.\n"
>>> +    "                This backend does not change the interface 
>>> settings.\n"
>>> +    "                Most interfaces will require being set into 
>>> promisc mode,\n"
>>> +    "                as well having most offloads (TSO, etc) turned 
>>> off.\n"
>>> +    "                Some virtual interfaces like tap support only 
>>> RX.\n"
>>
>> Pay attention that qemu supports vnet header. So any reason to turn 
>> off e.g TSO here?
>
> I am not aware of any means to get extra info like checksums, etc show 
> up on raw socket read.
>
> If you know a way to make them show up, this is worth investigating.

See packet_rcv_vnet(). But a known 'issue' with raw sockets is that they 
forbid changing the vnet header length after creation, so we may need 
some workaround in qemu.
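
As a rough illustration of what that path provides: once PACKET_VNET_HDR 
is enabled on an AF_PACKET socket, each frame read from it is preceded by 
a struct virtio_net_hdr carrying the checksum and GSO metadata. The sketch 
below is Python rather than QEMU code. The numeric constants are copied 
from the Linux UAPI headers; the helper names (open_raw_vnet, 
split_vnet_frame, needs_csum) are made up for this example, and host byte 
order is assumed for the 10-byte legacy header.

```python
import socket
import struct
from collections import namedtuple

# Values from linux/if_packet.h and linux/virtio_net.h.
SOL_PACKET = 263
PACKET_VNET_HDR = 15
ETH_P_ALL = 0x0003
VIRTIO_NET_HDR_F_NEEDS_CSUM = 1

# Legacy struct virtio_net_hdr: u8 flags, u8 gso_type, then four u16 fields.
VNET_HDR = struct.Struct("=BBHHHH")
VnetHdr = namedtuple(
    "VnetHdr", "flags gso_type hdr_len gso_size csum_start csum_offset")


def open_raw_vnet(ifname):
    """Open a raw packet socket with the vnet header enabled.

    Linux only; needs CAP_NET_RAW. Hypothetical helper, not part of the patch.
    """
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                      socket.htons(ETH_P_ALL))
    s.setsockopt(SOL_PACKET, PACKET_VNET_HDR, 1)
    s.bind((ifname, 0))
    return s


def split_vnet_frame(buf):
    """Split one read from such a socket into (VnetHdr, ethernet_frame)."""
    if len(buf) < VNET_HDR.size:
        raise ValueError("short read: no room for virtio_net_hdr")
    hdr = VnetHdr._make(VNET_HDR.unpack_from(buf))
    return hdr, buf[VNET_HDR.size:]


def needs_csum(hdr):
    """True if the frame carries only a partial checksum."""
    return bool(hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)
```

A backend could hand csum_start/csum_offset and the GSO fields straight to 
a vNIC that negotiated the matching offloads, instead of disabling them on 
the host interface.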

>
>>
>>>   #endif
>>>       "-netdev 
>>> socket,id=str[,fd=h][,listen=[host]:port][,connect=host:port]\n"
>>>       "                configure a network backend to connect to 
>>> another network\n"
>>> @@ -2463,6 +2470,32 @@ qemu-system-i386 linux.img -net nic -net 
>>> gre,src=4.2.3.1,dst=1.2.3.4
>>>     @end example
>>>   +@item -netdev raw,id=@var{id},ifname=@var{ifname}
>>> +@itemx -net raw[,vlan=@var{n}][,name=@var{name}],ifname=@var{ifname}
>>> +Connect VLAN @var{n} directly to an Ethernet interface using raw 
>>> socket.
>>> +
>>> +This transport allows a VM to bypass most of the network stack 
>>> which is
>>> +extremely useful for tapping.
>>> +
>>> +@item ifname=@var{ifname}
>>> +    interface name (mandatory)
>>> +
>>> +@example
>>> +# set up the interface - put it in promiscuous mode and turn off 
>>> offloads
>>> +ifconfig eth0 up
>>> +ifconfig eth0 promisc
>>> +
>>> +/sbin/ethtool -K eth0 gro off
>>> +/sbin/ethtool -K eth0 tso off
>>> +/sbin/ethtool -K eth0 gso off
>>> +/sbin/ethtool -K eth0 tx off
>>
>> Any reason to turn off tx here?
>
> Yes - we already have it computed and we have written it as is as a 
> whole packet. You do not want it
> re-computed as at least some adapters do silly things if you start 
> writing raw and the checksum already exists.

This looks like a bug in the driver?

For GRO it's easier to understand, since the guest may not handle big 
packets with partial checksums. But turning off tso, gso and tx still 
looks questionable for a NIC which may want to offload them to the card 
(e.g. virtio-net).
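
For readers unfamiliar with the term, a "partial checksum" means the 
checksum field holds only the pseudo-header sum, and whoever finally 
transmits the packet must sum the bytes from csum_start to the end and 
store the result at csum_start + csum_offset. The Python sketch below 
shows that completion step in software; it is an illustration with 
hypothetical function names, not code from the patch or from QEMU.

```python
def ones_complement_sum(data):
    """16-bit one's complement sum over big-endian words, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def finish_partial_csum(frame, csum_start, csum_offset):
    """Complete a partial checksum, returning a new frame.

    Assumes the checksum field already contains the folded, uncomplemented
    pseudo-header sum, as is the case for a partially checksummed packet.
    """
    buf = bytearray(frame)
    csum = ~ones_complement_sum(bytes(buf[csum_start:])) & 0xFFFF
    if csum == 0:
        csum = 0xFFFF  # an all-zero result is sent as 0xFFFF
    pos = csum_start + csum_offset
    buf[pos:pos + 2] = csum.to_bytes(2, "big")
    return bytes(buf)
```

A NIC with tx checksum offload does this in hardware, which is why 
switching "tx off" on the host is only needed if nothing tells the 
backend which packets are still partial.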

>
> Once again, this is one of the pros/cons of using tpacket vs recv/send 
> (with or without mmsg) on a raw socket.
>
> recvm(m)sg/sendm(m)sg are brute force as far as offloads go, but things 
> like scatter/gather work correctly, so there are few copies.
>
> Compared to that, tpacket will allow you some access to checksumming 
> which you can map onto checksum offload in a vNIC. As a payback for 
> this you end up copying in more cases than for send/recvmmsg and you 
> pay penalty for timestamping if you do not have a hardware timestamp 
> source in the NIC.
>
> The other issue I always had with tpacket is that you "see" your own 
> packets, so you have to manage an RX-side BPF filter which removes 
> them.

I don't follow this 'issue'. Anyway, we can discuss it when I post the 
tpacket backend.

Thanks.

> That can get quite interesting if you have a lot of MACs on a NIC 
> (f.e. when there are multicast apps). Not sure if this is still the 
> case - it definitely was in mid 3.x Linux kernels. If you use raw 
> sendm(m)sg there is no issue - the packets are not looped when writing 
> to physical interfaces.
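
To make the loopback concern concrete: a packet socket tags each received 
frame with a packet type, and frames the host itself transmitted arrive 
marked PACKET_OUTGOING. The sketch below drops those in user space based 
on the address tuple that Python's recvfrom() returns for AF_PACKET 
sockets. This is a simpler stand-in for the RX-side BPF filter described 
above, not that filter, and the function names are hypothetical. Whether a 
given socket sees such frames at all depends on the kernel version and on 
how the socket is bound.

```python
PACKET_OUTGOING = 4  # from linux/if_packet.h


def is_own_transmit(addr):
    """addr is the AF_PACKET tuple (ifname, proto, pkttype, hatype, hwaddr)."""
    return addr[2] == PACKET_OUTGOING


def drop_own_packets(received):
    """Keep only frames that were not sent by this host.

    `received` is an iterable of (frame, addr) pairs as returned by
    sock.recvfrom() on an AF_PACKET socket.
    """
    return [frame for frame, addr in received if not is_own_transmit(addr)]
```

Filtering in user space costs a copy per dropped frame, so an in-kernel 
filter is preferable when the looped traffic is heavy.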
>
>>
>>> +
>>> +# launch QEMU instance - if your network has reorder or is very 
>>> lossy add ,pincounter
>>> +
>>> +qemu-system-i386 linux.img -net nic -net raw,ifname=eth0
>>
>> Can we switch to use -netdev here?
>
> This is done in the new revisions.
>
>>
>> Thanks
>>
>>> +
>>> +@end example
>>> +
>>>   @item -netdev 
>>> vde,id=@var{id}[,sock=@var{socketpath}][,port=@var{n}][,group=@var{groupname}][,mode=@var{octalmode}]
>>>   @itemx -net 
>>> vde[,vlan=@var{n}][,name=@var{name}][,sock=@var{socketpath}] 
>>> [,port=@var{n}][,group=@var{groupname}][,mode=@var{octalmode}]
>>>   Connect VLAN @var{n} to PORT @var{n} of a vde switch running on 
>>> host and
>>
>>
