From: Marcel Apfelbaum <marcel@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>,
"Wu, Jiaxin" <jiaxin.wu@intel.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Alexander Bezzubikov <zuban32s@gmail.com>
Subject: Re: [Qemu-devel] The maximum limit of virtual network device
Date: Thu, 6 Jul 2017 12:24:12 +0300
Message-ID: <537c874c-a5d4-30ce-6bb2-1a2bc345f66a@redhat.com>
In-Reply-To: <503895c7-4649-a568-2ee4-0fea1908fd60@redhat.com>

On 06/07/2017 11:31, Laszlo Ersek wrote:
> Hi Jiaxin,
>
> it's nice to see a question from you on qemu-devel! :)
>
> On 07/06/17 08:20, Wu, Jiaxin wrote:
>> Hello experts,
>>
>> We know QEMU has the capability to create multiple network devices in
>> one QEMU guest with the -device syntax. But I hit the failure below
>> when trying to create more than 30 virtual devices, each with a TAP
>> backend:
>>
>> qemu-system-x86_64: -device e1000: PCI: no slot/function available for
>> e1000, all in use.
>>
>> The corresponding QEMU command shows as following:
>>
>> sudo qemu-system-x86_64 \
>> -pflash OVMF.fd \
>> -global e1000.romfile="" \
>> -netdev tap,id=hostnet0,ifname=tap0,script=no,downscript=no \
>> -device e1000,netdev=hostnet0 \
[...]
>> -netdev tap,id=hostnet29,ifname=tap29,script=no,downscript=no \
>> -device e1000,netdev=hostnet29
>>
>> From the above, is the maximum number of virtual network devices in
>> one guest about 29? If not, how can I avoid this failure? My use case
>> is to create more than 150 network devices in one guest. Please share
>> your comments on this.
>
> You are seeing the above symptom because the above command line
> instructs QEMU to do the following:
> - use the i440fx machine type,
> - use a single PCI bus (= the main root bridge),
> - add the e1000 cards to separate slots (always using function 0) on
> that bus.
>
> That root bus has 32 slots; slot 0 holds the host bridge, slot 1 the
> PIIX3 functions, and slot 2 the default VGA device, which leaves 29
> slots for the NICs -- hence the failure at NIC number 30.
>
> Accordingly, there are three things you can do to remedy this:
>
> - Use the Q35 machine type and work with a PCI Express hierarchy rather
> than a PCI hierarchy. I'm mentioning this only for completeness,
> because it won't directly help your use case. But, I certainly want to
> highlight "docs/pcie.txt". Please read it sometime; it has nice
> examples and makes good points.
>
> - Use multiple PCI bridges to attach the devices. For this, several ways
> are possible:
>
> - use multiple root buses, with the pxb or pxb-pcie devices (see
> "docs/pci_expander_bridge.txt" and "docs/pcie.txt")
>
> - use multiple normal PCI bridges
>
> - use multiple PCI Express root ports or downstream ports (but for
> this, you'll likely have to use the PCI Express variant of the
> e1000, namely e1000e)
>
> - If you don't need hot-plug / hot-unplug, aggregate the e1000 NICs
> into multifunction PCI devices, eight NICs (functions) per slot.
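
(For the multifunction option, a minimal sketch of the syntax; slot 0x10
and the hostnetN ids are arbitrary placeholders here, and multifunction=on
has to be set on function 0 of the slot:

 -device e1000,netdev=hostnet0,bus=pci.0,addr=0x10.0x0,multifunction=on \
 -device e1000,netdev=hostnet1,bus=pci.0,addr=0x10.0x1 \
[...]
 -device e1000,netdev=hostnet7,bus=pci.0,addr=0x10.0x7 \

Each fully populated slot then carries eight NICs, at the cost of not
being able to hot-unplug them individually.)
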
>
> Now, I would normally recommend sticking with i440fx for simplicity.
> However, each PCI bridge requires 4KB of IO space (meaning (1 + 5) * 4KB
> = 24KB), and OVMF on the i440fx does not support that much (only 0x4000,
> i.e. 16KB). So, I'll recommend Q35 for IO space purposes; OVMF on Q35
> provides 0xA000 (40KB).
So if we use OVMF, going for Q35 actually gives us more IO space, nice!
However, recommending Q35 for IO space reasons seems odd :)
>
> For scaling higher than this, a PCI Express hierarchy should be used
> with PCI Express devices that require no IO space at all. However, that
> setup is even more problematic *for now*; please see section "3. IO
> space issues" in "docs/pcie.txt". We have open OVMF and QEMU BZs for
> limiting IO space allocation to cases when it is really necessary:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1344299
> https://bugzilla.redhat.com/show_bug.cgi?id=1434740
>
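
(For completeness, attaching Express NICs behind root ports would look
roughly like the sketch below, assuming a QEMU recent enough to have the
generic pcie-root-port and e1000e devices; as noted above, for now each
root port still gets IO space reserved for it:

 -device pcie-root-port,id=rp0,chassis=1,slot=1,bus=pcie.0,addr=0x2.0 \
 -device e1000e,netdev=hostnet0,bus=rp0 \

One root port is needed per hot-pluggable Express device.)
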
> Therefore I guess the simplest example I can give now is:
> - use Q35 (for a larger IO space),
> - plug a DMI-PCI bridge into the root bridge,
> - plug 5 PCI bridges into the DMI-PCI bridge,
> - plug 31 NICs per PCI bridge, each NIC into a separate slot (5 x 31 =
> 155 slots, which comfortably covers 150 NICs).
>
The setup looks OK to me (assuming OVMF is needed; otherwise
PC + pci-bridges would allow even more devices), but I do have a
small concern.
We want to deprecate the dmi-pci bridge since it does not support
hot-plug (for itself or for devices behind it).
Alexander (CCed) is a GSoC student working on a generic
pcie-pci bridge that can (eventually) be hot-plugged
into a PCIe Root Port and keeps the machine cleaner.
See:
https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg05498.html
If this is a "lab" project it doesn't really matter, but I wanted
to point out the direction.
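
Once that series lands, the shape would roughly be the sketch below
(assuming the bridge ends up under a name like pcie-pci-bridge; the exact
device name and properties depend on how the series is merged):

 -device pcie-root-port,id=rp1,chassis=2,slot=2,bus=pcie.0,addr=0x3.0 \
 -device pcie-pci-bridge,id=pcie-pci-1,bus=rp1 \
 -device e1000,netdev=hostnet0,bus=pcie-pci-1,addr=0x1.0 \
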
Thanks,
Marcel
> This follows the recommendation in section "2.3 PCI only hierarchy" of
> "docs/pcie.txt" (slightly rewrapped here):
>
>> 2.3 PCI only hierarchy
>> ======================
>> Legacy PCI devices can be plugged into pcie.0 as Integrated Endpoints,
>> but, as mentioned in section 5, doing so means the legacy PCI
>> device in question will be incapable of hot-unplugging.
>> Besides that, use DMI-PCI Bridges (i82801b11-bridge) in combination
>> with PCI-PCI Bridges (pci-bridge) to start PCI hierarchies.
>>
>> Prefer flat hierarchies. For most scenarios a single DMI-PCI Bridge
>> (having 32 slots) and several PCI-PCI Bridges attached to it (each
>> supporting also 32 slots) will support hundreds of legacy devices. The
>> recommendation is to populate one PCI-PCI Bridge under the DMI-PCI
>> Bridge until it is full and then plug a new PCI-PCI Bridge...
>
> Here's a command line. Please note that the OVMF boot may take quite a
> long time with this, as the E3522X2.EFI driver from BootUtil (-D
> E1000_ENABLE) binds all 150 e1000 NICs in succession! Watching the OVMF
> debug log is recommended.
>
> qemu-system-x86_64 \
> \
> -machine q35,vmport=off,accel=kvm \
> -pflash OVMF.fd \
> -global e1000.romfile="" \
> -m 2048 \
> -debugcon file:debug.log \
> -global isa-debugcon.iobase=0x402 \
> \
> -netdev tap,id=hostnet0,ifname=tap0,script=no,downscript=no \
[...]
> -netdev tap,id=hostnet149,ifname=tap149,script=no,downscript=no \
> \
> -device i82801b11-bridge,id=dmi-pci-bridge \
> \
> -device pci-bridge,id=bridge-1,chassis_nr=1,bus=dmi-pci-bridge \
> -device pci-bridge,id=bridge-2,chassis_nr=2,bus=dmi-pci-bridge \
> -device pci-bridge,id=bridge-3,chassis_nr=3,bus=dmi-pci-bridge \
> -device pci-bridge,id=bridge-4,chassis_nr=4,bus=dmi-pci-bridge \
> -device pci-bridge,id=bridge-5,chassis_nr=5,bus=dmi-pci-bridge \
> \
> -device e1000,netdev=hostnet0,bus=bridge-1,addr=0x1.0 \
[...]
> -device e1000,netdev=hostnet149,bus=bridge-5,addr=0x1a.0
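
(The elided -netdev/-device pairs above can be generated with a small
shell loop; the sketch below merely reproduces the endpoints shown, i.e.
31 NICs per pci-bridge:

 netdev_args=(); device_args=()
 for i in $(seq 0 149); do
     # one TAP backend per NIC, ids matching the command line above
     netdev_args+=(-netdev "tap,id=hostnet$i,ifname=tap$i,script=no,downscript=no")
     bridge=$(( i / 31 + 1 ))                  # bridge-1 .. bridge-5
     slot=$(printf '0x%x' $(( i % 31 + 1 )))   # slots 0x1 .. 0x1f per bridge
     device_args+=(-device "e1000,netdev=hostnet$i,bus=bridge-$bridge,addr=$slot.0")
 done
 qemu-system-x86_64 ... "${netdev_args[@]}" ... "${device_args[@]}"

where the remaining "..." stands for the machine, firmware and bridge
options shown above.)
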
>
> Thanks
> Laszlo
>