From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38872) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZtUOx-0006FG-1t for qemu-devel@nongnu.org; Tue, 03 Nov 2015 00:45:00 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZtUOr-0006DT-W6 for qemu-devel@nongnu.org; Tue, 03 Nov 2015 00:44:58 -0500 Received: from mx1.redhat.com ([209.132.183.28]:34560) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZtUOr-0006DN-Lb for qemu-devel@nongnu.org; Tue, 03 Nov 2015 00:44:53 -0500 References: <1445792408-28571-1-git-send-email-leonid.bloch@ravellosystems.com> <563060C1.7030209@redhat.com> <5632FFF6.8000003@redhat.com> <5C35993E-34F3-4C1E-938D-4B1F6F0EB136@daynix.com> <5636DA1E.5090308@redhat.com> <5941010E-D55E-4692-B128-BDD9D7B086FE@daynix.com> From: Jason Wang Message-ID: <563849CD.5010402@redhat.com> Date: Tue, 3 Nov 2015 13:44:45 +0800 MIME-Version: 1.0 In-Reply-To: <5941010E-D55E-4692-B128-BDD9D7B086FE@daynix.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [RFC PATCH 0/5] Introduce Intel 82574 GbE Controller Emulation (e1000e) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Dmitry Fleytman Cc: Leonid Bloch , Leonid Bloch , qemu-devel@nongnu.org, Shmulik Ladkani On 11/02/2015 03:49 PM, Dmitry Fleytman wrote: > >> On 2 Nov 2015, at 05:35 AM, Jason Wang > > wrote: >> >> >> >> On 10/31/2015 01:52 PM, Dmitry Fleytman wrote: >>> Hello Jason, >>> >>> Thanks for reviewing. See my answers inline. >>> >>> >>>> On 30 Oct 2015, at 07:28 AM, Jason Wang >>> >>>> > wrote: >>>> >>>> >>>> >>>> On 10/28/2015 01:44 PM, Jason Wang wrote: >>>>> >>>>> On 10/26/2015 01:00 AM, Leonid Bloch wrote: >>>>>> Hello qemu-devel, >>>>>> >>>>>> This patch series is an RFC for the new networking device emulatio= n >>>>>> we're developing for QEMU. >>>>>> >>>>>> This new device emulates the Intel 82574 GbE Controller and works >>>>>> with unmodified Intel e1000e drivers from the Linux/Windows kernel= s. >>>>>> >>>>>> The status of the current series is "Functional Device Ready, work >>>>>> on Extended Features in Progress". >>>>>> >>>>>> More precisely, these patches represent a functional device, which >>>>>> is recognized by the standard Intel drivers, and is able to transf= er >>>>>> TX/RX packets with CSO/TSO offloads, according to the spec. >>>>>> >>>>>> Extended features not supported yet (work in progress): >>>>>> 1. TX/RX Interrupt moderation mechanisms >>>>>> 2. RSS >>>>>> 3. Full-featured multi-queue (use of multiqueued network backend) >>>>>> >>>>>> Also, there will be some code refactoring and performance >>>>>> optimization efforts. >>>>>> >>>>>> This series was tested on Linux (Fedora 22) and Windows (2012R2) >>>>>> guests, using Iperf, with TX/RX and TCP/UDP streams, and various >>>>>> packet sizes. >>>>>> >>>>>> More thorough testing, including data streams with different MTU >>>>>> sizes, and Microsoft Certification (HLK) tests, are pending missin= g >>>>>> features' development. >>>>>> >>>>>> See commit messages (esp. "net: Introduce e1000e device emulation"= ) >>>>>> for more information about the development approaches and the >>>>>> architecture options chosen for this device. >>>>>> >>>>>> This series is based upon v2.3.0 tag of the upstream QEMU reposito= ry, >>>>>> and it will be rebased to latest before the final submission. >>>>>> >>>>>> Please share your thoughts - any feedback is highly welcomed :) >>>>>> >>>>>> Best Regards, >>>>>> Dmitry Fleytman. >>>>> Thanks for the series. Will go through this in next few days. >>>> >>>> Have a quick glance at the series, got the following questions: >>>> >>>> - Though e1000e differs from e1000 in many places, I still see lots = of >>>> code duplications. We need consider to reuse e1000.c (or at least pa= rt >>>> of). I believe we don't want to fix a bug twice in two places in the >>>> future and I expect hundreds of lines could be saved through this wa= y. >>> >>> That=E2=80=99s a good question :) >>> >>> This is how we started, we had a common =E2=80=9Ccore=E2=80=9D code b= ase meant to >>> implement all common logic (this split is still present in the patche= s >>> - there are e1000e_core.c and e1000e.c files). >>> Unfortunately at some point it turned out that there are more >>> differences that commons. We noticed that the code becomes filled wit= h >>> many minor differences handling. >>> This also made the code base more complicated and harder to follow. >>> >>> So at some point of time it was decided to split the code base and >>> revert all changes done to the e1000 device (except a few >>> fixes/improvements Leonid submitted a few days ago). >>> >>> Although there was common code between devices, total SLOC of e1000 >>> and e1000e devices became smaller after the split. >>> >>> Amount of code that may be shared between devices will be even smalle= r >>> after we complete the implementation which still misses a few feature= s >>> (see cover letter) that will change many things. >>> >>> Still after the device implementation is done, we plan to review code >>> similarities again to see if there are possibilities for code sharing= . >> >> I see, but if we can try to re-use or unify the codes from beginning, = it >> would be a little bit easier. Looks like the differences were mainly: >> >> 1) MSI-X support >> 2) offloading support through virtio-net header >> 3) trace points >> 4) other new functions through e1000e specific registers >> >> So we could first unify the code through implementing the support of 2 >> and 3 for e1000. For MSI-X and other e1000e specific new functions, it >> could be done through: >> >> 1) model specific callbacks, e.g realize, transmission and reception >> 2) A new register flags e.g PHY_RW_E1000E which means the register is >> for e1000e only. Or even model specific wirteops and readops >> 3) For other subtle differences, it could be done in the code by >> checking the model explicitly. >> >> What's your opinion? (A good example of code sharing is freebsd's e100= 0 >> driver which covers both 8254x and 8257x). > > > Hi Jason, > > This is exactly how we started. > > Issues that made us split the code base were following: > > 1. The majority of registers are different. Even same registers in > many cases have different bits and require different processing logic. > This introduced too much if=E2=80=99s into the code; Then we can probably have different writeops and readops. This way, we can at least save the codes of common registers. > 2. The data path is totally different not just because of virtio > headers but also because these devices use different descriptor > layouts and require different logic in order to parse and fill those. > There are legacy descriptors that look almost the same but of course > we must support all descriptor types described by spec; Yes, but looks like the only extend rx/tx descriptors is 8257x specific. And 8257x fully supports both legacy and context descriptor of 8254x. This give us another chance to reuse the code. > 3. Interrupt handling logic is different because of MSI support; Right, but this is not hard to address, probably a new helper. > 4. Mutli-queue and RSS make the code even less similar to the old devic= e; Yes, this could be in 8275x specific file. > 5. Changes required to isolate shared code base required changes in > migration tables and fishy tricks to preserve compatibility with > current implementation; Since 8257x is a totally new device, it can has its own vmstate if it's simpler to be implemented and we don't even need to care migration compatibility. > 6. e1000 code suffered from massive changes which are very hard to > verify because spec is big and there are no drivers that use all > device features. Then we can try to change e1000 as mini as possible. E.g just factor out the common logic to helpers to reuse it in e1000e. > > Overall, code for handling differences in device behaviours was bigger > and more complicated then the device logic itself. The difference is > not subtle when it comes to the full featured device implementation. > As for FreeBSD=E2=80=99s driver, I=E2=80=99m not deeply familiar with i= ts code but I > suspect it works with device in legacy mode which is pretty similar to > an old device indeed. Since we must support all modes our situation is > different. Yes, it does not use extended descriptor format. > > Again, I=E2=80=99m totally into shared code and would like to have some= common > code base anyway. Currently it looks like the best way to achieve this > is to finish with all device features and then see what parts of logic > may be shared between the old and the new devices. It=E2=80=99s better = to have > slight code duplication and smaller shared code base than to have > bloated and tricky shared code that will introduce its own problems to > both devices. The code duplication is not slight in this rfc :). So the code has the possibility to be unified. But I'm ok to evaluate this after all features were developed. Thanks > > Best Regards, > Dmitry > >> >>> >>>> - For e1000e it self, since it was a new device, so no need to care >>>> about compatibility stuffs (e.g auto negotiation and mit). We can ju= st >>>> enable them forever. >>> >>> Yes, we have this in plans. >>> >>>> - And for the generic packet abstraction layer, what's the >>>> advantages of >>>> this? If it has lot, maybe we can use it in other nic model (e.g >>>> virtio-net)? >>> >>> These abstractions were initially developed by me as a part of vmxnet= 3 >>> device to be generic and re-usable. Their main advantage is support >>> for virtio headers for virtio-enabled backends and emulation of >>> network offloads in software for backends that do not support virtio. >>> >>> Of course they may be re-used by virtio, however I=E2=80=99m not sure= if it >>> will be really useful because virtio has feature negotiation >>> facilities and do not require SW emulation for network task offloads. >>> >>> For other devices they are useful because each and every device that >>> requires SW offloads implementation need to do exactly the same thing= s >>> and it doesn=E2=80=99t make sense to have a few implementations for t= his. >>> >>> Best Regards, >>> Dmitry >> >> Ok, thanks for the explanation. >> >>> >>>> >>>> Thanks >>>> >>>>> >>>>> Since 2.5 is in soft freeze, this looks a 2.6 material. >