From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:38110)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <fabien.lementec@gmail.com>) id 1TbTiI-00050R-TG
	for qemu-devel@nongnu.org; Thu, 22 Nov 2012 05:08:59 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <fabien.lementec@gmail.com>) id 1TbTiH-00058M-HP
	for qemu-devel@nongnu.org; Thu, 22 Nov 2012 05:08:54 -0500
Received: from mail-ie0-f173.google.com ([209.85.223.173]:35432)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <fabien.lementec@gmail.com>) id 1TbTiH-00058G-95
	for qemu-devel@nongnu.org; Thu, 22 Nov 2012 05:08:53 -0500
Received: by mail-ie0-f173.google.com with SMTP id e13so1596091iej.4
	for <qemu-devel@nongnu.org>; Thu, 22 Nov 2012 02:08:52 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <20121122081942.GC7598@stefanha-thinkpad.redhat.com>
References: <CAAPMK-T4pmMr+SR8miDzDtLuGtS+5Q6yqJk4C80M8PAX21gYKA@mail.gmail.com>
	<CAJSP0QU6T3=7fzLXgmbuoTWtxN1vxWtLEu1KupocNZHX-Pvqeg@mail.gmail.com>
	<CAAPMK-RyEYYsVVtV2PJnhuvkfzsGrJkzcsQYLocShFDU=QP+Ng@mail.gmail.com>
	<20121119085528.GD17444@stefanha-thinkpad.redhat.com>
	<CAAPMK-RmsrC9McUecguwwCxWaaGZdxzr6w5SZxEeAGEYiFEzCg@mail.gmail.com>
	<20121122081942.GC7598@stefanha-thinkpad.redhat.com>
Date: Thu, 22 Nov 2012 11:08:51 +0100
Message-ID: <CAAPMK-TPWhS3Pr-mRdv-5POuh4xAVb8EWK8Ke1B8B=pkozzNBQ@mail.gmail.com>
From: lementec fabien <fabien.lementec@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Subject: Re: [Qemu-devel] TCP based PCIE request forwarding
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Cam Macdonell <cam@cs.ualberta.ca>, Nick Gasson <nick@nickg.me.uk>, qemu-devel <qemu-devel@nongnu.org>, fred.konrad@greensocs.com

Hi,

Thanks for the feedback, I will modify the previous document
to include the changes you mentionned. I reply here too.

2012/11/22 Stefan Hajnoczi <stefanha@gmail.com>:
> On Wed, Nov 21, 2012 at 03:27:48PM +0100, lementec fabien wrote:
>> usage
>> -----
>> PCIEFW devices are instanciated using the following QEMU options:
>> -device \
>>  pciefw,\
>>  laddr=<local_addr>,\
>>  lport=<local_port>,\
>>  raddr=<remote_addr>,\
>>  rport=<remote_port>
>
> Take a look at qemu_socket.h:socket_parse().  It should allow you to
> support TCP, UNIX domain sockets, and arbitrary file descriptors.
>

ok, I will have a look what it implies to support arbitrary file descriptor.
For instance, my current implementation does not work with UDP sockets.
It assumes a reliable, ordered transport layer, whose OS API is not datagram
oriented.

>> implementation
>> --------------
>> PCIEFW is a PCIE accesses forwarding device added to the QEMU source tree. At
>> initialization, this device opens a bidirectionnal point to point communication
>> channel with an external process. This process actually implements the PCIE
>> endpoint. That is, a PCIE access made by QEMU is forwarded to the process.
>> Reciprocally, replies and interrupts messages from the process are forwarded
>> to QEMU.
>>
>> The commnication currently relies on a bidirectionnal point to point TCP
>
> s/commnication/communication/
>
>> socket based channel. Byte ordering is little endian.
>>
>> PCIEFW initiates a request upon access from QEMU. It sends a message whose
>> format is described by the pciefw_msg_t type:
>>
>> typedef struct pciefw_msg
>> {
>> #define PCIEFW_MSG_MAX_SIZE (offsetof(pciefw_msg_t, data) + 0x1000)
>
> The size field is uint16_t.  Do you really want to limit to 4 KB of
> data?
>

My first implementation required to allocate a fixed size buffer. It
is no longer
the case (with non datagram oriented IO operations) since I included the header
that contains the message size. Since PCIE maximum payload size is 0x1000,
it was an obvious choice. Of course, it is an arbitrary choice.

>>
>>   pciefw_header_t header;
>>
>> #define PCIEFW_OP_READ_CONFIG 0
>> #define PCIEFW_OP_WRITE_CONFIG 1
>> #define PCIEFW_OP_READ_MEM 2
>> #define PCIEFW_OP_WRITE_MEM 3
>> #define PCIEFW_OP_READ_IO 4
>> #define PCIEFW_OP_WRITE_IO 5
>> #define PCIEFW_OP_INT 6
>> #define PCIEFW_OP_MSI 7
>> #define PCIEFW_OP_MSIX 8
>>
>>   uint8_t op; /* in PCIEFW_OP_XXX */
>>   uint8_t bar; /* in [0:5] */
>>   uint8_t width; /* access in 1, 2, 4, 8 */
>>   uint64_t addr;
>>   uint16_t size; /* data size, in bytes */
>
> Why is are both width and size fields?  For read-type operations the
> size field would indicate how many bytes to read.  For write-type
> operations the size field would indicate how many bytes are included in
> data[].
>

Actually, the width field is currently not required. I included it to
allow multiple
contiguous accesses in 1 operation (where count = size / width). The device
would still need know the width of individual accesses in this case. But this is
not used.

>>   uint8_t data[1];
>>
>> } __attribute__((packed)) pciefw_msg_t;
>>
>> Note that data is a variable length field.
>>
>> The PCIE endpoint process replies with a pciefw_reply_t formatted message:
>>
>> typedef struct pciefw_reply
>> {
>>   pciefw_header_t header;
>>   uint8_t status;
>
> What values does this field take?
>

I will define a PCIEFW_STATUS_XXX

>>   uint8_t data[8];
>> } __attribute__((packed)) pciefw_reply_t;
>>
>> The PCIE endpoint process can initiate pciefw_msg_t to perform write operations
>> of its own. This is used to perform data transfer (DMA engines ...) and send
>> interrupts.
>
> Any flow control rules?  For example, can the endpoint raise an
> interrupt while processing a message (before it sends a reply)?
>

Currently, messages are not identified so the transport is assumed
to be made in order. In practice, it works because of the LINUX
application I use does not starts 2 DMA transfers in parallel. But a
protocol cannot rely on such assumptions. Plus, I assume QEMU
can eventually make 2 PCIE concurrent accesses on the same
device which would lead to 2 replies. so I will add an identifier field.

>> Both types start with a pciefw_header containing the total size:
>>
>> typedef struct pciefw_header
>> {
>>   uint16_t size;
>> } __attribute__((packed)) pciefw_header_t;
>
> A "hello" message type would be useful so that you can extend the
> protocol in the future.  The message would contain feature bits or a
> version number.
>

I did think about it. More generally, it would be useful to have a control
message to allow an endpoint to be disconnected then reconnected
without having to reboot QEMU. It is very useful when developping a
new device.

> Stefan

I will send you the modified document,

Thanks,

Fabien.