All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Pyrgiotis <apyrgio@arrikto.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
Date: Fri, 19 Feb 2016 13:50:00 +0200	[thread overview]
Message-ID: <56C70168.2050408@arrikto.com> (raw)
In-Reply-To: <56BC6DC0.4070204@redhat.com>

Hi Paolo,

Sorry for the late reply and thanks for the review. See my comments inline:

On 02/11/2016 01:17 PM, Paolo Bonzini wrote:
> 
> 
> On 16/12/2015 17:55, Alex Pyrgiotis wrote:
>> +/*
>> + * Create a QEMUIOVector from a scatter-gather list.
>> + *
>> + * This function does not copy the data of the scatter-gather list. Instead, it
>> + * uses the dma_memory_map() function to map physical memory regions of the
>> + * virtual device (as interpreted by the guest kernel) into the address space
>> + * of the QEMU process, in order to have access to the data.
>> + */
>> +static void dma_map_sg(DMAAIOCB *dbs)
> 
> In special cases where the QEMUSGList includes MMIO regions, dma_map_sg
> might not be able to map the whole list.  In this case, for regular I/O
> it is possible to break the operation in multiple steps---in fact, this
> breaking of requests is the main purpose of most of the code in
> dma-helpers.c.

You're right. I've written a comment about that yet, somehow, the
implications of rescheduling a DMA operation flew over my head. Great.

So, I've tried to grasp the current situation and this is what I've got
so far. Please correct me where I'm wrong.


Guest kernel space:
1. The guest kernel wants to send a SCSI request to a (para)virtual SCSI
   controller.
2. This SCSI request involves a list of physical address ranges.
   Whether the request only refers to a single contiguous physical
   range, or whether it is a scatter-gather list depends on the
   implementation of the kernel's Low-level Device Driver (LLDD) and
   the capabilities of the SCSI controller.
3. Some of these ranges may include MMIO regions. Since I'm not aware
   of a scenario where this happens, can you provide an example?
4. The LLDD writes the SCSI CDB to the SCSI controller's queue and
   informs the SCSI controller by writing to some sort of doorbell
   register.

QEMU/Hardware space:
5. The SCSI controller code will create a QEMUSGList that points to
   the memory regions of the SCSI request. This QEMUSGList will also
   include the MMIO regions.
6. The QEMU device implementation, e.g. scsi-block, chooses to use
   the dma_* interface.
7. The dma_blk_read/write() code will ultimately attempt to map all the
   memory regions pointed by the QEMUSGList in order to create a
   QEMUIOVector.
8. At some point during the mapping loop, the code will encounter an
   MMIO region. Since reading and writing from/to an MMIO region
   requires  special handling, e.g., we need to call
   MemoryRegion->ops->write(), we cannot include it in our read/write
   system call to the host kernel.
9. This leads to a partial read/write and the mapping loop will resume
   once the partial read/write() has finished.


Are we in the same page so far? If so, let's dig into the
dma_blk_read/write() logic. The following is a simplified pseudocode:


	allocate QEMUIOVector
callback:

	# Reset any changes to QEMUIOVector from previous operations
	for address in QEMUIOVector:
		unmap address
	reset QEMUIOVector

	# Check if the previous IO operation was the last one and if so,
	# call the completion callback of the user and return.
	if operation is completed:
		dma_complete()
		return

	offset = adjust read/write offset
	# Map each sg of QEMUSGList and add it to the QEMUIOVector
	for sg in QEMUSGList:
		address = dma_memory_map(sg)
		if address == NULL:
			break
		add address to QEMUIOVector

	# When either of the following operations have finished, restart
	# this DMA operation.
	if nothing mapped:
		# If we can't map anything, we need to retry
		create reschedule_dma callback
		cpu_register_map_client(callback)
	else:
		# Else, we can perform a partial read/write()
		do readv/writev(offset, QEMUIOVector, callback)
	

Are the above OK? If so, I have some questions:

a) Is an MMIO region one of the reasons why we can't map an sg?
b) At which point will the relevant ops->write() method for the MMIO
   region be called when we have to DMA into the region?? Is it handled
   implicitly in dma_memory_map()?
c) I'm not quite sure about the logic of the "nothing mapped" section.
   Correct me if I'm wrong, but what I think it does is that it
   registers a callback (reschedule_dma) once some sort of mapping has
   completed. What kind of mapping is this? Is there anything more to
   it?

> However, it is not possible to do the same for ioctls.  This is actually
> the reason why no one has ever tried to make scsi-generic do anything
> but bounce-buffering. I think that your code breaks horribly in this
> case, and I don't see a way to fix it, except for reverting to bounce
> buffering.
>
> This would require major changes in your patches, and I'm not sure
> whether they are worth it for the single use case of tape devices...

Well, I wouldn't narrow it down to tape devices. The scsi-generic
interface is the universal interface for all SCSI devices and the only
interface that is fully passthrough. So this patch would effectively
boost the baseline performance of SCSI devices. I think it's worth a try.

Thanks,
Alex

  reply	other threads:[~2016-02-19 11:50 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-16 16:55 [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic Alex Pyrgiotis
2016-02-11 11:17   ` Paolo Bonzini
2016-02-19 11:50     ` Alex Pyrgiotis [this message]
2016-02-22 10:43       ` Paolo Bonzini
2016-02-25 10:10         ` Alex Pyrgiotis
2016-02-25 10:22           ` Paolo Bonzini
2016-02-25 11:19             ` Alex Pyrgiotis
2016-02-25 13:01               ` Paolo Bonzini
2016-02-26  9:20           ` Kevin Wolf
2015-12-16 16:55 ` [Qemu-devel] [PATCH 2/9] dma-helpers: Add support for ioctl operations Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 3/9] dma-helpers: Do not truncate small qiovs Alex Pyrgiotis
2016-02-11 11:08   ` Paolo Bonzini
2015-12-16 16:55 ` [Qemu-devel] [PATCH 4/9] scsi-generic: Add common functions Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 5/9] scsi-generic: Separate `sg_io_hdr' initializations Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 6/9] scsi-generic: Make request execution buf-specific Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 7/9] scsi-generic: Make data-copying logic clearer Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 8/9] scsi-generic: Factor out response interception Alex Pyrgiotis
2015-12-16 16:55 ` [Qemu-devel] [PATCH 9/9] scsi-generic: Allow full scatter-gather support Alex Pyrgiotis
2015-12-16 18:16 ` [Qemu-devel] [PATCH 0/9] Add full scatter-gather support for SCSI generic devices Paolo Bonzini
2015-12-17  8:47   ` Alex Pyrgiotis
2015-12-17 10:31     ` Paolo Bonzini
2015-12-17 13:10       ` Alex Pyrgiotis
2015-12-17 13:13         ` Paolo Bonzini
2015-12-21 10:58           ` Alex Pyrgiotis
2016-01-11 13:30           ` Alex Pyrgiotis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56C70168.2050408@arrikto.com \
    --to=apyrgio@arrikto.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.