All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Lieven <pl@kamp.de>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: kwolf@redhat.com, jcody@redhat.com, jsnow@redhat.com,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests
Date: Tue, 27 Oct 2015 11:58:55 +0100	[thread overview]
Message-ID: <562F58EF.9050709@kamp.de> (raw)
In-Reply-To: <20151026103949.GA20111@stefanha-x1.localdomain>

Am 26.10.2015 um 11:39 schrieb Stefan Hajnoczi:
> On Mon, Oct 12, 2015 at 02:27:24PM +0200, Peter Lieven wrote:
>> this patch adds a new aio readv compatible function which copies
>> all data through a bounce buffer. The benefit is that these requests
>> can be flagged as canceled to avoid guest memory corruption when
>> a canceled request is completed by the backend at a later stage.
>>
>> If an IDE protocol wants to use this function it has to pipe
>> all read requests through ide_readv_cancelable and it may then
>> enable requests_cancelable in the IDEState.
>>
>> If this state is enable we can avoid the blocking blk_drain_all
>> in case of a BMDMA reset.
>>
>> Currently only read operations are cancelable thus we can only
>> use this logic for read-only devices.
> Naming is confusing here.  Requests are already "cancelable" using
> bdv_aio_cancel().
>
> Please use a different name, for example "orphan" requests.  These are
> requests that QEMU still knows about but the guest believes are
> complete.  Or maybe "IDEBufferedRequest" since data is transferred
> through a bounce buffer.
>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>   hw/ide/core.c     | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   hw/ide/internal.h | 16 ++++++++++++++++
>>   hw/ide/pci.c      | 42 ++++++++++++++++++++++++++++--------------
>>   3 files changed, 98 insertions(+), 14 deletions(-)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 317406d..24547ce 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -561,6 +561,59 @@ static bool ide_sect_range_ok(IDEState *s,
>>       return true;
>>   }
>>   
>> +static void ide_readv_cancelable_cb(void *opaque, int ret)
>> +{
>> +    IDECancelableRequest *req = opaque;
>> +    if (!req->canceled) {
>> +        if (!ret) {
>> +            qemu_iovec_from_buf(req->org_qiov, 0, req->buf, req->org_qiov->size);
>> +        }
>> +        req->org_cb(req->org_opaque, ret);
>> +    }
>> +    QLIST_REMOVE(req, list);
>> +    qemu_vfree(req->buf);
>> +    qemu_iovec_destroy(&req->qiov);
>> +    g_free(req);
>> +}
>> +
>> +#define MAX_CANCELABLE_REQS 16
>> +
>> +BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
>> +                                 QEMUIOVector *iov, int nb_sectors,
>> +                                 BlockCompletionFunc *cb, void *opaque)
>> +{
>> +    BlockAIOCB *aioreq;
>> +    IDECancelableRequest *req;
>> +    int c = 0;
>> +
>> +    QLIST_FOREACH(req, &s->cancelable_requests, list) {
>> +        c++;
>> +    }
>> +    if (c > MAX_CANCELABLE_REQS) {
>> +        return NULL;
>> +    }
> A BH is probably needed here to schedule an cb(-EIO) call since this
> function isn't supposed to return NULL if it's a direct replacement for
> blk_aio_readv().

You mean sth like:

acb = qemu_aio_get(&bdrv_em_aiocb_info, bs, cb, opaque);
acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_aio_bh_cb, acb);
acb->ret = -EIO;
qemu_bh_schedule(acb->bh);

return &acb->common;

>
>> +
>> +    req = g_new0(IDECancelableRequest, 1);
>> +    qemu_iovec_init(&req->qiov, 1);
> It saves a g_new() call if you add a struct iovec field to
> IDECancelableRequest and use qemu_iovec_init_external() instead of
> qemu_iovec_init().
>
> The qemu_iovec_destroy() calls must be dropped when an external struct
> iovec is used.
>
> The qemu_iovec_init_external() call must be moved after the
> qemu_blockalign() and struct iovec setup below.

okay

>
>> +    req->buf = qemu_blockalign(blk_bs(s->blk), iov->size);
>> +    qemu_iovec_add(&req->qiov, req->buf, iov->size);
>> +    req->org_qiov = iov;
>> +    req->org_cb = cb;
>> +    req->org_opaque = opaque;
>> +
>> +    aioreq = blk_aio_readv(s->blk, sector_num, &req->qiov, nb_sectors,
>> +                           ide_readv_cancelable_cb, req);
>> +    if (aioreq == NULL) {
>> +        qemu_vfree(req->buf);
>> +        qemu_iovec_destroy(&req->qiov);
>> +        g_free(req);
>> +    } else {
>> +        QLIST_INSERT_HEAD(&s->cancelable_requests, req, list);
>> +    }
>> +
>> +    return aioreq;
>> +}
>> +
>>   static void ide_sector_read(IDEState *s);
>>   
>>   static void ide_sector_read_cb(void *opaque, int ret)
>> @@ -805,6 +858,7 @@ void ide_start_dma(IDEState *s, BlockCompletionFunc *cb)
>>       s->bus->retry_unit = s->unit;
>>       s->bus->retry_sector_num = ide_get_sector(s);
>>       s->bus->retry_nsector = s->nsector;
>> +    s->bus->s = s;
> How is 's' different from 'unit' and 'retry_unit'?
>
> The logic for switching between units is already a little tricky since
> the guest can write to the hardware registers while requests are
> in-flight.
>
> Please don't duplicate "active unit" state, that increases the risk of
> inconsistencies.
>
> Can you use idebus_active_if() to get an equivalent IDEState pointer
> without storing s?

That should be possible.

>
>>       if (s->bus->dma->ops->start_dma) {
>>           s->bus->dma->ops->start_dma(s->bus->dma, s, cb);
>>       }
>> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
>> index 05e93ff..ad188c2 100644
>> --- a/hw/ide/internal.h
>> +++ b/hw/ide/internal.h
>> @@ -343,6 +343,16 @@ enum ide_dma_cmd {
>>   #define ide_cmd_is_read(s) \
>>   	((s)->dma_cmd == IDE_DMA_READ)
>>   
>> +typedef struct IDECancelableRequest {
>> +    QLIST_ENTRY(IDECancelableRequest) list;
>> +    QEMUIOVector qiov;
>> +    uint8_t *buf;
>> +    QEMUIOVector *org_qiov;
>> +    BlockCompletionFunc *org_cb;
>> +    void *org_opaque;
> Please don't shorten names, original_* is clearer than org_*.

Ok.

>
>> +    bool canceled;
>> +} IDECancelableRequest;
>> +
>>   /* NOTE: IDEState represents in fact one drive */
>>   struct IDEState {
>>       IDEBus *bus;
>> @@ -396,6 +406,8 @@ struct IDEState {
>>       BlockAIOCB *pio_aiocb;
>>       struct iovec iov;
>>       QEMUIOVector qiov;
>> +    QLIST_HEAD(, IDECancelableRequest) cancelable_requests;
>> +    bool requests_cancelable;
>>       /* ATA DMA state */
>>       int32_t io_buffer_offset;
>>       int32_t io_buffer_size;
>> @@ -468,6 +480,7 @@ struct IDEBus {
>>       uint8_t retry_unit;
>>       int64_t retry_sector_num;
>>       uint32_t retry_nsector;
>> +    IDEState *s;
>>   };
>>   
>>   #define TYPE_IDE_DEVICE "ide-device"
>> @@ -572,6 +585,9 @@ void ide_set_inactive(IDEState *s, bool more);
>>   BlockAIOCB *ide_issue_trim(BlockBackend *blk,
>>           int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
>>           BlockCompletionFunc *cb, void *opaque);
>> +BlockAIOCB *ide_readv_cancelable(IDEState *s, int64_t sector_num,
>> +                                 QEMUIOVector *iov, int nb_sectors,
>> +                                 BlockCompletionFunc *cb, void *opaque);
>>   
>>   /* hw/ide/atapi.c */
>>   void ide_atapi_cmd(IDEState *s);
>> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
>> index d31ff88..5587183 100644
>> --- a/hw/ide/pci.c
>> +++ b/hw/ide/pci.c
>> @@ -240,21 +240,35 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
>>       /* Ignore writes to SSBM if it keeps the old value */
>>       if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
>>           if (!(val & BM_CMD_START)) {
>> -            /*
>> -             * We can't cancel Scatter Gather DMA in the middle of the
>> -             * operation or a partial (not full) DMA transfer would reach
>> -             * the storage so we wait for completion instead (we beahve
>> -             * like if the DMA was completed by the time the guest trying
>> -             * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
>> -             * set).
>> -             *
>> -             * In the future we'll be able to safely cancel the I/O if the
>> -             * whole DMA operation will be submitted to disk with a single
>> -             * aio operation with preadv/pwritev.
>> -             */
>>               if (bm->bus->dma->aiocb) {
>> -                blk_drain_all();
>> -                assert(bm->bus->dma->aiocb == NULL);
>> +                if (bm->bus->s && bm->bus->s->requests_cancelable) {
>> +                    /*
>> +                     * If the used IDE protocol supports request cancelation we
>> +                     * can flag requests as canceled here and disable DMA.
>> +                     * The IDE protocol used MUST use ide_readv_cancelable for all
>> +                     * read operations and then subsequently can enable this code
>> +                     * path. Currently this is only supported for read-only
>> +                     * devices.
>> +                    */
>> +                    IDECancelableRequest *req;
>> +                    QLIST_FOREACH(req, &bm->bus->s->cancelable_requests, list) {
>> +                        if (!req->canceled) {
>> +                            req->org_cb(req->org_opaque, -ECANCELED);
>> +                        }
>> +                        req->canceled = true;
>> +                    }
>> +                } else {
>> +                    /*
>> +                     * We can't cancel Scatter Gather DMA in the middle of the
>> +                     * operation or a partial (not full) DMA transfer would reach
>> +                     * the storage so we wait for completion instead (we beahve
>> +                     * like if the DMA was completed by the time the guest trying
>> +                     * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
>> +                     * set).
>> +                     */
>> +                    blk_drain_all();
>> +                    assert(bm->bus->dma->aiocb == NULL);
> This assertion applies is both branches of the if statement, it could be
> moved after the if statement.

Right.

As pointed out in my comment to your requestion about write/discard I think it should
be feasible to use buffered readv requests for all read-only IDE devices.
Only thing I'm unsure about is reopening. A reopen seems to only flush the device not
drain all requests.

Peter

  reply	other threads:[~2015-10-27 10:59 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-12 12:27 [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 1/4] ide/atapi: make PIO read requests async Peter Lieven
2015-10-22 16:17   ` Stefan Hajnoczi
2015-10-23 15:17     ` Peter Lieven
2015-11-03  0:48   ` John Snow
2015-11-03  7:03     ` Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 2/4] ide/atapi: blk_aio_readv may return NULL Peter Lieven
2015-10-22 16:20   ` Stefan Hajnoczi
2015-10-23 15:18     ` Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 3/4] ide: add support for cancelable read requests Peter Lieven
2015-10-26 10:39   ` Stefan Hajnoczi
2015-10-27 10:58     ` Peter Lieven [this message]
2015-10-28 11:26       ` Stefan Hajnoczi
2015-10-28 19:56         ` Peter Lieven
2015-10-12 12:27 ` [Qemu-devel] [PATCH 4/4] ide/atapi: enable cancelable requests Peter Lieven
2015-10-26 10:42 ` [Qemu-devel] [PATCH V2 0/4] ide: avoid main-loop hang on CDROM/NFS failure Stefan Hajnoczi
2015-10-26 10:56   ` Peter Lieven
2015-10-28 11:27     ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=562F58EF.9050709@kamp.de \
    --to=pl@kamp.de \
    --cc=jcody@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.