From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Andres Lagar-Cavilla <andreslc@gridcentric.ca>
Cc: andres.lagarcavilla@gmail.com, xen-devel@lists.xen.org,
Andres Lagar-Cavilla <andres@lagarcavilla.org>,
David Vrabel <david.vrabel@citrix.com>
Subject: Re: [PATCH] Xen backend support for paged out grant targets.
Date: Fri, 31 Aug 2012 17:14:42 -0400 [thread overview]
Message-ID: <20120831211441.GC20594@localhost.localdomain> (raw)
In-Reply-To: <CAO=PTzoNfS8+DFXUM2+E9FaZSCJwTeRjecCXJwy_+CFyxGYpUQ@mail.gmail.com>
On Fri, Aug 31, 2012 at 03:33:17PM -0400, Andres Lagar-Cavilla wrote:
> But msleep will wind up calling schedule(). We definitely cannot afford to
> pin down a dom0 vcpu when the pager itself is in dom0.
Duh! Somehow I was thinking it just do a dumb loop. Maybe I am thinking
about a different type of early XYZsleep variant.
>
> Iiuc....
>
> Andres
> On Aug 31, 2012 12:55 PM, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>
> wrote:
>
> > On Fri, Aug 31, 2012 at 11:42:32AM -0400, Andres Lagar-Cavilla wrote:
> > > Actually acted upon your feedback ipso facto:
> > >
> > > commit d5fab912caa1f0cf6be0a6773f502d3417a207b6
> > > Author: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> > > Date: Sun Aug 26 09:45:57 2012 -0400
> > >
> > > Xen backend support for paged out grant targets.
> > >
> > > Since Xen-4.2, hvm domains may have portions of their memory paged
> > out. When a
> > > foreign domain (such as dom0) attempts to map these frames, the map
> > will
> > > initially fail. The hypervisor returns a suitable errno, and kicks an
> > > asynchronous page-in operation carried out by a helper. The foreign
> > domain is
> > > expected to retry the mapping operation until it eventually
> > succeeds. The
> > > foreign domain is not put to sleep because itself could be the one
> > running the
> > > pager assist (typical scenario for dom0).
> > >
> > > This patch adds support for this mechanism for backend drivers using
> > grant
> > > mapping and copying operations. Specifically, this covers the
> > blkback and
> > > gntdev drivers (which map foregin grants), and the netback driver
> > (which copies
> > > foreign grants).
> > >
> > > * Add GNTST_eagain, already exposed by Xen, to the grant interface.
> > > * Add a retry method for grants that fail with GNTST_eagain (i.e.
> > because the
> > > target foregin frame is paged out).
> > > * Insert hooks with appropriate macro decorators in the
> > aforementioned drivers.
> > >
> > > The retry loop is only invoked if the grant operation status is
> > GNTST_eagain.
> > > It guarantees to leave a new status code different from
> > GNTST_eagain. Any other
> > > status code results in identical code execution as before.
> > >
> > > The retry loop performs 256 attempts with increasing time intervals
> > through a
> > > 32 second period. It uses msleep to yield while waiting for the next
> > retry.
> > >
> >
> >
> > Would it make sense to yield to other processes (so call schedule)? Or
> > perhaps have this in a workqueue ?
> >
> > I mean the 'msleep' just looks like a hack. .. 32 seconds of doing
> > 'msleep' on 1VCPU dom0 could trigger the watchdog I think?
> >
> > > V2 after feedback from David Vrabel:
> > > * Explicit MAX_DELAY instead of wrap-around delay into zero
> > > * Abstract GNTST_eagain check into core grant table code for netback
> > module.
> > >
> > > Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> > >
> > > diff --git a/drivers/net/xen-netback/netback.c
> > b/drivers/net/xen-netback/netback.c
> > > index 682633b..5610fd8 100644
> > > --- a/drivers/net/xen-netback/netback.c
> > > +++ b/drivers/net/xen-netback/netback.c
> > > @@ -635,9 +635,7 @@ static void xen_netbk_rx_action(struct xen_netbk
> > *netbk)
> > > return;
> > >
> > > BUG_ON(npo.copy_prod > ARRAY_SIZE(netbk->grant_copy_op));
> > > - ret = HYPERVISOR_grant_table_op(GNTTABOP_copy,
> > &netbk->grant_copy_op,
> > > - npo.copy_prod);
> > > - BUG_ON(ret != 0);
> > > + gnttab_batch_copy_no_eagain(netbk->grant_copy_op, npo.copy_prod);
> > >
> > > while ((skb = __skb_dequeue(&rxq)) != NULL) {
> > > sco = (struct skb_cb_overlay *)skb->cb;
> > > @@ -1460,18 +1458,15 @@ static void xen_netbk_tx_submit(struct xen_netbk
> > *netbk)
> > > static void xen_netbk_tx_action(struct xen_netbk *netbk)
> > > {
> > > unsigned nr_gops;
> > > - int ret;
> > >
> > > nr_gops = xen_netbk_tx_build_gops(netbk);
> > >
> > > if (nr_gops == 0)
> > > return;
> > > - ret = HYPERVISOR_grant_table_op(GNTTABOP_copy,
> > > - netbk->tx_copy_ops, nr_gops);
> > > - BUG_ON(ret);
> > >
> > > - xen_netbk_tx_submit(netbk);
> > > + gnttab_batch_copy_no_eagain(netbk->tx_copy_ops, nr_gops);
> > >
> > > + xen_netbk_tx_submit(netbk);
> > > }
> > >
> > > static void xen_netbk_idx_release(struct xen_netbk *netbk, u16
> > pending_idx)
> > > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> > > index eea81cf..96543b2 100644
> > > --- a/drivers/xen/grant-table.c
> > > +++ b/drivers/xen/grant-table.c
> > > @@ -38,6 +38,7 @@
> > > #include <linux/vmalloc.h>
> > > #include <linux/uaccess.h>
> > > #include <linux/io.h>
> > > +#include <linux/delay.h>
> > > #include <linux/hardirq.h>
> > >
> > > #include <xen/xen.h>
> > > @@ -823,6 +824,26 @@ unsigned int gnttab_max_grant_frames(void)
> > > }
> > > EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
> > >
> > > +#define MAX_DELAY 256
> > > +void
> > > +gnttab_retry_eagain_gop(unsigned int cmd, void *gop, int16_t *status,
> > > + const char *func)
> > > +{
> > > + unsigned delay = 1;
> > > +
> > > + do {
> > > + BUG_ON(HYPERVISOR_grant_table_op(cmd, gop, 1));
> > > + if (*status == GNTST_eagain)
> > > + msleep(delay++);
> > > + } while ((*status == GNTST_eagain) && (delay < MAX_DELAY));
> > > +
> > > + if (delay >= MAX_DELAY) {
> > > + printk(KERN_ERR "%s: %s eagain grant\n", func,
> > current->comm);
> > > + *status = GNTST_bad_page;
> > > + }
> > > +}
> > > +EXPORT_SYMBOL_GPL(gnttab_retry_eagain_gop);
> > > +
> > > int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> > > struct gnttab_map_grant_ref *kmap_ops,
> > > struct page **pages, unsigned int count)
> > > @@ -836,6 +857,11 @@ int gnttab_map_refs(struct gnttab_map_grant_ref
> > *map_ops,
> > > if (ret)
> > > return ret;
> > >
> > > + /* Retry eagain maps */
> > > + for (i = 0; i < count; i++)
> > > + if (map_ops[i].status == GNTST_eagain)
> > > + gnttab_retry_eagain_map(map_ops + i);
> > > +
> > > if (xen_feature(XENFEAT_auto_translated_physmap))
> > > return ret;
> > >
> > > diff --git a/drivers/xen/xenbus/xenbus_client.c
> > b/drivers/xen/xenbus/xenbus_client.c
> > > index b3e146e..749f6a3 100644
> > > --- a/drivers/xen/xenbus/xenbus_client.c
> > > +++ b/drivers/xen/xenbus/xenbus_client.c
> > > @@ -490,8 +490,7 @@ static int xenbus_map_ring_valloc_pv(struct
> > xenbus_device *dev,
> > >
> > > op.host_addr = arbitrary_virt_to_machine(pte).maddr;
> > >
> > > - if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
> > > - BUG();
> > > + gnttab_map_grant_no_eagain(&op);
> > >
> > > if (op.status != GNTST_okay) {
> > > free_vm_area(area);
> > > @@ -572,8 +571,7 @@ int xenbus_map_ring(struct xenbus_device *dev, int
> > gnt_ref,
> > > gnttab_set_map_op(&op, (unsigned long)vaddr, GNTMAP_host_map,
> > gnt_ref,
> > > dev->otherend_id);
> > >
> > > - if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
> > > - BUG();
> > > + gnttab_map_grant_no_eagain(&op);
> > >
> > > if (op.status != GNTST_okay) {
> > > xenbus_dev_fatal(dev, op.status,
> > > diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> > > index 11e27c3..2fecfab 100644
> > > --- a/include/xen/grant_table.h
> > > +++ b/include/xen/grant_table.h
> > > @@ -43,6 +43,7 @@
> > > #include <xen/interface/grant_table.h>
> > >
> > > #include <asm/xen/hypervisor.h>
> > > +#include <asm/xen/hypercall.h>
> > >
> > > #include <xen/features.h>
> > >
> > > @@ -183,6 +184,43 @@ unsigned int gnttab_max_grant_frames(void);
> > >
> > > #define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))
> > >
> > > +/* Retry a grant map/copy operation when the hypervisor returns
> > GNTST_eagain.
> > > + * This is typically due to paged out target frames.
> > > + * Generic entry-point, use macro decorators below for specific grant
> > > + * operations.
> > > + * Will retry for 1, 2, ... 255 ms, i.e. 256 times during 32 seconds.
> > > + * Return value in *status guaranteed to no longer be GNTST_eagain. */
> > > +void gnttab_retry_eagain_gop(unsigned int cmd, void *gop, int16_t
> > *status,
> > > + const char *func);
> > > +
> > > +#define gnttab_retry_eagain_map(_gop) \
> > > + gnttab_retry_eagain_gop(GNTTABOP_map_grant_ref, (_gop), \
> > > + &(_gop)->status, __func__)
> > > +
> > > +#define gnttab_retry_eagain_copy(_gop) \
> > > + gnttab_retry_eagain_gop(GNTTABOP_copy, (_gop), \
> > > + &(_gop)->status, __func__)
> > > +
> > > +#define gnttab_map_grant_no_eagain(_gop)
> > \
> > > +do {
> > \
> > > + if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, (_gop), 1))
> > \
> > > + BUG();
> > \
> > > + if ((_gop)->status == GNTST_eagain)
> > \
> > > + gnttab_retry_eagain_map((_gop));
> > \
> > > +} while(0)
> > > +
> > > +static inline void
> > > +gnttab_batch_copy_no_eagain(struct gnttab_copy *batch, unsigned count)
> > > +{
> > > + unsigned i;
> > > + struct gnttab_copy *op;
> > > +
> > > + BUG_ON(HYPERVISOR_grant_table_op(GNTTABOP_copy, batch, count));
> > > + for (i = 0, op = batch; i < count; i++, op++)
> > > + if (op->status == GNTST_eagain)
> > > + gnttab_retry_eagain_copy(op);
> > > +}
> > > +
> > > int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> > > struct gnttab_map_grant_ref *kmap_ops,
> > > struct page **pages, unsigned int count);
> > > diff --git a/include/xen/interface/grant_table.h
> > b/include/xen/interface/grant_table.h
> > > index 7da811b..66cb734 100644
> > > --- a/include/xen/interface/grant_table.h
> > > +++ b/include/xen/interface/grant_table.h
> > > @@ -520,6 +520,7 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_get_version);
> > > #define GNTST_permission_denied (-8) /* Not enough privilege for
> > operation. */
> > > #define GNTST_bad_page (-9) /* Specified page was invalid for
> > op. */
> > > #define GNTST_bad_copy_arg (-10) /* copy arguments cross page
> > boundary */
> > > +#define GNTST_eagain (-12) /* Retry.
> > */
> > >
> > > #define GNTTABOP_error_msgs { \
> > > "okay", \
> > > @@ -533,6 +534,7 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_get_version);
> > > "permission denied", \
> > > "bad page", \
> > > "copy arguments cross page boundary" \
> > > + "retry" \
> > > }
> > >
> > > #endif /* __XEN_PUBLIC_GRANT_TABLE_H__ */
> > >
> > >
> > > On Aug 31, 2012, at 10:45 AM, Andres Lagar-Cavilla wrote:
> > >
> > > >
> > > > On Aug 31, 2012, at 10:32 AM, David Vrabel wrote:
> > > >
> > > >> On 27/08/12 17:51, andres@lagarcavilla.org wrote:
> > > >>> From: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> > > >>>
> > > >>> Since Xen-4.2, hvm domains may have portions of their memory paged
> > out. When a
> > > >>> foreign domain (such as dom0) attempts to map these frames, the map
> > will
> > > >>> initially fail. The hypervisor returns a suitable errno, and kicks an
> > > >>> asynchronous page-in operation carried out by a helper. The foreign
> > domain is
> > > >>> expected to retry the mapping operation until it eventually
> > succeeds. The
> > > >>> foreign domain is not put to sleep because itself could be the one
> > running the
> > > >>> pager assist (typical scenario for dom0).
> > > >>>
> > > >>> This patch adds support for this mechanism for backend drivers using
> > grant
> > > >>> mapping and copying operations. Specifically, this covers the
> > blkback and
> > > >>> gntdev drivers (which map foregin grants), and the netback driver
> > (which copies
> > > >>> foreign grants).
> > > >>>
> > > >>> * Add GNTST_eagain, already exposed by Xen, to the grant interface.
> > > >>> * Add a retry method for grants that fail with GNTST_eagain (i.e.
> > because the
> > > >>> target foregin frame is paged out).
> > > >>> * Insert hooks with appropriate macro decorators in the
> > aforementioned drivers.
> > > >>
> > > >> I think you should implement wrappers around
> > HYPERVISOR_grant_table_op()
> > > >> have have the wrapper do the retries instead of every backend having
> > to
> > > >> check for EAGAIN and issue the retries itself. Similar to the
> > > >> gnttab_map_grant_no_eagain() function you've already added.
> > > >>
> > > >> Why do some operations not retry anyway?
> > > >
> > > > All operations retry. The reason why I could not make it as elegant as
> > you suggest is because grant operations are submitted in batches and their
> > status(es?) later checked individually elsewhere. This is the case for
> > netback. Note that both blkback and gntdev use a more linear structure with
> > the gnttab_map_refs helper, which allows me to hide all the retry gore from
> > those drivers into grant table code. Likewise for xenbus ring mapping.
> > > >
> > > > In summary, outside of core grant table code, only the netback driver
> > needs to check explicitly for retries, due to its
> > batch-copy-delayed-per-slot-check structure.
> > > >
> > > >>
> > > >>> +void
> > > >>> +gnttab_retry_eagain_gop(unsigned int cmd, void *gop, int16_t
> > *status,
> > > >>> + const char *func)
> > > >>> +{
> > > >>> + u8 delay = 1;
> > > >>> +
> > > >>> + do {
> > > >>> + BUG_ON(HYPERVISOR_grant_table_op(cmd, gop, 1));
> > > >>> + if (*status == GNTST_eagain)
> > > >>> + msleep(delay++);
> > > >>> + } while ((*status == GNTST_eagain) && delay);
> > > >>
> > > >> Terminating the loop when delay wraps is a bit subtle. Why not make
> > > >> delay unsigned and check delay <= MAX_DELAY?
> > > > Good idea (MAX_DELAY == 256). I'd like to get Konrad's feedback before
> > a re-spin.
> > > >
> > > >>
> > > >> Would it be sensible to ramp the delay faster? Perhaps double each
> > > >> iteration with a maximum possible delay of e.g., 256 ms.
> > > > Generally speaking we've never seen past three retries. I am open to
> > changing the algorithm but there is a significant possibility it won't
> > matter at all.
> > > >
> > > >>
> > > >>> +#define gnttab_map_grant_no_eagain(_gop)
> > \
> > > >>> +do {
> > \
> > > >>> + if ( HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, (_gop),
> > 1)) \
> > > >>> + BUG();
> > \
> > > >>> + if ((_gop)->status == GNTST_eagain)
> > \
> > > >>> + gnttab_retry_eagain_map((_gop));
> > \
> > > >>> +} while(0)
> > > >>
> > > >> Inline functions, please.
> > > >
> > > > I want to retain the original context for debugging. Eventually we
> > print __func__ if things go wrong.
> > > >
> > > > Thanks, great feedback
> > > > Andres
> > > >
> > > >>
> > > >> David
> > > >
> > >
> >
prev parent reply other threads:[~2012-08-31 21:14 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-08-27 16:51 [PATCH] Xen backend support for paged out grant targets andres
2012-08-31 14:32 ` David Vrabel
2012-08-31 14:45 ` Andres Lagar-Cavilla
2012-08-31 15:42 ` Andres Lagar-Cavilla
2012-08-31 16:10 ` David Vrabel
2012-09-05 16:27 ` Konrad Rzeszutek Wilk
2012-09-05 17:21 ` Andres Lagar-Cavilla
2012-09-12 13:20 ` Andres Lagar-Cavilla
2012-09-12 18:30 ` Konrad Rzeszutek Wilk
2012-08-31 16:54 ` Konrad Rzeszutek Wilk
2012-08-31 19:33 ` Andres Lagar-Cavilla
2012-08-31 21:14 ` Konrad Rzeszutek Wilk [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120831211441.GC20594@localhost.localdomain \
--to=konrad.wilk@oracle.com \
--cc=andres.lagarcavilla@gmail.com \
--cc=andres@lagarcavilla.org \
--cc=andreslc@gridcentric.ca \
--cc=david.vrabel@citrix.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).