From: Wei Liu <Wei.Liu2@citrix.com>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: wei.liu2@citrix.com, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH V3] Switch from select() to poll() in xenconsoled's IO loop.
Date: Mon, 7 Jan 2013 12:12:47 +0000	[thread overview]
Message-ID: <1357560767.10283.15.camel@iceland> (raw)
In-Reply-To: <1357554023.14291.128.camel@zakaz.uk.xensource.com>

On Mon, 2013-01-07 at 10:20 +0000, Ian Campbell wrote:
> On Fri, 2013-01-04 at 17:17 +0000, Wei Liu wrote:
> > In Linux, select() typically supports up to 1024 file descriptors. This can be
> > a problem when a user tries to boot many guests. Switching to poll() has
> > minimal impact on the existing code and offers better scalability.
> > 
> > Tracking arrays are dynamically allocated / reallocated. If the tracking
> > arrays fail to expand, we just ignore the incoming fd.
> > 
> > Change from V2:
> >   * remove unnecessary malloc in initialize_pollfd_arrays
> >   * use ROUND_UP to get new size of arrays
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  tools/console/daemon/io.c |  147 +++++++++++++++++++++++++++++++++------------
> >  1 file changed, 109 insertions(+), 38 deletions(-)
> > 
> > diff --git a/tools/console/daemon/io.c b/tools/console/daemon/io.c
> > index 48fe151..0c53d30 100644
> > --- a/tools/console/daemon/io.c
> > +++ b/tools/console/daemon/io.c
> > @@ -28,7 +28,7 @@
> >  #include <stdlib.h>
> >  #include <errno.h>
> >  #include <string.h>
> > -#include <sys/select.h>
> > +#include <poll.h>
> >  #include <fcntl.h>
> >  #include <unistd.h>
> >  #include <termios.h>
> > @@ -928,9 +928,87 @@ static void handle_log_reload(void)
> >         }
> >  }
> > 
> > +
> > +/* Should have at least max_fd + 1 elements */
> > +static struct pollfd  *fds;
> > +static struct pollfd **fd_to_pollfd;
> > +static unsigned int current_array_size;
> > +static unsigned int nr_fds;
> > +
> > +static void initialize_pollfd_arrays(void)
> > +{
> > +       fds = NULL;
> > +       fd_to_pollfd = NULL;
> > +       current_array_size = 0;
> > +}
> 
> These can all be done as part of the definition of the variables.
> 
> > +
> > +static void destroy_pollfd_arrays(void)
> > +{
> > +       free(fds);
> > +       free(fd_to_pollfd);
> > +       current_array_size = 0;
> > +}
> 
> Having done the above I'd be inclined to inline this at the single
> callsite.
> 

I left the initialize() there just to pair with destroy(). But as you
suggested, I can eliminate both.
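
For the archives, a minimal sketch of what eliminating both helpers might
look like (the variable names are the ones from the patch; the cleanup()
wrapper here just stands in for the single call site):

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>

/* Initializers at the definitions replace initialize_pollfd_arrays() */
static struct pollfd  *fds = NULL;
static struct pollfd **fd_to_pollfd = NULL;
static unsigned int current_array_size = 0;
static unsigned int nr_fds = 0;

/* destroy_pollfd_arrays() inlined at its single call site */
static void cleanup(void)
{
	free(fds);
	fds = NULL;
	free(fd_to_pollfd);
	fd_to_pollfd = NULL;
	current_array_size = 0;
	nr_fds = 0;
}
```

Static storage is zero-initialized anyway, so the explicit "= NULL" and
"= 0" are only there for readability.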

> > +
> > +#define ROUNDUP(_x,_w) (((unsigned long)(_x)+(1UL<<(_w))-1) & ~((1UL<<(_w))-1))
> > +static void set_fds(int fd, short events)
> > +{
> > +       if (current_array_size < fd+1) {
> > +               struct pollfd  *p1 = NULL;
> > +               struct pollfd **p2 = NULL;
> > +               unsigned long newsize;
> > +
> > +               /* Round up to a 2^8 boundary; in practice this just
> > +                * makes newsize larger than current_array_size.
> > +                */
> > +               newsize = ROUNDUP(fd+1, 8);
> > +
> > +               p1 = realloc(fds, sizeof(struct pollfd)*newsize);
> > +               if (!p1)
> > +                       goto fail;
> > +               fds = p1;
> > +
> > +               p2 = realloc(fd_to_pollfd, sizeof(struct pollfd *)*newsize);
> > +               if (!p2)
> > +                       goto fail;
> > +               fd_to_pollfd = p2;
> 
> Do these two arrays really need to be the same size?
> 

Not necessarily. I will see what I can do.

> I can see why fd_to_pollfd needs to be as big as the highest fd we are
> polling but the fds array only actually needs to scale with the total
> *number* of fds we are polling. Is it the case that the set of fds we
> are waiting on is pretty densely packed?
> 

It depends. If we bring up many guests and then destroy them all, the
arrays can become sparse. There is no way to shrink the arrays. But see
below.

> Or maybe even if the fds are sparse it doesn't matter and this is
> simpler at the cost of a little bit of extra memory usage?
> 
> 

Even if we have thousands of guests, the tracking arrays only occupy a
few megabytes.
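
If we did want to decouple the two sizes, it might look roughly like
this (ensure_capacity() and the two size variables are hypothetical
names; the ROUNDUP macro is the one from the patch):

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <string.h>

#define ROUNDUP(_x,_w) (((unsigned long)(_x)+(1UL<<(_w))-1) & ~((1UL<<(_w))-1))

static struct pollfd  *fds;            /* scales with the *number* of fds */
static struct pollfd **fd_to_pollfd;   /* scales with the highest fd value */
static unsigned int fds_size, map_size, nr_fds;

/* Hypothetical helper: grow only the array that actually needs it. */
static int ensure_capacity(int fd)
{
	if (nr_fds + 1 > fds_size) {
		unsigned long n = ROUNDUP(nr_fds + 1, 8);
		struct pollfd *p = realloc(fds, sizeof(*fds) * n);
		if (!p)
			return -1;
		fds = p;
		fds_size = n;
	}
	if ((unsigned int)fd + 1 > map_size) {
		unsigned long n = ROUNDUP(fd + 1, 8);
		struct pollfd **p = realloc(fd_to_pollfd, sizeof(*p) * n);
		if (!p)
			return -1;
		/* Zero only the newly added tail of the lookup array */
		memset(p + map_size, 0, sizeof(*p) * (n - map_size));
		fd_to_pollfd = p;
		map_size = n;
	}
	return 0;
}
```

With this split, a sparse fd space only costs pointer-sized entries in
the lookup array; the pollfd array stays dense.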

> > +
> > +               memset(&fds[0] + current_array_size, 0,
> > +                      sizeof(struct pollfd) * (newsize-current_array_size));
> > +               memset(&fd_to_pollfd[0] + current_array_size, 0,
> > +                      sizeof(struct pollfd *) * (newsize-current_array_size));
> > +               current_array_size = newsize;
> > +       }
> > +
> > +       fds[nr_fds].fd = fd;
> > +       fds[nr_fds].events = events;
> > +       fd_to_pollfd[fd] = &fds[nr_fds];
> > +       nr_fds++;
> > +
> > +       return;
> > +fail:
> > +       dolog(LOG_ERR, "realloc failed, ignoring fd %d\n", fd);
> > +       return;
> > +}
> [...]
> > -               if (log_hv && FD_ISSET(xc_evtchn_fd(xce_handle), &readfds))
> > +               if (log_hv && fd_revents(xc_evtchn_fd(xce_handle)) & POLLIN)
> >                         handle_hv_logs();
> > 
> >                 if (ret <= 0)
> >                         continue;
> > 
> > -               if (FD_ISSET(xs_fileno(xs), &readfds))
> > +               if (fd_revents(xs_fileno(xs)) & POLLIN)
> 
> It seems like both this and the logging fd could be handled with a
> struct pollfd * variable specifically for each. Which combined with...
> 
> >                         handle_xs();
> > 
> >                 for (d = dom_head; d; d = n) {
> >                         n = d->next;
> >                         if (d->event_count < RATE_LIMIT_ALLOWANCE) {
> >                                 if (d->xce_handle != NULL &&
> > -                                   FD_ISSET(xc_evtchn_fd(d->xce_handle),
> > -                                            &readfds))
> > +                                   fd_revents(xc_evtchn_fd(d->xce_handle)) &
> 
> ... adding some struct pollfd pointers to struct domain (for this and
> master_fd) would remove the need for the fd_to_pollfd lookup array
> altogether.
> 

Yes, this is possible. But my original thought was to have as little
impact on the original code as possible, so I didn't touch struct domain
and chose to add extra tracking facilities to make it just work.
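
For comparison, the direction you suggest might look roughly like this
(struct domain is heavily simplified here; the real one in io.c carries
many more fields, and the field names below are made up):

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Simplified stand-in for the real struct domain in io.c. */
struct domain {
	int domid;
	struct domain *next;
	/* Filled in by set_fds() each iteration around poll(); consulted
	 * afterwards instead of going through the fd_to_pollfd array. */
	struct pollfd *xce_pollfd;
	struct pollfd *master_pollfd;
};

/* After poll() returns, the revents check becomes a direct dereference. */
static int domain_evtchn_ready(const struct domain *d)
{
	return d->xce_pollfd && (d->xce_pollfd->revents & POLLIN);
}
```

That removes the fd_to_pollfd lookup array entirely, at the cost of
touching struct domain and resetting the cached pointers whenever the
fds array is reallocated.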

If we hit a poll() bottleneck we can move to libevent or libev, which
would require big surgery on the existing code.

Of course I can do that as well if that suits you.


Wei.


Thread overview: 21+ messages
2013-01-03 17:14 [PATCH] Switch to poll in xenconsoled's io loop Wei Liu
2013-01-03 18:22 ` Mats Petersson
2013-01-04 12:30   ` Wei Liu
2013-01-04 15:58 ` [PATCH V2] Switch from select() to poll() in xenconsoled's IO loop Wei Liu
2013-01-04 16:08   ` Ian Campbell
2013-01-04 16:38     ` Wei Liu
2013-01-04 16:51       ` Mats Petersson
2013-01-04 17:17 ` [PATCH V3] " Wei Liu
2013-01-07 10:20   ` Ian Campbell
2013-01-07 12:12     ` Wei Liu [this message]
2013-01-07 12:16       ` Ian Campbell
2013-01-07 14:28 ` [PATCH V4] " Wei Liu
2013-01-07 14:39   ` Ian Campbell
2013-01-07 14:44     ` Wei Liu
2013-01-07 14:52     ` Ian Jackson
2013-01-07 14:41   ` Mats Petersson
2013-01-07 15:01     ` Wei Liu
2013-01-07 15:06       ` Mats Petersson
2013-01-07 15:17         ` Ian Campbell
2013-01-07 15:16       ` Ian Campbell
2013-01-07 15:24         ` Wei Liu
