linux-fsdevel.vger.kernel.org archive mirror
* aio: questions with ioctx_alloc() and large num_possible_cpus()
@ 2016-10-04 22:55 Mauricio Faria de Oliveira
From: Mauricio Faria de Oliveira @ 2016-10-04 22:55 UTC
  To: Benjamin LaHaise, Kent Overstreet
  Cc: Alexander Viro, linux-fsdevel, linux-aio, linux-kernel

Hi Benjamin, Kent, and others,

Could you please comment on this possible problem?
Any feedback is appreciated.

Since commit e1bdd5f27a5b ("aio: percpu reqs_available") the maximum
number of aio nr_events may be a function of num_possible_cpus() and
actually be /inversely proportional/ to it (i.e., more CPUs lead to
less system-wide aio nr_events). This is a problem on larger systems.

That's because a request with "nr_events < num_possible_cpus() * 4"
(for example, nr_events == 1) is counted as "num_possible_cpus() * 4"
in aio_nr and against aio_max_nr:

     static struct kioctx *ioctx_alloc(unsigned nr_events)
     ...
         nr_events = max(nr_events, num_possible_cpus() * 4);
         nr_events *= 2;
     ...
         /* limit the number of system wide aios */
     ....
         if (aio_nr + nr_events > (aio_max_nr * 2UL) ||
     ...
             err = -EAGAIN;
     ...
         aio_nr += ctx->max_reqs;
     ...
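
The accounting in the excerpt above can be modeled in user space. This is
only a sketch: the function names are made up for illustration and are not
kernel API, and it covers just the lines quoted above (the max(), the
doubling, and the check against aio_max_nr * 2).

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; these names are illustrative, not kernel API. */

/* Events charged to aio_nr for one io_setup(requested, ...) call. */
static unsigned long aio_charged_events(unsigned long requested,
                                        unsigned long possible_cpus)
{
    unsigned long min_events, nr_events;

    assert(possible_cpus > 0);
    min_events = possible_cpus * 4;            /* num_possible_cpus() * 4 */
    nr_events = requested > min_events ? requested : min_events;

    return nr_events * 2;                      /* nr_events *= 2 */
}

/* True if the system-wide check in the excerpt would yield -EAGAIN. */
static bool aio_limit_exceeded(unsigned long aio_nr, unsigned long charge,
                               unsigned long aio_max_nr)
{
    return aio_nr + charge > aio_max_nr * 2UL;
}
```

In this model, io_setup(1, ) is charged 1280 events with 160 possible CPUs
but only 32 events with 4 possible CPUs. Since both the charge and the limit
are doubled, the factor of 2 cancels out when estimating how many contexts
fit.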

That problem is easily noticeable on a common POWER8 system: 160 CPUs
(2 sockets * 10 cores/socket * 8 threads/core = 160 CPUs) limit the max
AIO contexts with "io_setup(1, )" to 102 out of 64k (default aio_max_nr):

     # cat /sys/devices/system/cpu/possible
     0-159

     # cat /proc/sys/fs/aio-max-nr
     65536

     # echo $(( 65536 / (160 * 4) ))
     102

test-case snippet & output:

     for (i = 0; i < 65536; i++)
         if ((rc = io_setup(1, &ioctx[i])) != 0)
             break;

     printf("rc = %d, i = %d\n", rc, i);

     > rc = -11, i = 102

(another problem is that the sysctl aio-nr grows larger than aio-max-nr,
since it's checked against "aio_max_nr * 2")
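
Both observations (the early -EAGAIN and aio-nr exceeding aio-max-nr) can be
reproduced without a 160-CPU machine by simulating the accounting. The
following is a sketch under the assumption that only the lines quoted from
ioctx_alloc() matter; the struct and function names are hypothetical.

```c
#include <assert.h>

/* User-space simulation; names are illustrative, not kernel API. */
struct aio_sim_result {
    unsigned long contexts;  /* successful io_setup() calls before -EAGAIN */
    unsigned long aio_nr;    /* modeled value of the aio-nr sysctl afterwards */
};

static struct aio_sim_result aio_simulate_setup_loop(unsigned long requested,
                                                     unsigned long possible_cpus,
                                                     unsigned long aio_max_nr,
                                                     unsigned long attempts)
{
    struct aio_sim_result res = { 0, 0 };
    unsigned long min_events, charge, i;

    assert(possible_cpus > 0);
    min_events = possible_cpus * 4;
    charge = (requested > min_events ? requested : min_events) * 2;

    for (i = 0; i < attempts; i++) {
        if (res.aio_nr + charge > aio_max_nr * 2UL)
            break;                 /* the kernel would return -EAGAIN here */
        res.aio_nr += charge;      /* models aio_nr += ctx->max_reqs */
        res.contexts++;
    }
    return res;
}
```

Calling aio_simulate_setup_loop(1, 160, 65536, 65536) gives contexts == 102
and aio_nr == 130560, which matches the test-case output and shows the
modeled aio-nr at roughly twice aio-max-nr.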

So,

I've been trying to understand/fix this, but soon got stuck on the
options, as I didn't quite get a few points. If you could provide some
insight, that would be really helpful:

- why "num_possible_cpus() * 4", and why "max(nr_events, <it>)"?

   Is it just related to req_batch, as a reasonable constant, or are
   there other implications (e.g., related to "up to half of the
   slots on other cpu's percpu counters", which would also be nice
   to understand)?

- "struct kioctx" says max_reqs is

    " is what userspace passed to io_setup(), it's not used for
    anything but counting against the global max_reqs quota. "

    However, we see it incremented by the modified nr_events, thus
    not really the value from userspace anymore, and used to derive
    nr_events in aio_setup_ring().  Is the comment wrong nowadays,
    or is the code usage of max_reqs wrong/abusing it, or... ? :)

- is aio-nr really expected to count nr_events (er.. the value
   actually requested by userspace?), or the number of times
   io_setup(N, ) returned successfully (say, io contexts),
   regardless of the total/sum of their nr_events?

- any other comments/suggestions are appreciated.
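
To make the aio-nr question above concrete, here is a small sketch that
computes the three candidate meanings side by side for a number of
identical contexts. The struct and function are hypothetical helpers, and
"adjusted_sum" assumes the accounting is exactly the max()/doubling shown
in the ioctx_alloc() excerpt.

```c
#include <assert.h>

/* Three possible readings of what aio-nr could count (illustrative only). */
struct aio_nr_views {
    unsigned long requested_sum;  /* sum of nr_events as passed by userspace */
    unsigned long contexts;       /* number of successful io_setup() calls */
    unsigned long adjusted_sum;   /* sum of the adjusted, doubled nr_events */
};

static struct aio_nr_views aio_nr_views_after(unsigned long contexts,
                                              unsigned long requested,
                                              unsigned long possible_cpus)
{
    struct aio_nr_views v;
    unsigned long min_events, adjusted;

    assert(possible_cpus > 0);
    min_events = possible_cpus * 4;
    adjusted = (requested > min_events ? requested : min_events) * 2;

    v.requested_sum = contexts * requested;
    v.contexts = contexts;
    v.adjusted_sum = contexts * adjusted;
    return v;
}
```

For the POWER8 case (102 contexts, io_setup(1, ), 160 possible CPUs) the
first two readings both give 102, while the adjusted sum is 130560.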

Thanks in advance,

-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center

Thread overview: 8+ messages
2016-10-04 22:55 aio: questions with ioctx_alloc() and large num_possible_cpus() Mauricio Faria de Oliveira
2016-10-05  6:34 ` Kent Overstreet
2016-10-05 17:21   ` Mauricio Faria de Oliveira
2016-10-05 17:41 ` Benjamin LaHaise
2016-10-05 17:58   ` Mauricio Faria de Oliveira
2016-10-05 18:17     ` Benjamin LaHaise
2016-10-05 19:22       ` Mauricio Faria de Oliveira
2016-10-28 18:59       ` Jeff Moyer