All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Steve Wise" <swise@opengridcomputing.com>
To: 'Max Gurtovoy' <maxg@mellanox.com>,
	'Sagi Grimberg' <sagi@grimberg.me>,
	'Leon Romanovsky' <leon@kernel.org>
Cc: 'Doug Ledford' <dledford@redhat.com>,
	'Jason Gunthorpe' <jgg@mellanox.com>,
	'RDMA mailing list' <linux-rdma@vger.kernel.org>,
	'Saeed Mahameed' <saeedm@mellanox.com>,
	'linux-netdev' <netdev@vger.kernel.org>
Subject: RE: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask
Date: Wed, 18 Jul 2018 09:25:54 -0500	[thread overview]
Message-ID: <015401d41ea3$3ce8cfa0$b6ba6ee0$@opengridcomputing.com> (raw)
In-Reply-To: <9a4d8d50-19b0-fcaa-d4a3-6cfa2318a973@mellanox.com>



> -----Original Message-----
> From: Max Gurtovoy <maxg@mellanox.com>
> Sent: Wednesday, July 18, 2018 9:14 AM
> To: Sagi Grimberg <sagi@grimberg.me>; Steve Wise
> <swise@opengridcomputing.com>; 'Leon Romanovsky' <leon@kernel.org>
> Cc: 'Doug Ledford' <dledford@redhat.com>; 'Jason Gunthorpe'
> <jgg@mellanox.com>; 'RDMA mailing list' <linux-rdma@vger.kernel.org>;
> 'Saeed Mahameed' <saeedm@mellanox.com>; 'linux-netdev'
> <netdev@vger.kernel.org>
> Subject: Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity
> mask
> 
> 
> 
> On 7/18/2018 2:38 PM, Sagi Grimberg wrote:
> >
> >>> IMO we must fulfil the user wish to connect to N queues and not reduce
> >>> it because of affinity overlaps. So in order to push Leon's patch we
> >>> must also fix the blk_mq_rdma_map_queues to do a best effort
> mapping
> >>> according the affinity and map the rest in naive way (in that way we
> >>> will *always* map all the queues).
> >>
> >> That is what I would expect also.   For example, in my node, where
> >> there are
> >> 16 cpus, and 2 numa nodes, I observe much better nvmf IOPS
> performance by
> >> setting up my 16 driver completion event queues such that each is
> >> bound to a
> >> node-local cpu.  So I end up with each nodel-local cpu having 2 queues
> >> bound
> >> to it.   W/O adding support in iw_cxgb4 for ib_get_vector_affinity(),
> >> this
> >> works fine.   I assumed adding ib_get_vector_affinity() would allow
> >> this to
> >> all "just work" by default, but I'm running into this connection failure
> >> issue.
> >>
> >> I don't understand exactly what the blk_mq layer is trying to do, but I
> >> assume it has ingress event queues and processing that it trying to align
> >> with the drivers ingress cq event handling, so everybody stays on the
> >> same
> >> cpu (or at least node).   But something else is going on.  Is there
> >> documentation on how this works somewhere?
> >
> > Does this (untested) patch help?
> 
> I'm not sure (I'll test it tomorrow) because the issue is the unmapped
> queues and not the cpus.
> for example, if the affinity of q=6 and q=12 returned the same cpumask
> than q=6 will not be mapped and will fail to connect.
> 
> > --
> > diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> > index 3eb169f15842..dbe962cb537d 100644
> > --- a/block/blk-mq-cpumap.c
> > +++ b/block/blk-mq-cpumap.c
> > @@ -30,29 +30,34 @@ static int get_first_sibling(unsigned int cpu)
> >          return cpu;
> >   }
> >
> > -int blk_mq_map_queues(struct blk_mq_tag_set *set)
> > +void blk_mq_map_queue(struct blk_mq_tag_set *set, unsigned int cpu)
> >   {

There is already a static inline function named blk_mq_map_queue() in block/blk-mq.h.  Did you mean to replace that?  Or is this just a function name conflict?


> >          unsigned int *map = set->mq_map;
> >          unsigned int nr_queues = set->nr_hw_queues;
> > -       unsigned int cpu, first_sibling;
> > +       unsigned int first_sibling;
> >
> > -       for_each_possible_cpu(cpu) {
> > -               /*
> > -                * First do sequential mapping between CPUs and queues.
> > -                * In case we still have CPUs to map, and we have some
> > number of
> > -                * threads per cores then map sibling threads to the
> > same queue for
> > -                * performace optimizations.
> > -                */
> > -               if (cpu < nr_queues) {
> > +       /*
> > +        * First do sequential mapping between CPUs and queues.
> > +        * In case we still have CPUs to map, and we have some number of
> > +        * threads per cores then map sibling threads to the same queue for
> > +        * performace optimizations.
> > +        */
> > +       if (cpu < nr_queues) {
> > +               map[cpu] = cpu_to_queue_index(nr_queues, cpu);
> > +       } else {
> > +               first_sibling = get_first_sibling(cpu);
> > +               if (first_sibling == cpu)
> >                          map[cpu] = cpu_to_queue_index(nr_queues, cpu);
> > -               } else {
> > -                       first_sibling = get_first_sibling(cpu);
> > -                       if (first_sibling == cpu)
> > -                               map[cpu] = cpu_to_queue_index(nr_queues,
> > cpu);
> > -                       else
> > -                               map[cpu] = map[first_sibling];
> > -               }
> > +               else
> > +                       map[cpu] = map[first_sibling];
> >          }
> > +}
> > +EXPORT_SYMBOL_GPL(blk_mq_map_queue);
> > +
> > +int blk_mq_map_queues(struct blk_mq_tag_set *set)
> > +{
> > +       for_each_possible_cpu(cpu)
> > +                blk_mq_map_queue(set, cpu);
> >
> >          return 0;
> >   }
> > diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
> > index 996167f1de18..5e91789bea5b 100644
> > --- a/block/blk-mq-rdma.c
> > +++ b/block/blk-mq-rdma.c
> > @@ -35,6 +35,10 @@ int blk_mq_rdma_map_queues(struct
> blk_mq_tag_set *set,
> >          const struct cpumask *mask;
> >          unsigned int queue, cpu;
> >
> > +       /* reset all to  */
> > +       for_each_possible_cpu(cpu)
> > +               set->mq_map[cpu] = UINT_MAX;
> > +
> >          for (queue = 0; queue < set->nr_hw_queues; queue++) {
> >                  mask = ib_get_vector_affinity(dev, first_vec + queue);
> >                  if (!mask)
> > @@ -44,6 +48,11 @@ int blk_mq_rdma_map_queues(struct
> blk_mq_tag_set *set,
> >                          set->mq_map[cpu] = queue;
> >          }
> >
> > +       for_each_possible_cpu(cpu) {
> > +               if (set->mq_map[cpu] == UINT_MAX)
> > +                       blk_mq_map_queue(set, cpu);
> > +       }
> > +
> >          return 0;
> >
> >   fallback:
> > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> > index e3147eb74222..7a9848a82475 100644
> > --- a/include/linux/blk-mq.h
> > +++ b/include/linux/blk-mq.h
> > @@ -283,6 +283,7 @@ int blk_mq_freeze_queue_wait_timeout(struct
> > request_queue *q,
> >                                       unsigned long timeout);
> >
> >   int blk_mq_map_queues(struct blk_mq_tag_set *set);
> > +void blk_mq_map_queue(struct blk_mq_tag_set *set, unsigned int cpu);
> >   void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int
> > nr_hw_queues);
> >
> >   void blk_mq_quiesce_queue_nowait(struct request_queue *q);
> > --
> >
> > It really is still a best effort thing...

  reply	other threads:[~2018-07-18 14:25 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16  8:30 [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask Leon Romanovsky
2018-07-16 10:23 ` Sagi Grimberg
2018-07-16 10:30   ` Leon Romanovsky
2018-07-16 14:54     ` Max Gurtovoy
2018-07-16 14:59       ` Sagi Grimberg
2018-07-16 16:46         ` Max Gurtovoy
2018-07-16 17:08           ` Steve Wise
2018-07-17  8:46             ` Max Gurtovoy
2018-07-17  8:58               ` Leon Romanovsky
2018-07-17 10:05                 ` Max Gurtovoy
2018-07-17 13:03               ` Steve Wise
2018-07-18 11:38                 ` Sagi Grimberg
2018-07-18 14:14                   ` Max Gurtovoy
2018-07-18 14:25                     ` Steve Wise [this message]
2018-07-18 19:29                     ` Steve Wise
2018-07-19 14:50                       ` Max Gurtovoy
2018-07-19 18:45                         ` Steve Wise
2018-07-20  1:25                           ` Max Gurtovoy
2018-07-23 16:49                             ` Jason Gunthorpe
2018-07-23 16:53                               ` Max Gurtovoy
2018-07-30 15:47                                 ` Steve Wise
2018-07-31 10:00                                   ` Max Gurtovoy
2018-08-01  5:12                                 ` Sagi Grimberg
2018-08-01 14:27                                   ` Max Gurtovoy
2018-08-06 19:20                                     ` Steve Wise
2018-08-15  6:37                                       ` Leon Romanovsky
2018-08-16 18:26                                       ` Sagi Grimberg
2018-08-16 18:32                                         ` Steve Wise
2018-08-17 16:17                                           ` Steve Wise
2018-08-17 20:03                                             ` Sagi Grimberg
2018-08-17 20:17                                               ` Jason Gunthorpe
2018-08-17 20:26                                                 ` Sagi Grimberg
2018-08-17 21:28                                               ` Steve Wise
2018-07-24 15:24                             ` Steve Wise
2018-07-24 20:52                               ` Steve Wise

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='015401d41ea3$3ce8cfa0$b6ba6ee0$@opengridcomputing.com' \
    --to=swise@opengridcomputing.com \
    --cc=dledford@redhat.com \
    --cc=jgg@mellanox.com \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=maxg@mellanox.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@mellanox.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.