From mboxrd@z Thu Jan  1 00:00:00 1970
From: keith.busch@intel.com (Keith Busch)
Date: Thu, 17 Jan 2019 08:22:02 -0700
Subject: [PATCH] nvme: fix out-of-bounds access during irq vectors
 allocation
In-Reply-To: <95eb3f38-0f55-dc99-94a0-d5a2b88c0e4c@kernel.dk>
References: <1547694610-31879-1-git-send-email-chenhc@lemote.com>
 <0a01d310-c949-af4e-edc0-44859fb277c5@kernel.dk>
 <95eb3f38-0f55-dc99-94a0-d5a2b88c0e4c@kernel.dk>
Message-ID: <20190117152201.GA31543@localhost.localdomain>

On Wed, Jan 16, 2019 at 08:57:21PM -0700, Jens Axboe wrote:
> On 1/16/19 8:51 PM, Jens Axboe wrote:
> > On 1/16/19 8:10 PM, Huacai Chen wrote:
> >> While reducing irq_queues in the do-while loop in nvme_setup_irqs(),
> >> the reduction of irq_sets[] is behind irq_queues. Below is an example.
> >>
> >> On a 8 cpu platform, with default setting, nvme_setup_irqs() begin with
> >> irq_queues = 8 (but when allocating irq vectors it will become 9 due to
> >> the admin queue), affd.pre_vectors = 1, affd.nr_sets = 1, affd.sets[0]
> >> = 8. If MSI-X resources are not enough, then the do-while loop will
> >> reduce irq vectors:
> >>
> >> The 1st time call pci_alloc_irq_vectors_affinity(),
> >> irq_queues = 9, affd.pre_vectors = 1, affd.nr_sets = 1, affd.sets[0] = 8
> >> The 2nd time call pci_alloc_irq_vectors_affinity(),
> >> irq_queues = 8, affd.pre_vectors = 1, affd.nr_sets = 1, affd.sets[0] = 8
> >> The 3rd time call pci_alloc_irq_vectors_affinity(),
> >> irq_queues = 7, affd.pre_vectors = 1, affd.nr_sets = 1, affd.sets[0] = 7
> >>
> >> However, this will cause an out of bounds access in __pci_enable_msix()
> >> --> ... --> irq_create_affinity_masks() --> irq_build_affinity_masks().
> >>
> >> In the 2nd round of reduction, let's pay attention to the calling of
> >> irq_build_affinity_masks(affd, curvec, this_vecs, curvec, node_to_cpumask, masks):
> >>
> >> The number of elements in masks is 8 (depends on nvecs which is equal to
> >> irq_queues), curvec is 1 (depends on affd.pre_vectors), then
> >> irq_build_affinity_masks() will access 8 elements in masks (depends on
> >> this_vecs which is equal to affd.sets[0]), and the last element is out
> >> of bounds.
> >>
> >> So the root cause is affd.sets[] + affd.pre_vectors should not be larger
> >> than vectors to be allocated. In this patch we introduce alloc_queues to
> >> indicate how many queues to allocate (not reuse irq_queues), and so we
> >> can adjust affd.sets[] correctly (depends on irq_queues) to avoid out of
> >> bounds access.
> >>
> >> After this patch:
> >>
> >> The 1st time call pci_alloc_irq_vectors_affinity(),
> >> irq_queues = 8, alloc_queues = 9, affd.pre_vectors = 1, affd.nr_sets = 1, affd.sets[0] = 8
> >> The 2nd time call pci_alloc_irq_vectors_affinity(),
> >> irq_queues = 7, alloc_queues = 8, affd.pre_vectors = 1, affd.nr_sets = 1, affd.sets[0] = 7
> >> The 3rd time call pci_alloc_irq_vectors_affinity(),
> >> irq_queues = 6, alloc_queues = 7, affd.pre_vectors = 1, affd.nr_sets = 1, affd.sets[0] = 6
> > 
> > We currently have this one queued up:
> > 
> > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=c45b1fa2433c65e44bdf48f513cb37289f3116b9
> > 
> > can you check if it fixes the issue for you?
> 
> Nevermind, took a closer look, and this looks like a different issue.

The solutions look different, but I think they're both targeting the
same problem: the older code accounted for vectors and queues
differently in the first iteration than in subsequent ones. I think
Ming's patch will probably fix the issue raised here, and it is worth
testing.
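
To make the mismatch concrete, here is a minimal standalone sketch of the
bound being violated. It is not kernel code: struct affd_model,
vectors_touched(), affd_fits() and patched_attempt() are hypothetical names
made up for illustration, and the model only assumes what the report
states, namely that the masks array has nvecs entries and that spreading
starts at pre_vectors and covers every entry of sets[].

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_SETS 4

/* Hypothetical stand-in for the irq_affinity fields named in the report. */
struct affd_model {
	unsigned int pre_vectors;
	unsigned int nr_sets;
	unsigned int sets[MODEL_MAX_SETS];
};

/* Mask slots the spreading would touch: pre_vectors plus every set. */
static unsigned int vectors_touched(const struct affd_model *affd)
{
	unsigned int i, total = affd->pre_vectors;

	assert(affd->nr_sets <= MODEL_MAX_SETS);
	for (i = 0; i < affd->nr_sets; i++)
		total += affd->sets[i];
	return total;
}

/* True when a masks array sized for nvecs entries is large enough. */
static bool affd_fits(const struct affd_model *affd, unsigned int nvecs)
{
	return vectors_touched(affd) <= nvecs;
}

/*
 * Accounting as described for the proposed patch: sets[0] follows
 * irq_queues, and the allocation count adds the admin vector on top.
 */
static struct affd_model patched_attempt(unsigned int irq_queues,
					 unsigned int *alloc_queues)
{
	struct affd_model affd = {
		.pre_vectors = 1,
		.nr_sets = 1,
		.sets = { irq_queues },
	};

	*alloc_queues = irq_queues + affd.pre_vectors;
	return affd;
}
```

With the numbers from the report, the second attempt of the old code
(nvecs = 8, pre_vectors = 1, sets[0] = 8) fails affd_fits(), while every
attempt built by patched_attempt() passes it by construction.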