All of lore.kernel.org
 help / color / mirror / Atom feed
* Fixing nouveau for >4k PAGE_SIZE
       [not found]     ` <1376179046.32100.60.camel@pasglop>
@ 2013-08-11  0:41       ` Benjamin Herrenschmidt
  2013-08-11  5:36         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2013-08-11  0:41 UTC (permalink / raw)
  To: Dave Airlie; +Cc: nouveau, Ben Skeggs, dri-devel, Stéphane Marchesin

Hi folks !

So I've been trying to figure out what it would take to make
nouveau work properly on architectures where PAGE_SIZE isn't
4k such as most ppc64's. An initial patch from Dave fixed a
bogon in nv41.c nv41_vm_map_sg() which was trying to handle
the case at that low level, but this isn't enough, and after
a bit of digging, I also think that's not the right approach:

So, from what I can tell, subdev/vm/base.c is not clean vs. PAGE_SIZE
in a number of places unless I'm mistaken. For example, in
nouveau_vm_map_sg_table(), something like that:

		sglen = sg_dma_len(sg) >> PAGE_SHIFT;

		end = pte + sglen;

Seems to imply an assumption here that the "pte" is in multiple of
PAGE_SHIFT, but afaik, it's not. So further down, we do:

		for (m = 0; m < len; m++) {
			dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);

			vmm->map_sg(vma, pgt, mem, pte, 1, &addr);                       
			num--;
			pte++;

But in fact, inside vmm->map_sg, with the current code, we will have
incremented pte by more than 1 ... so we basically lose track here.

			if (num == 0)
				goto finish;
		}
		if (unlikely(end >= max)) {
			pde++;
			pte = 0;
		}

We need to similarly make sure we don't end up crossing accross two
pde's here, ie, that our 64k page isn't mapped at a non-multiple of
64k in the card space, where is that enforced ? (It might be, I don't
know). If it is, then we need to recalc the pde on each sub page.

So I'm thinking the right fix is to remove the inner loop in nv41.c
nv41_vm_map_sg() and similars, let those basically deal with card-size
PTEs exclusively, and move the breakdown logic inside
nouveau_vm_map_sg_table() and friendes, that's the only way it can get
pte and pde right anyway and it would generally work with all backends
(well, hopefully)

Now, to do that, I need a better understanding of the various things
in there since I'm not familiar with nouveau at all. What I think I've
figured out is with a few questions, it would be awesome if you could
answer them so I can have a shot at fixing it all :-)

 - There is spg_shift and lpg_shift in the backend vmm. My understanding
is those correspond to the supported small and large page shift respectively
in the card's MMU, correct ? On nv41 they are both 12.

 - vma->node->type indicates the desired page shift for a given vma
object we are trying to map. It may or may not match spg_shift. If it
doesn't, the 'big' flag gets set in the various vm/base.c functions,
which makes them use a different page table via:

		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];

 Question: Can that ever happen on nv41 ? Or rather, can node->type
ever be set to something that is neither vmm->spg_shift nor vmm->lpg_shift ?

 - vma->node->offset is the location in bytes in the card memory space
of the nouveau_vma object, right ?

 - In nouveau_vm_map_sg_table(), we take a "delta" argument. I assume that's
an indication of where within the "vma" we want to map the sg list, ie page
1 of the sg list gets mapped to page 1 of the VMA, correct ?

Thanks !

Ben.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fixing nouveau for >4k PAGE_SIZE
  2013-08-11  0:41       ` Fixing nouveau for >4k PAGE_SIZE Benjamin Herrenschmidt
@ 2013-08-11  5:36         ` Benjamin Herrenschmidt
  2013-08-11  6:17           ` Maarten Lankhorst
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2013-08-11  5:36 UTC (permalink / raw)
  To: Dave Airlie; +Cc: nouveau, Ben Skeggs, dri-devel, Stéphane Marchesin

On Sun, 2013-08-11 at 10:41 +1000, Benjamin Herrenschmidt wrote:
> Now, to do that, I need a better understanding of the various things
> in there since I'm not familiar with nouveau at all. What I think I've
> figured out is with a few questions, it would be awesome if you could
> answer them so I can have a shot at fixing it all :-)

Ok, a few more questions :-)

 - in struct nouveau_mm_node, what unit are "offset" and "length" ?

They *seem* to be hard wired to be in units of 4k (regardless of either
of system page size) which I assume means card small page size, but
"offset" is in bytes ? At least according to my parsing of the code in
nouveau_vm_map_at().

The big question then is whether that represents card address space or
system address space... I assume the former right ? So a function like
nouveau_vm_map_at() is solely concerned with mapping a piece of card
address space in the card VM and thus shouldn't be concerned by the
system PAGE_SIZE at all, right ?

IE. The only one we should actually care about here is
nouveau_vm_map_sg_table() or am I missing an important piece of the
puzzle ?

Additionally, nv41.c has only map_sg() callbacks, no map() callbacks,
should I assume that means that nouveau_vm_map() and nouveau_vm_map_at()
will never be called on these ?

 - In vm/base.c this construct appears regulary:

    struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];

Which makes me believe we have separate page tables for small vs. large
pages (in card mmu) (though I assume big is always 0 on nv40 unless
missed something, I want to make sure I'm not breaking everything
else...).

Thus I assume that each "pte" in a "big" page table maps a page size
of 1 << vmm->lpg_shift, is that correct ?

vmm->pgt_bits is always the same however, thus I assume that PDEs always
map the same amount of space, but regions for large pages just have
fewer PTEs, which seem to correspond to what the code does here:

	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;

- My basic idea is to move the card page vs system PAGE breakdown
up into vm/base.c, which seems reasonably simple in the case of
nouveau_vm_map_sg_table() since we only call vmm->map_sg for one PTE at
a time, but things get a bit trickier with nouveau_vm_map_sg which
passes the whole page list down.

Following my idea would mean essentially also making it operate one PTE
at a time, thus potentially reducing the performance of the map
operations.

One option is to special case the PAGE_SIZE==12 case to keep the
existing behaviour and operate in "reduced" (one PTE per call)
mode in the other case but I dislike special casing like that (bitrots,
one code path gets untested/unfixed, etc...)

Another option would be to make struct nouveau_mem *mem ->pages always
be an array of 4k (ie, create a breakdown when constructing that list),
but I have yet to figure out what the impact would be (where that stuff
gets created) and whether this is realistic at all...

 - struct nouveau_mem is called "node" in various places (in
nouveau_ttm) which is confusing. What is the relationship between
nouveau_mem, nouveau_mm and nouveau_mm_node ? nouveau_mem seems to be
having things such as "offset" in multiple of PAGE_SIZE, but have a
"page_shift" member that appears to match the buffer object page_shift,
hence seem to indicate that it is a card page shift...

Would it be possible, maybe, to add comments next to the fields in
those various data structure indicating in which units they are ?

 - Similar confusion arises with things like struct ttm_mem_reg *mem.
For example, in nouveau_ttm.c, I see statements like:

	ret = nouveau_vm_get(man->priv, mem->num_pages << 12, node->page_shift,
			     NV_MEM_ACCESS_RW, &node->vma[0]);

Which seems to indicate that mem->num_pages is in multiple of 4k always,
though I would have though that a ttm object was in multiple of
PAGE_SIZE, am I wrong ?

Especially since the same object is later populated using:

	mem->start   = node->vma[0].offset >> PAGE_SHIFT;

So the "start" member is in units of PAGE_SHIFT here, not 12.

So I'm still a bit confused :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fixing nouveau for >4k PAGE_SIZE
  2013-08-11  5:36         ` Benjamin Herrenschmidt
@ 2013-08-11  6:17           ` Maarten Lankhorst
  2013-08-11  7:06             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Maarten Lankhorst @ 2013-08-11  6:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Stéphane Marchesin

Op 11-08-13 07:36, Benjamin Herrenschmidt schreef:
> On Sun, 2013-08-11 at 10:41 +1000, Benjamin Herrenschmidt wrote:
>> Now, to do that, I need a better understanding of the various things
>> in there since I'm not familiar with nouveau at all. What I think I've
>> figured out is with a few questions, it would be awesome if you could
>> answer them so I can have a shot at fixing it all :-)
> Ok, a few more questions :-)
>
>  - in struct nouveau_mm_node, what unit are "offset" and "length" ?
Depends on the context. It's re-used a few times. In case of nouveau_vm it's multiples of 1<<12, which is the smallest unit
a card can address.
> They *seem* to be hard wired to be in units of 4k (regardless of either
> of system page size) which I assume means card small page size, but
> "offset" is in bytes ? At least according to my parsing of the code in
> nouveau_vm_map_at().
Correct.
> The big question then is whether that represents card address space or
> system address space... I assume the former right ? So a function like
> nouveau_vm_map_at() is solely concerned with mapping a piece of card
> address space in the card VM and thus shouldn't be concerned by the
> system PAGE_SIZE at all, right ?
Former, but the code entangles system PAGE_SIZE and card PAGE_SIZE/SHIFT/MASK in some cases.

> IE. The only one we should actually care about here is
> nouveau_vm_map_sg_table() or am I missing an important piece of the
> puzzle ?
nouveau_vm_map_sg too.

nouveau_vm_map is special, and also used to map VRAM into BAR1/BAR3 by subdev/bar code.
> Additionally, nv41.c has only map_sg() callbacks, no map() callbacks,
> should I assume that means that nouveau_vm_map() and nouveau_vm_map_at()
> will never be called on these ?
Correct. all cards before the nv50 family have no real vm. the BAR used for vram is just an identity mapping,
not the entirety of VRAM may be accessible to the system.
>  - In vm/base.c this construct appears regulary:
>
>     struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
>
> Which makes me believe we have separate page tables for small vs. large
> pages (in card mmu) (though I assume big is always 0 on nv40 unless
> missed something, I want to make sure I'm not breaking everything
> else...).
>
> Thus I assume that each "pte" in a "big" page table maps a page size
> of 1 << vmm->lpg_shift, is that correct ?
Correct, nv50+ are the only ones that support large pages.
> vmm->pgt_bits is always the same however, thus I assume that PDEs always
> map the same amount of space, but regions for large pages just have
> fewer PTEs, which seem to correspond to what the code does here:
>
> 	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
>
> - My basic idea is to move the card page vs system PAGE breakdown
> up into vm/base.c, which seems reasonably simple in the case of
> nouveau_vm_map_sg_table() since we only call vmm->map_sg for one PTE at
> a time, but things get a bit trickier with nouveau_vm_map_sg which
> passes the whole page list down.
>
> Following my idea would mean essentially also making it operate one PTE
> at a time, thus potentially reducing the performance of the map
> operations.
>
> One option is to special case the PAGE_SIZE==12 case to keep the
> existing behaviour and operate in "reduced" (one PTE per call)
> mode in the other case but I dislike special casing like that (bitrots,
> one code path gets untested/unfixed, etc...)
>
> Another option would be to make struct nouveau_mem *mem ->pages always
> be an array of 4k (ie, create a breakdown when constructing that list),
> but I have yet to figure out what the impact would be (where that stuff
> gets created) and whether this is realistic at all...
>
>  - struct nouveau_mem is called "node" in various places (in
> nouveau_ttm) which is confusing. What is the relationship between
> nouveau_mem, nouveau_mm and nouveau_mm_node ? nouveau_mem seems to be
> having things such as "offset" in multiple of PAGE_SIZE, but have a
> "page_shift" member that appears to match the buffer object page_shift,
> hence seem to indicate that it is a card page shift...
>
> Would it be possible, maybe, to add comments next to the fields in
> those various data structure indicating in which units they are ?
>
>  - Similar confusion arises with things like struct ttm_mem_reg *mem.
> For example, in nouveau_ttm.c, I see statements like:
>
> 	ret = nouveau_vm_get(man->priv, mem->num_pages << 12, node->page_shift,
> 			     NV_MEM_ACCESS_RW, &node->vma[0]);
>
> Which seems to indicate that mem->num_pages is in multiple of 4k always,
> though I would have though that a ttm object was in multiple of
> PAGE_SIZE, am I wrong ?
>
> Especially since the same object is later populated using:
>
> 	mem->start   = node->vma[0].offset >> PAGE_SHIFT;
>
> So the "start" member is in units of PAGE_SHIFT here, not 12.
>
> So I'm still a bit confused :-)
>
The fun has been doubled because TTM expects PAGE units, so some of the PAGE_SHIFT's are
genuine. Some may be a result of PAGE_SHIFT == 12, so honestly I don't know the specific ones.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fixing nouveau for >4k PAGE_SIZE
  2013-08-11  6:17           ` Maarten Lankhorst
@ 2013-08-11  7:06             ` Benjamin Herrenschmidt
  2013-08-11  8:04               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2013-08-11  7:06 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: nouveau, Ben Skeggs, dri-devel, Stéphane Marchesin

On Sun, 2013-08-11 at 08:17 +0200, Maarten Lankhorst wrote:
> > So I'm still a bit confused :-)
> >
> The fun has been doubled because TTM expects PAGE units, so some of
> the PAGE_SHIFT's are
> genuine. Some may be a result of PAGE_SHIFT == 12, so honestly I don't
> know the specific ones.

Right, and the other way around too :-)

I think I found at least two cases where "12" was used where it should
have been PAGE_SHIFT (basically ttm_mem_reg->num_pages). This
is only the tip of the iceberg, so this isn't a formal patch submission,
but I would appreciate your thought as to whether the below is correct
(and thus I'm on the right track) :

--- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
@@ -31,7 +31,7 @@ nv04_sgdma_bind(struct ttm_tt *ttm, struct ttm_mem_reg *mem)
 {
        struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
        struct nouveau_mem *node = mem->mm_node;
-       u64 size = mem->num_pages << 12;
+       u64 size = mem->num_pages << PAGE_SHIFT;
 
        if (ttm->sg) {
                node->sg = ttm->sg;
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nou
index 19e3757..f0629de 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -252,8 +252,8 @@ nv04_gart_manager_new(struct ttm_mem_type_manager *man,
 
        node->page_shift = 12;
 
-       ret = nouveau_vm_get(man->priv, mem->num_pages << 12, node->page_shift,
-                            NV_MEM_ACCESS_RW, &node->vma[0]);
+       ret = nouveau_vm_get(man->priv, mem->num_pages << PAGE_SHIFT,
+                            node->page_shift, NV_MEM_ACCESS_RW, &node->vma[0]);
        if (ret) {
                kfree(node);

Thanks !

Cheers,
Ben.

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Fixing nouveau for >4k PAGE_SIZE
  2013-08-11  7:06             ` Benjamin Herrenschmidt
@ 2013-08-11  8:04               ` Benjamin Herrenschmidt
  2013-08-11  9:02                 ` Maarten Lankhorst
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2013-08-11  8:04 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: nouveau, Ben Skeggs, dri-devel, Stéphane Marchesin

On Sun, 2013-08-11 at 17:06 +1000, Benjamin Herrenschmidt wrote:

> I think I found at least two cases where "12" was used where it should
> have been PAGE_SHIFT (basically ttm_mem_reg->num_pages). This
> is only the tip of the iceberg, so this isn't a formal patch submission,
> but I would appreciate your thought as to whether the below is correct
> (and thus I'm on the right track) :

This patch (which needs cleanups, and probably be broken down for
bisectability) makes it work for me. I've disabled nouveau_dri for now
as this has its own problems related to Ajax recent gallium endian
changes.

Note the horrible duplication of nouveau_vm_map_sg...

I think to fix it "cleanly" we probably need to slightly change the
->map_sg API to the vmmr. However, I do have a question whose answer
might make things a LOT easier if "yes" can make things a lot easier:

Can we guarantee that that such an sg object (I assume this is always
a ttm_bo getting placed in the "TT" memory, correct ?) has an alignment
in *card VM space* that is a multiple of the *system page size* ? Ie,
can we make that happen easily ?

For example, if my system page size is 64k, can we guarantee that it
will always be mapped in the card at a virtual address that is 64k
aligned ?

If that is the case, then we *know* that a given page in the page list
passed to nouveau_vm_map_sg() will never cross a pde boundary (will
always be fully contained in the bottom level of the page table). That
allows to simplify the code a bit, and maybe to write a unified function
that can still pass down page lists to the vmmr....

On the other hand, if we have to handle misalignment, then we may as
well stick to 1 PTE at a time like my patch does to avoid horrible
complications.

Cheers,
Ben.

diff --git a/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c b/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c
index 5c7433d..c314a5f 100644
--- a/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c
+++ b/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c
@@ -190,8 +190,8 @@ nv40_fifo_chan_ctor(struct nouveau_object *parent,
 	if (size < sizeof(*args))
 		return -EINVAL;
 
-	ret = nouveau_fifo_channel_create(parent, engine, oclass, 0, 0xc00000,
-					  0x1000, args->pushbuf,
+	ret = nouveau_fifo_channel_create(parent, engine, oclass, 0, 0x800000,
+					  0x10000, args->pushbuf,
 					  (1ULL << NVDEV_ENGINE_DMAOBJ) |
 					  (1ULL << NVDEV_ENGINE_SW) |
 					  (1ULL << NVDEV_ENGINE_GR) |
diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
index ef3133e..5833851 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
@@ -84,10 +84,11 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
 {
 	struct nouveau_vm *vm = vma->vm;
 	struct nouveau_vmmgr *vmm = vm->vmm;
-	int big = vma->node->type != vmm->spg_shift;
+	u32 shift = vma->node->type;
+	int big = shift != vmm->spg_shift;
 	u32 offset = vma->node->offset + (delta >> 12);
-	u32 bits = vma->node->type - 12;
-	u32 num  = length >> vma->node->type;
+	u32 bits = shift - 12;
+	u32 num  = length >> shift;
 	u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
 	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
 	u32 max  = 1 << (vmm->pgt_bits - bits);
@@ -98,7 +99,7 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
 
 	for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
 		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
-		sglen = sg_dma_len(sg) >> PAGE_SHIFT;
+		sglen = sg_dma_len(sg) >> shift;
 
 		end = pte + sglen;
 		if (unlikely(end >= max))
@@ -106,7 +107,7 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
 		len = end - pte;
 
 		for (m = 0; m < len; m++) {
-			dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
+			dma_addr_t addr = sg_dma_address(sg) + (m << shift);
 
 			vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
 			num--;
@@ -121,7 +122,7 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
 		}
 		if (m < sglen) {
 			for (; m < sglen; m++) {
-				dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
+				dma_addr_t addr = sg_dma_address(sg) + (m << shift);
 
 				vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
 				num--;
@@ -136,6 +137,7 @@ finish:
 	vmm->flush(vm);
 }
 
+#if PAGE_SHIFT == 12
 void
 nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
 		  struct nouveau_mem *mem)
@@ -143,10 +145,11 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
 	struct nouveau_vm *vm = vma->vm;
 	struct nouveau_vmmgr *vmm = vm->vmm;
 	dma_addr_t *list = mem->pages;
-	int big = vma->node->type != vmm->spg_shift;
+	u32 shift = vma->node->type;
+	int big = shift != vmm->spg_shift;
 	u32 offset = vma->node->offset + (delta >> 12);
-	u32 bits = vma->node->type - 12;
-	u32 num  = length >> vma->node->type;
+	u32 bits = shift - 12;
+	u32 num  = length >> shift;
 	u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
 	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
 	u32 max  = 1 << (vmm->pgt_bits - bits);
@@ -173,6 +176,52 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
 
 	vmm->flush(vm);
 }
+#else
+void
+nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
+		  struct nouveau_mem *mem)
+{
+	struct nouveau_vm *vm = vma->vm;
+	struct nouveau_vmmgr *vmm = vm->vmm;
+	dma_addr_t *list = mem->pages;
+	u32 shift = vma->node->type;
+	int big = shift != vmm->spg_shift;
+	u32 offset = vma->node->offset + (delta >> 12);
+	u32 bits = shift - 12;
+	u32 num  = length >> shift;
+	u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
+	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
+	u32 max  = 1 << (vmm->pgt_bits - bits);
+	u32 sub_cnt = 1 << (PAGE_SHIFT - shift);
+	u32 sub_rem = 0;
+	u64 phys = 0;
+
+
+	/* XXX This will not work for a big mapping ! */
+	WARN_ON_ONCE(big);
+
+	while (num) {
+		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
+
+		if (sub_rem == 0) {
+			phys = *(list++);
+			sub_rem = sub_cnt;
+		}
+		vmm->map_sg(vma, pgt, mem, pte, 1, &phys);
+
+		num  -= 1;
+		pte  += 1;
+		sub_rem -= 1;
+		phys += 1 << shift;
+		if (unlikely(pte >= max)) {
+			pde++;
+			pte = 0;
+		}
+	}
+
+	vmm->flush(vm);
+}
+#endif
 
 void
 nouveau_vm_unmap_at(struct nouveau_vma *vma, u64 delta, u64 length)
diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
index ed45437..f7e2311 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
@@ -39,14 +39,10 @@ nv04_vm_map_sg(struct nouveau_vma *vma, struct nouveau_gpuobj *pgt,
 {
 	pte = 0x00008 + (pte * 4);
 	while (cnt) {
-		u32 page = PAGE_SIZE / NV04_PDMA_PAGE;
 		u32 phys = (u32)*list++;
-		while (cnt && page--) {
-			nv_wo32(pgt, pte, phys | 3);
-			phys += NV04_PDMA_PAGE;
-			pte += 4;
-			cnt -= 1;
-		}
+		nv_wo32(pgt, pte, phys | 3);
+		pte += 4;
+		cnt -= 1;
 	}
 }
 
diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
index 064c762..a78f624 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
@@ -43,14 +43,10 @@ nv41_vm_map_sg(struct nouveau_vma *vma, struct nouveau_gpuobj *pgt,
 {
 	pte = pte * 4;
 	while (cnt) {
-		u32 page = PAGE_SIZE / NV41_GART_PAGE;
 		u64 phys = (u64)*list++;
-		while (cnt && page--) {
-			nv_wo32(pgt, pte, (phys >> 7) | 1);
-			phys += NV41_GART_PAGE;
-			pte += 4;
-			cnt -= 1;
-		}
+		nv_wo32(pgt, pte, (phys >> 7) | 1);
+		pte += 4;
+		cnt -= 1;
 	}
 }
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index af20fba..694024d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -226,7 +226,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
 	nvbo->page_shift = 12;
 	if (drm->client.base.vm) {
 		if (!(flags & TTM_PL_FLAG_TT) && size > 256 * 1024)
-			nvbo->page_shift = drm->client.base.vm->vmm->lpg_shift;
+			nvbo->page_shift = lpg_shift;
 	}
 
 	nouveau_bo_fixup_align(nvbo, flags, &align, &size);
diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
index ca5492a..494cf88 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
@@ -31,7 +31,7 @@ nv04_sgdma_bind(struct ttm_tt *ttm, struct ttm_mem_reg *mem)
 {
 	struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
 	struct nouveau_mem *node = mem->mm_node;
-	u64 size = mem->num_pages << 12;
+	u64 size = mem->num_pages << PAGE_SHIFT;
 
 	if (ttm->sg) {
 		node->sg = ttm->sg;
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 19e3757..f0629de 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -252,8 +252,8 @@ nv04_gart_manager_new(struct ttm_mem_type_manager *man,
 
 	node->page_shift = 12;
 
-	ret = nouveau_vm_get(man->priv, mem->num_pages << 12, node->page_shift,
-			     NV_MEM_ACCESS_RW, &node->vma[0]);
+	ret = nouveau_vm_get(man->priv, mem->num_pages << PAGE_SHIFT,
+			     node->page_shift, NV_MEM_ACCESS_RW, &node->vma[0]);
 	if (ret) {
 		kfree(node);
 		return ret;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Fixing nouveau for >4k PAGE_SIZE
  2013-08-11  8:04               ` Benjamin Herrenschmidt
@ 2013-08-11  9:02                 ` Maarten Lankhorst
  2013-08-11  9:35                   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Maarten Lankhorst @ 2013-08-11  9:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: nouveau, Ben Skeggs, dri-devel, Stéphane Marchesin

Op 11-08-13 10:04, Benjamin Herrenschmidt schreef:
> On Sun, 2013-08-11 at 17:06 +1000, Benjamin Herrenschmidt wrote:
>
>> I think I found at least two cases where "12" was used where it should
>> have been PAGE_SHIFT (basically ttm_mem_reg->num_pages). This
>> is only the tip of the iceberg, so this isn't a formal patch submission,
>> but I would appreciate your thought as to whether the below is correct
>> (and thus I'm on the right track) :
> This patch (which needs cleanups, and probably be broken down for
> bisectability) makes it work for me. I've disabled nouveau_dri for now
> as this has its own problems related to Ajax recent gallium endian
> changes.
>
> Note the horrible duplication of nouveau_vm_map_sg...
>
> I think to fix it "cleanly" we probably need to slightly change the
> ->map_sg API to the vmmr. However, I do have a question whose answer
> might make things a LOT easier if "yes" can make things a lot easier:
>
> Can we guarantee that that such an sg object (I assume this is always
> a ttm_bo getting placed in the "TT" memory, correct ?) has an alignment
> in *card VM space* that is a multiple of the *system page size* ? Ie,
> can we make that happen easily ?
>
> For example, if my system page size is 64k, can we guarantee that it
> will always be mapped in the card at a virtual address that is 64k
> aligned ?
>
> If that is the case, then we *know* that a given page in the page list
> passed to nouveau_vm_map_sg() will never cross a pde boundary (will
> always be fully contained in the bottom level of the page table). That
> allows to simplify the code a bit, and maybe to write a unified function
> that can still pass down page lists to the vmmr....
>
> On the other hand, if we have to handle misalignment, then we may as
> well stick to 1 PTE at a time like my patch does to avoid horrible
> complications.
>
> Cheers,
> Ben.
>
> diff --git a/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c b/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c
> index 5c7433d..c314a5f 100644
> --- a/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c
> +++ b/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c
> @@ -190,8 +190,8 @@ nv40_fifo_chan_ctor(struct nouveau_object *parent,
>  	if (size < sizeof(*args))
>  		return -EINVAL;
>  
> -	ret = nouveau_fifo_channel_create(parent, engine, oclass, 0, 0xc00000,
> -					  0x1000, args->pushbuf,
> +	ret = nouveau_fifo_channel_create(parent, engine, oclass, 0, 0x800000,
> +					  0x10000, args->pushbuf,
>  					  (1ULL << NVDEV_ENGINE_DMAOBJ) |
>  					  (1ULL << NVDEV_ENGINE_SW) |
>  					  (1ULL << NVDEV_ENGINE_GR) |
Why the size change?

> diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> index ef3133e..5833851 100644
> --- a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> @@ -84,10 +84,11 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
>  {
>  	struct nouveau_vm *vm = vma->vm;
>  	struct nouveau_vmmgr *vmm = vm->vmm;
> -	int big = vma->node->type != vmm->spg_shift;
> +	u32 shift = vma->node->type;
> +	int big = shift != vmm->spg_shift;
>  	u32 offset = vma->node->offset + (delta >> 12);
> -	u32 bits = vma->node->type - 12;
> -	u32 num  = length >> vma->node->type;
> +	u32 bits = shift - 12;
> +	u32 num  = length >> shift;
>  	u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
>  	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
>  	u32 max  = 1 << (vmm->pgt_bits - bits);
> @@ -98,7 +99,7 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
>  
>  	for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
>  		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
> -		sglen = sg_dma_len(sg) >> PAGE_SHIFT;
> +		sglen = sg_dma_len(sg) >> shift;
>  
>  		end = pte + sglen;
>  		if (unlikely(end >= max))
Please add a WARN_ON(big); in map_sg and map_sg_table if you do this.
> @@ -106,7 +107,7 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
>  		len = end - pte;
>  
>  		for (m = 0; m < len; m++) {
> -			dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
> +			dma_addr_t addr = sg_dma_address(sg) + (m << shift);
>  
>  			vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
>  			num--;
> @@ -121,7 +122,7 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
>  		}
>  		if (m < sglen) {
>  			for (; m < sglen; m++) {
> -				dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
> +				dma_addr_t addr = sg_dma_address(sg) + (m << shift);
>  
>  				vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
>  				num--;
> @@ -136,6 +137,7 @@ finish:
>  	vmm->flush(vm);
>  }
>  
> +#if PAGE_SHIFT == 12
>  void
>  nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
>  		  struct nouveau_mem *mem)
> @@ -143,10 +145,11 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
>  	struct nouveau_vm *vm = vma->vm;
>  	struct nouveau_vmmgr *vmm = vm->vmm;
>  	dma_addr_t *list = mem->pages;
> -	int big = vma->node->type != vmm->spg_shift;
> +	u32 shift = vma->node->type;
> +	int big = shift != vmm->spg_shift;
>  	u32 offset = vma->node->offset + (delta >> 12);
> -	u32 bits = vma->node->type - 12;
> -	u32 num  = length >> vma->node->type;
> +	u32 bits = shift - 12;
> +	u32 num  = length >> shift;
>  	u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
>  	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
>  	u32 max  = 1 << (vmm->pgt_bits - bits);
> @@ -173,6 +176,52 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
>  
>  	vmm->flush(vm);
>  }
> +#else
> +void
> +nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
> +		  struct nouveau_mem *mem)
> +{
> +	struct nouveau_vm *vm = vma->vm;
> +	struct nouveau_vmmgr *vmm = vm->vmm;
> +	dma_addr_t *list = mem->pages;
> +	u32 shift = vma->node->type;
> +	int big = shift != vmm->spg_shift;
> +	u32 offset = vma->node->offset + (delta >> 12);
> +	u32 bits = shift - 12;
> +	u32 num  = length >> shift;
> +	u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
> +	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
> +	u32 max  = 1 << (vmm->pgt_bits - bits);
> +	u32 sub_cnt = 1 << (PAGE_SHIFT - shift);
> +	u32 sub_rem = 0;
> +	u64 phys = 0;
> +
> +
> +	/* XXX This will not work for a big mapping ! */
> +	WARN_ON_ONCE(big);
> +
> +	while (num) {
> +		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
> +
> +		if (sub_rem == 0) {
> +			phys = *(list++);
> +			sub_rem = sub_cnt;
> +		}
> +		vmm->map_sg(vma, pgt, mem, pte, 1, &phys);
> +
> +		num  -= 1;
> +		pte  += 1;
> +		sub_rem -= 1;
> +		phys += 1 << shift;
> +		if (unlikely(pte >= max)) {
> +			pde++;
> +			pte = 0;
> +		}
> +	}
> +
> +	vmm->flush(vm);
> +}
> +#endif

Considering that map_sg in nv04-50 takes PAGE_SIZE pages, it would be easier to fix map_sg for nv50 and nvc0, this would mean less fixing in map_sg/map_sg_table. I don't like the duplicate definition, and the extra for loop in map_sg will be compiled out when page_size == 12.

Oh fun fact:
on nv50 if PAGE_SIZE = 64K you should just use large pages on the nvidia card for everything. :D
I have no idea if it works for bar, but you could test..
In subdev/bar/nv50.c kmap/umap change the 12 argument to 16, and change vm->pgt[0].obj[0] to vm->pgt[0].obj[1] and vm->pgt[0].refcount[0] to vm->pgt[0].refcount[1].

for nvc0 that would require 128K pages, but I believe it should work too.

>  
>  void
>  nouveau_vm_unmap_at(struct nouveau_vma *vma, u64 delta, u64 length)
> diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> index ed45437..f7e2311 100644
> --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> @@ -39,14 +39,10 @@ nv04_vm_map_sg(struct nouveau_vma *vma, struct nouveau_gpuobj *pgt,
>  {
>  	pte = 0x00008 + (pte * 4);
>  	while (cnt) {
> -		u32 page = PAGE_SIZE / NV04_PDMA_PAGE;
>  		u32 phys = (u32)*list++;
> -		while (cnt && page--) {
> -			nv_wo32(pgt, pte, phys | 3);
> -			phys += NV04_PDMA_PAGE;
> -			pte += 4;
> -			cnt -= 1;
> -		}
> +		nv_wo32(pgt, pte, phys | 3);
> +		pte += 4;
> +		cnt -= 1;
>  	}
>  }
>  
> diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> index 064c762..a78f624 100644
> --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> @@ -43,14 +43,10 @@ nv41_vm_map_sg(struct nouveau_vma *vma, struct nouveau_gpuobj *pgt,
>  {
>  	pte = pte * 4;
>  	while (cnt) {
> -		u32 page = PAGE_SIZE / NV41_GART_PAGE;
>  		u64 phys = (u64)*list++;
> -		while (cnt && page--) {
> -			nv_wo32(pgt, pte, (phys >> 7) | 1);
> -			phys += NV41_GART_PAGE;
> -			pte += 4;
> -			cnt -= 1;
> -		}
> +		nv_wo32(pgt, pte, (phys >> 7) | 1);
> +		pte += 4;
> +		cnt -= 1;
>  	}
>  }
See above^, it's better if you could fixup nv50/nvc0.c instead.
>  
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index af20fba..694024d 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -226,7 +226,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
>  	nvbo->page_shift = 12;
>  	if (drm->client.base.vm) {
>  		if (!(flags & TTM_PL_FLAG_TT) && size > 256 * 1024)
> -			nvbo->page_shift = drm->client.base.vm->vmm->lpg_shift;
> +			nvbo->page_shift = lpg_shift;
>  	}
>  
>  	nouveau_bo_fixup_align(nvbo, flags, &align, &size);
Hm.. I don't know if it will be an issue here. But I'm concerned about the cases where nouveau_vm can end up unaligned.
This will not be an issue for the bar mappings, because they map the entire bo, and bo's are always page aligned.
See nouveau_bo_fixup_align.

But the channel vm mappings are no longer guaranteed to be page aligned. In fact it's very likely they are all misaligned due to some vm allocations
done when mapping a channel. This is only a problem on nv50+ though. Probably not an issue you're hitting.
> diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> index ca5492a..494cf88 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> @@ -31,7 +31,7 @@ nv04_sgdma_bind(struct ttm_tt *ttm, struct ttm_mem_reg *mem)
>  {
>  	struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
>  	struct nouveau_mem *node = mem->mm_node;
> -	u64 size = mem->num_pages << 12;
> +	u64 size = mem->num_pages << PAGE_SHIFT;
>  
>  	if (ttm->sg) {
>  		node->sg = ttm->sg;
> diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> index 19e3757..f0629de 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> @@ -252,8 +252,8 @@ nv04_gart_manager_new(struct ttm_mem_type_manager *man,
>  
>  	node->page_shift = 12;
>  
> -	ret = nouveau_vm_get(man->priv, mem->num_pages << 12, node->page_shift,
> -			     NV_MEM_ACCESS_RW, &node->vma[0]);
> +	ret = nouveau_vm_get(man->priv, mem->num_pages << PAGE_SHIFT,
> +			     node->page_shift, NV_MEM_ACCESS_RW, &node->vma[0]);
>  	if (ret) {
>  		kfree(node);
>  		return ret;
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fixing nouveau for >4k PAGE_SIZE
  2013-08-11  9:02                 ` Maarten Lankhorst
@ 2013-08-11  9:35                   ` Benjamin Herrenschmidt
  2013-08-29  6:49                     ` Ben Skeggs
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2013-08-11  9:35 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: nouveau, Ben Skeggs, dri-devel, Stéphane Marchesin

On Sun, 2013-08-11 at 11:02 +0200, Maarten Lankhorst wrote:

> > diff --git a/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c b/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c
> > index 5c7433d..c314a5f 100644
> > --- a/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c
> > +++ b/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c
> > @@ -190,8 +190,8 @@ nv40_fifo_chan_ctor(struct nouveau_object *parent,
> >  	if (size < sizeof(*args))
> >  		return -EINVAL;
> >  
> > -	ret = nouveau_fifo_channel_create(parent, engine, oclass, 0, 0xc00000,
> > -					  0x1000, args->pushbuf,
> > +	ret = nouveau_fifo_channel_create(parent, engine, oclass, 0, 0x800000,
> > +					  0x10000, args->pushbuf,
> >  					  (1ULL << NVDEV_ENGINE_DMAOBJ) |
> >  					  (1ULL << NVDEV_ENGINE_SW) |
> >  					  (1ULL << NVDEV_ENGINE_GR) |
> Why the size change?

This reverts the value to older ones, however that patch might not be
needed anymore (I was carrying it from Dave but if we don't map the
registers into userspace we shouldn't need to force align them).

> > diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> > index ef3133e..5833851 100644
> > --- a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> > +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> > @@ -84,10 +84,11 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
> >  {
> >  	struct nouveau_vm *vm = vma->vm;
> >  	struct nouveau_vmmgr *vmm = vm->vmm;
> > -	int big = vma->node->type != vmm->spg_shift;
> > +	u32 shift = vma->node->type;
> > +	int big = shift != vmm->spg_shift;
> >  	u32 offset = vma->node->offset + (delta >> 12);
> > -	u32 bits = vma->node->type - 12;
> > -	u32 num  = length >> vma->node->type;
> > +	u32 bits = shift - 12;
> > +	u32 num  = length >> shift;
> >  	u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
> >  	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
> >  	u32 max  = 1 << (vmm->pgt_bits - bits);
> > @@ -98,7 +99,7 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
> >  
> >  	for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
> >  		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
> > -		sglen = sg_dma_len(sg) >> PAGE_SHIFT;
> > +		sglen = sg_dma_len(sg) >> shift;
> >  
> >  		end = pte + sglen;
> >  		if (unlikely(end >= max))
> Please add a WARN_ON(big); in map_sg and map_sg_table if you do this.

So that's debatable :-)

The above code is map_sg. Arguably my patch *fixes* using it with card
large pages :-)

IE, Look at the "usual" case (PAGE_SHIFT=12). Today, the above means
sglen will be in units of small pages. But everything else in that loop
operates in units of whatever is represented by the pte, which can
either be 4k or larger. 

So adding "sglen" to "pte" was never right for shift != 12 before
(regardless of PAGE_SHIFT).

With my patch, it's more correct, so as such, adding a WARN_ON here
wouldn't be "if I do this" :-)

Now, the "big" case still cannot really work here with PAGE_SHIFT=12,
because that would require the sg segments to be multiple of
"shift" (the card large page) and that is not going to be the case.

So funnily enough, we *could* use card large pages of 64k ("big") if ...
we had PAGE_SHIFT=16 ... which we do on ppc64 :-) But not anywhere else.

But yes, the point remains, in the general case, that function cannot
work for the "big" case, so I wonder if we should catch it with a
WARN_ON and maybe simplify the code a bunch while at it...

> > @@ -106,7 +107,7 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
> >  		len = end - pte;
> >  
> >  		for (m = 0; m < len; m++) {
> > -			dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
> > +			dma_addr_t addr = sg_dma_address(sg) + (m << shift);
> >  
> >  			vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
> >  			num--;
> > @@ -121,7 +122,7 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
> >  		}
> >  		if (m < sglen) {
> >  			for (; m < sglen; m++) {
> > -				dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
> > +				dma_addr_t addr = sg_dma_address(sg) + (m << shift);
> >  
> >  				vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
> >  				num--;
> > @@ -136,6 +137,7 @@ finish:
> >  	vmm->flush(vm);
> >  }
> >  
> > +#if PAGE_SHIFT == 12
> >  void
> >  nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
> >  		  struct nouveau_mem *mem)
> > @@ -143,10 +145,11 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
> >  	struct nouveau_vm *vm = vma->vm;
> >  	struct nouveau_vmmgr *vmm = vm->vmm;
> >  	dma_addr_t *list = mem->pages;
> > -	int big = vma->node->type != vmm->spg_shift;
> > +	u32 shift = vma->node->type;
> > +	int big = shift != vmm->spg_shift;
> >  	u32 offset = vma->node->offset + (delta >> 12);
> > -	u32 bits = vma->node->type - 12;
> > -	u32 num  = length >> vma->node->type;
> > +	u32 bits = shift - 12;
> > +	u32 num  = length >> shift;
> >  	u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
> >  	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
> >  	u32 max  = 1 << (vmm->pgt_bits - bits);
> > @@ -173,6 +176,52 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,

So the above is the existing one which I kept mostly intact ... but
cannot work for shift != 12 either for the same reasons so here too, if
we could simplify that ...
  
> >  	vmm->flush(vm);
> >  }
> > +#else
> > +void
> > +nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
> > +		  struct nouveau_mem *mem)
> > +{
> > +	struct nouveau_vm *vm = vma->vm;
> > +	struct nouveau_vmmgr *vmm = vm->vmm;
> > +	dma_addr_t *list = mem->pages;
> > +	u32 shift = vma->node->type;
> > +	int big = shift != vmm->spg_shift;
> > +	u32 offset = vma->node->offset + (delta >> 12);
> > +	u32 bits = shift - 12;
> > +	u32 num  = length >> shift;
> > +	u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
> > +	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
> > +	u32 max  = 1 << (vmm->pgt_bits - bits);
> > +	u32 sub_cnt = 1 << (PAGE_SHIFT - shift);
> > +	u32 sub_rem = 0;
> > +	u64 phys = 0;
> > +
> > +
> > +	/* XXX This will not work for a big mapping ! */
> > +	WARN_ON_ONCE(big);
> > +
> > +	while (num) {
> > +		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
> > +
> > +		if (sub_rem == 0) {
> > +			phys = *(list++);
> > +			sub_rem = sub_cnt;
> > +		}
> > +		vmm->map_sg(vma, pgt, mem, pte, 1, &phys);
> > +
> > +		num  -= 1;
> > +		pte  += 1;
> > +		sub_rem -= 1;
> > +		phys += 1 << shift;
> > +		if (unlikely(pte >= max)) {
> > +			pde++;
> > +			pte = 0;
> > +		}
> > +	}
> > +
> > +	vmm->flush(vm);
> > +}
> > +#endif
> 
> Considering that map_sg in nv04-50 takes PAGE_SIZE pages, it would be easier to fix
> map_sg for nv50 and nvc0, this would mean less fixing in map_sg/map_sg_table.

Sorry, I'm not sure what you mean exactly ... The code *today* tries to
handle things at the low level (vmm->map_sg) and that cannot work the
way it's done. I'm removing that. Or rather, that cannot work unless we
can guarantee that there will be no crossing of PTE page boundary (no
crossing of pde) by the PAGE_SIZE page, which I think we cannot (see my
discussion in my email asking if we could enforce that somewhat).

Additionally the current code is broken in that the upper layer in
vm/base.c doesn't increment "pte" by the right amount.

Now, if those two assertions can be made always true:

 - Those two functions (map_sg and map_sg_table) never deal with the
"big" case.

 - An object is always mapped at a card address that is a multiple
of PAGE_SIZE (ie, invividual PAGE_SIZE pages don't cross pde boundaries
when mapped)

Then we can probably simplify the code significantly *and* go back to
handling PAGE_SIZE pages in the vmm->map_sg() instead of breaking them
up a level above like I do. 

> I don't like the duplicate definition, and the extra for loop in map_sg will be compiled out when page_size == 12.

Sort-of. Right now, my code prevents calling vmm->map_sg with more than
"1" as len which reduces perf in the PAGE_SHIFT=12 case, which is why I
did the duplication.

However this is just an illustration. I'm fully aware that this is not
good as-is, I'm just poking to see what you guys think is the best
approach to *properly* fix it.

> Oh fun fact:
> on nv50 if PAGE_SIZE = 64K you should just use large pages on the nvidia card for everything. :D

I don't think I care at this stage but heh ... 

> I have no idea if it works for bar, but you could test..
> In subdev/bar/nv50.c kmap/umap change the 12 argument to 16, and change vm->pgt[0].obj[0] to vm->pgt[0].obj[1] and vm->pgt[0].refcount[0] to vm->pgt[0].refcount[1].
> 
> for nvc0 that would require 128K pages, but I believe it should work too.

I don't think we want to go there right now, there is little benefit for
the vast majority of platforms (x86 and ARM). Let's stick to making
ppc64 "work" and not bother too much with things that will never be used
in practice :-)

I'd rather make the code simpler by removing the "big" case from those
functions if it's never going to be used.

Cheers,
Ben.

> >  
> >  void
> >  nouveau_vm_unmap_at(struct nouveau_vma *vma, u64 delta, u64 length)
> > diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> > index ed45437..f7e2311 100644
> > --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> > +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> > @@ -39,14 +39,10 @@ nv04_vm_map_sg(struct nouveau_vma *vma, struct nouveau_gpuobj *pgt,
> >  {
> >  	pte = 0x00008 + (pte * 4);
> >  	while (cnt) {
> > -		u32 page = PAGE_SIZE / NV04_PDMA_PAGE;
> >  		u32 phys = (u32)*list++;
> > -		while (cnt && page--) {
> > -			nv_wo32(pgt, pte, phys | 3);
> > -			phys += NV04_PDMA_PAGE;
> > -			pte += 4;
> > -			cnt -= 1;
> > -		}
> > +		nv_wo32(pgt, pte, phys | 3);
> > +		pte += 4;
> > +		cnt -= 1;
> >  	}
> >  }
> >  
> > diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> > index 064c762..a78f624 100644
> > --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> > +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> > @@ -43,14 +43,10 @@ nv41_vm_map_sg(struct nouveau_vma *vma, struct nouveau_gpuobj *pgt,
> >  {
> >  	pte = pte * 4;
> >  	while (cnt) {
> > -		u32 page = PAGE_SIZE / NV41_GART_PAGE;
> >  		u64 phys = (u64)*list++;
> > -		while (cnt && page--) {
> > -			nv_wo32(pgt, pte, (phys >> 7) | 1);
> > -			phys += NV41_GART_PAGE;
> > -			pte += 4;
> > -			cnt -= 1;
> > -		}
> > +		nv_wo32(pgt, pte, (phys >> 7) | 1);
> > +		pte += 4;
> > +		cnt -= 1;
> >  	}
> >  }
> See above^, it's better if you could fixup nv50/nvc0.c instead.
> >  
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > index af20fba..694024d 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > @@ -226,7 +226,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
> >  	nvbo->page_shift = 12;
> >  	if (drm->client.base.vm) {
> >  		if (!(flags & TTM_PL_FLAG_TT) && size > 256 * 1024)
> > -			nvbo->page_shift = drm->client.base.vm->vmm->lpg_shift;
> > +			nvbo->page_shift = lpg_shift;
> >  	}
> >  
> >  	nouveau_bo_fixup_align(nvbo, flags, &align, &size);
> Hm.. I don't know if it will be an issue here. But I'm concerned about the cases where nouveau_vm can end up unaligned.
> This will not be an issue for the bar mappings, because they map the entire bo, and bo's are always page aligned.
> See nouveau_bo_fixup_align.
> 
> But the channel vm mappings are no longer guaranteed to be page aligned. In fact it's very likely they are all misaligned due to some vm allocations
> done when mapping a channel. This is only a problem on nv50+ though. Probably not an issue you're hitting.
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> > index ca5492a..494cf88 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> > @@ -31,7 +31,7 @@ nv04_sgdma_bind(struct ttm_tt *ttm, struct ttm_mem_reg *mem)
> >  {
> >  	struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
> >  	struct nouveau_mem *node = mem->mm_node;
> > -	u64 size = mem->num_pages << 12;
> > +	u64 size = mem->num_pages << PAGE_SHIFT;
> >  
> >  	if (ttm->sg) {
> >  		node->sg = ttm->sg;
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> > index 19e3757..f0629de 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> > @@ -252,8 +252,8 @@ nv04_gart_manager_new(struct ttm_mem_type_manager *man,
> >  
> >  	node->page_shift = 12;
> >  
> > -	ret = nouveau_vm_get(man->priv, mem->num_pages << 12, node->page_shift,
> > -			     NV_MEM_ACCESS_RW, &node->vma[0]);
> > +	ret = nouveau_vm_get(man->priv, mem->num_pages << PAGE_SHIFT,
> > +			     node->page_shift, NV_MEM_ACCESS_RW, &node->vma[0]);
> >  	if (ret) {
> >  		kfree(node);
> >  		return ret;
> >
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fixing nouveau for >4k PAGE_SIZE
  2013-08-11  9:35                   ` Benjamin Herrenschmidt
@ 2013-08-29  6:49                     ` Ben Skeggs
  2013-11-29  6:01                       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Skeggs @ 2013-08-29  6:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: nouveau@lists.freedesktop.org, Ben Skeggs,
	dri-devel@lists.freedesktop.org, Stéphane Marchesin

On Sun, Aug 11, 2013 at 7:35 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Sun, 2013-08-11 at 11:02 +0200, Maarten Lankhorst wrote:
>
>> > diff --git a/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c b/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c
>> > index 5c7433d..c314a5f 100644
>> > --- a/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c
>> > +++ b/drivers/gpu/drm/nouveau/core/engine/fifo/nv40.c
>> > @@ -190,8 +190,8 @@ nv40_fifo_chan_ctor(struct nouveau_object *parent,
>> >     if (size < sizeof(*args))
>> >             return -EINVAL;
>> >
>> > -   ret = nouveau_fifo_channel_create(parent, engine, oclass, 0, 0xc00000,
>> > -                                     0x1000, args->pushbuf,
>> > +   ret = nouveau_fifo_channel_create(parent, engine, oclass, 0, 0x800000,
>> > +                                     0x10000, args->pushbuf,
>> >                                       (1ULL << NVDEV_ENGINE_DMAOBJ) |
>> >                                       (1ULL << NVDEV_ENGINE_SW) |
>> >                                       (1ULL << NVDEV_ENGINE_GR) |
>> Why the size change?
>
> This reverts the value to older ones, however that patch might not be
> needed anymore (I was carrying it from Dave but if we don't map the
> registers into userspace we shouldn't need to force align them).
>
>> > diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
>> > index ef3133e..5833851 100644
>> > --- a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
>> > +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
>> > @@ -84,10 +84,11 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
>> >  {
>> >     struct nouveau_vm *vm = vma->vm;
>> >     struct nouveau_vmmgr *vmm = vm->vmm;
>> > -   int big = vma->node->type != vmm->spg_shift;
>> > +   u32 shift = vma->node->type;
>> > +   int big = shift != vmm->spg_shift;
>> >     u32 offset = vma->node->offset + (delta >> 12);
>> > -   u32 bits = vma->node->type - 12;
>> > -   u32 num  = length >> vma->node->type;
>> > +   u32 bits = shift - 12;
>> > +   u32 num  = length >> shift;
>> >     u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
>> >     u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
>> >     u32 max  = 1 << (vmm->pgt_bits - bits);
>> > @@ -98,7 +99,7 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
>> >
>> >     for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
>> >             struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
>> > -           sglen = sg_dma_len(sg) >> PAGE_SHIFT;
>> > +           sglen = sg_dma_len(sg) >> shift;
>> >
>> >             end = pte + sglen;
>> >             if (unlikely(end >= max))
>> Please add a WARN_ON(big); in map_sg and map_sg_table if you do this.
>
> So that's debatable :-)
>
> The above code is map_sg. Arguably my patch *fixes* using it with card
> large pages :-)
>
> IE, Look at the "usual" case (PAGE_SHIFT=12). Today, the above means
> sglen will be in units of small pages. But everything else in that loop
> operates in units of whatever is represented by the pte, which can
> either be 4k or larger.
>
> So adding "sglen" to "pte" was never right for shift != 12 before
> (regardless of PAGE_SHIFT).
>
> With my patch, it's more correct, so as such, adding a WARN_ON here
> wouldn't be "if I do this" :-)
>
> Now, the "big" case still cannot really work here with PAGE_SHIFT=12,
> because that would require the sg segments to be multiple of
> "shift" (the card large page) and that is not going to be the case.
>
> So funnily enough, we *could* use card large pages of 64k ("big") if ...
> we had PAGE_SHIFT=16 ... which we do on ppc64 :-) But not anywhere else.
>
> But yes, the point remains, in the general case, that function cannot
> work for the "big" case, so I wonder if we should catch it with a
> WARN_ON and maybe simplify the code a bunch while at it...
>
>> > @@ -106,7 +107,7 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
>> >             len = end - pte;
>> >
>> >             for (m = 0; m < len; m++) {
>> > -                   dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
>> > +                   dma_addr_t addr = sg_dma_address(sg) + (m << shift);
>> >
>> >                     vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
>> >                     num--;
>> > @@ -121,7 +122,7 @@ nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
>> >             }
>> >             if (m < sglen) {
>> >                     for (; m < sglen; m++) {
>> > -                           dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
>> > +                           dma_addr_t addr = sg_dma_address(sg) + (m << shift);
>> >
>> >                             vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
>> >                             num--;
>> > @@ -136,6 +137,7 @@ finish:
>> >     vmm->flush(vm);
>> >  }
>> >
>> > +#if PAGE_SHIFT == 12
>> >  void
>> >  nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
>> >               struct nouveau_mem *mem)
>> > @@ -143,10 +145,11 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
>> >     struct nouveau_vm *vm = vma->vm;
>> >     struct nouveau_vmmgr *vmm = vm->vmm;
>> >     dma_addr_t *list = mem->pages;
>> > -   int big = vma->node->type != vmm->spg_shift;
>> > +   u32 shift = vma->node->type;
>> > +   int big = shift != vmm->spg_shift;
>> >     u32 offset = vma->node->offset + (delta >> 12);
>> > -   u32 bits = vma->node->type - 12;
>> > -   u32 num  = length >> vma->node->type;
>> > +   u32 bits = shift - 12;
>> > +   u32 num  = length >> shift;
>> >     u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
>> >     u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
>> >     u32 max  = 1 << (vmm->pgt_bits - bits);
>> > @@ -173,6 +176,52 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
>
> So the above is the existing one which I kept mostly intact ... but
> cannot work for shift != 12 either for the same reasons so here too, if
> we could simplify that ...
>
>> >     vmm->flush(vm);
>> >  }
>> > +#else
>> > +void
>> > +nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
>> > +             struct nouveau_mem *mem)
>> > +{
>> > +   struct nouveau_vm *vm = vma->vm;
>> > +   struct nouveau_vmmgr *vmm = vm->vmm;
>> > +   dma_addr_t *list = mem->pages;
>> > +   u32 shift = vma->node->type;
>> > +   int big = shift != vmm->spg_shift;
>> > +   u32 offset = vma->node->offset + (delta >> 12);
>> > +   u32 bits = shift - 12;
>> > +   u32 num  = length >> shift;
>> > +   u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
>> > +   u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
>> > +   u32 max  = 1 << (vmm->pgt_bits - bits);
>> > +   u32 sub_cnt = 1 << (PAGE_SHIFT - shift);
>> > +   u32 sub_rem = 0;
>> > +   u64 phys = 0;
>> > +
>> > +
>> > +   /* XXX This will not work for a big mapping ! */
>> > +   WARN_ON_ONCE(big);
>> > +
>> > +   while (num) {
>> > +           struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
>> > +
>> > +           if (sub_rem == 0) {
>> > +                   phys = *(list++);
>> > +                   sub_rem = sub_cnt;
>> > +           }
>> > +           vmm->map_sg(vma, pgt, mem, pte, 1, &phys);
>> > +
>> > +           num  -= 1;
>> > +           pte  += 1;
>> > +           sub_rem -= 1;
>> > +           phys += 1 << shift;
>> > +           if (unlikely(pte >= max)) {
>> > +                   pde++;
>> > +                   pte = 0;
>> > +           }
>> > +   }
>> > +
>> > +   vmm->flush(vm);
>> > +}
>> > +#endif
>>
>> Considering that map_sg in nv04-50 takes PAGE_SIZE pages, it would be easier to fix
>> map_sg for nv50 and nvc0, this would mean less fixing in map_sg/map_sg_table.
>
> Sorry, I'm not sure what you mean exactly ... The code *today* tries to
> handle things at the low level (vmm->map_sg) and that cannot work the
> way it's done. I'm removing that. Or rather, that cannot work unless we
> can guarantee that there will be no crossing of PTE page boundary (no
> crossing of pde) by the PAGE_SIZE page, which I think we cannot (see my
> discussion in my email asking if we could enforce that somewhat).
>
> Additionally the current code is broken in that the upper layer in
> vm/base.c doesn't increment "pte" by the right amount.
>
> Now, if those two assertions can be made always true:
>
>  - Those two functions (map_sg and map_sg_table) never deal with the
> "big" case.
>
>  - An object is always mapped at a card address that is a multiple
> of PAGE_SIZE (ie, invividual PAGE_SIZE pages don't cross pde boundaries
> when mapped)
I think these two restrictions are reasonable to enforce, and we should do so.

>
> Then we can probably simplify the code significantly *and* go back to
> handling PAGE_SIZE pages in the vmm->map_sg() instead of breaking them
> up a level above like I do.
Sounds good!

>
>> I don't like the duplicate definition, and the extra for loop in map_sg will be compiled out when page_size == 12.
>
> Sort-of. Right now, my code prevents calling vmm->map_sg with more than
> "1" as len which reduces perf in the PAGE_SHIFT=12 case, which is why I
> did the duplication.
>
> However this is just an illustration. I'm fully aware that this is not
> good as-is, I'm just poking to see what you guys think is the best
> approach to *properly* fix it.
>
>> Oh fun fact:
>> on nv50 if PAGE_SIZE = 64K you should just use large pages on the nvidia card for everything. :D
>
> I don't think I care at this stage but heh ...
>
>> I have no idea if it works for bar, but you could test..
>> In subdev/bar/nv50.c kmap/umap change the 12 argument to 16, and change vm->pgt[0].obj[0] to vm->pgt[0].obj[1] and vm->pgt[0].refcount[0] to vm->pgt[0].refcount[1].
>>
>> for nvc0 that would require 128K pages, but I believe it should work too.
>
> I don't think we want to go there right now, there is little benefit for
> the vast majority of platforms (x86 and ARM). Let's stick to making
> ppc64 "work" and not bother too much with things that will never be used
> in practice :-)
>
> I'd rather make the code simpler by removing the "big" case from those
> functions if it's never going to be used.
Fully agreed!

Ben.

>
> Cheers,
> Ben.
>
>> >
>> >  void
>> >  nouveau_vm_unmap_at(struct nouveau_vma *vma, u64 delta, u64 length)
>> > diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
>> > index ed45437..f7e2311 100644
>> > --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
>> > +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
>> > @@ -39,14 +39,10 @@ nv04_vm_map_sg(struct nouveau_vma *vma, struct nouveau_gpuobj *pgt,
>> >  {
>> >     pte = 0x00008 + (pte * 4);
>> >     while (cnt) {
>> > -           u32 page = PAGE_SIZE / NV04_PDMA_PAGE;
>> >             u32 phys = (u32)*list++;
>> > -           while (cnt && page--) {
>> > -                   nv_wo32(pgt, pte, phys | 3);
>> > -                   phys += NV04_PDMA_PAGE;
>> > -                   pte += 4;
>> > -                   cnt -= 1;
>> > -           }
>> > +           nv_wo32(pgt, pte, phys | 3);
>> > +           pte += 4;
>> > +           cnt -= 1;
>> >     }
>> >  }
>> >
>> > diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
>> > index 064c762..a78f624 100644
>> > --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
>> > +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
>> > @@ -43,14 +43,10 @@ nv41_vm_map_sg(struct nouveau_vma *vma, struct nouveau_gpuobj *pgt,
>> >  {
>> >     pte = pte * 4;
>> >     while (cnt) {
>> > -           u32 page = PAGE_SIZE / NV41_GART_PAGE;
>> >             u64 phys = (u64)*list++;
>> > -           while (cnt && page--) {
>> > -                   nv_wo32(pgt, pte, (phys >> 7) | 1);
>> > -                   phys += NV41_GART_PAGE;
>> > -                   pte += 4;
>> > -                   cnt -= 1;
>> > -           }
>> > +           nv_wo32(pgt, pte, (phys >> 7) | 1);
>> > +           pte += 4;
>> > +           cnt -= 1;
>> >     }
>> >  }
>> See above^, it's better if you could fixup nv50/nvc0.c instead.
>> >
>> > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
>> > index af20fba..694024d 100644
>> > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
>> > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
>> > @@ -226,7 +226,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
>> >     nvbo->page_shift = 12;
>> >     if (drm->client.base.vm) {
>> >             if (!(flags & TTM_PL_FLAG_TT) && size > 256 * 1024)
>> > -                   nvbo->page_shift = drm->client.base.vm->vmm->lpg_shift;
>> > +                   nvbo->page_shift = lpg_shift;
>> >     }
>> >
>> >     nouveau_bo_fixup_align(nvbo, flags, &align, &size);
>> Hm.. I don't know if it will be an issue here. But I'm concerned about the cases where nouveau_vm can end up unaligned.
>> This will not be an issue for the bar mappings, because they map the entire bo, and bo's are always page aligned.
>> See nouveau_bo_fixup_align.
>>
>> But the channel vm mappings are no longer guaranteed to be page aligned. In fact it's very likely they are all misaligned due to some vm allocations
>> done when mapping a channel. This is only a problem on nv50+ though. Probably not an issue you're hitting.
>> > diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
>> > index ca5492a..494cf88 100644
>> > --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
>> > +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
>> > @@ -31,7 +31,7 @@ nv04_sgdma_bind(struct ttm_tt *ttm, struct ttm_mem_reg *mem)
>> >  {
>> >     struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
>> >     struct nouveau_mem *node = mem->mm_node;
>> > -   u64 size = mem->num_pages << 12;
>> > +   u64 size = mem->num_pages << PAGE_SHIFT;
>> >
>> >     if (ttm->sg) {
>> >             node->sg = ttm->sg;
>> > diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
>> > index 19e3757..f0629de 100644
>> > --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
>> > +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
>> > @@ -252,8 +252,8 @@ nv04_gart_manager_new(struct ttm_mem_type_manager *man,
>> >
>> >     node->page_shift = 12;
>> >
>> > -   ret = nouveau_vm_get(man->priv, mem->num_pages << 12, node->page_shift,
>> > -                        NV_MEM_ACCESS_RW, &node->vma[0]);
>> > +   ret = nouveau_vm_get(man->priv, mem->num_pages << PAGE_SHIFT,
>> > +                        node->page_shift, NV_MEM_ACCESS_RW, &node->vma[0]);
>> >     if (ret) {
>> >             kfree(node);
>> >             return ret;
>> >
>> >
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fixing nouveau for >4k PAGE_SIZE
  2013-08-29  6:49                     ` Ben Skeggs
@ 2013-11-29  6:01                       ` Benjamin Herrenschmidt
  2013-12-11  3:19                         ` Ben Skeggs
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2013-11-29  6:01 UTC (permalink / raw)
  To: Ben Skeggs
  Cc: nouveau@lists.freedesktop.org, Ben Skeggs,
	dri-devel@lists.freedesktop.org, Stéphane Marchesin

On Thu, 2013-08-29 at 16:49 +1000, Ben Skeggs wrote:

> > Additionally the current code is broken in that the upper layer in
> > vm/base.c doesn't increment "pte" by the right amount.
> >
> > Now, if those two assertions can be made always true:
> >
> >  - Those two functions (map_sg and map_sg_table) never deal with the
> > "big" case.
> >
> >  - An object is always mapped at a card address that is a multiple
> > of PAGE_SIZE (ie, invividual PAGE_SIZE pages don't cross pde boundaries
> > when mapped)

> I think these two restrictions are reasonable to enforce, and we should do so.
> 
> >
> > Then we can probably simplify the code significantly *and* go back to
> > handling PAGE_SIZE pages in the vmm->map_sg() instead of breaking them
> > up a level above like I do.

> Sounds good!

Ok so I experimented with that approach a bit. The code could probably
use further simplifications and cleanups but it seems to work. Note that
I haven't been able to test much more than the fbdev and the DDX without
3d accel since GLX is currently broken on nouveau big endian for other
reasons (no visuals) since the Gallium BE rework by Ajax (plus the
nv30/40 merge also introduced artifacts and instabilities on BE which I
haven't tracked down either).

The basic idea here is that the backend vmm->map_sg operates on system
PAGE_SIZE, which allows to continue passing a page array down as we do
today.

That means however that only the nv04 and nv41 backends handle that case
right now, the other ones will have to be fixed eventually but the fix
is rather easy.

Now it's possible that I've missed cases where card objects might be
allocated in vram that isn't system PAGE_SIZE aligned, in which case we
will have breakage due to having a single system PAGE crossing a PDE
boundary, you'll have to help me here figure that out though I haven't
hit any of my WARN_ON's so far :-)

Patch isn't S-O-B yet, first let me know what you think of the approach
(and maybe check I didn't break x86 :-)

diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
index ef3133e..44dc050 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
@@ -82,55 +82,77 @@ void
 nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
 			struct nouveau_mem *mem)
 {
+	/*
+	 * XXX Should the "12" in a couple of places here be replaced
+	 * with vmm->spg_shift for correctness ? Might help if we ever
+	 * support 64k card pages on 64k PAGE_SIZE systems
+	 */
 	struct nouveau_vm *vm = vma->vm;
 	struct nouveau_vmmgr *vmm = vm->vmm;
-	int big = vma->node->type != vmm->spg_shift;
 	u32 offset = vma->node->offset + (delta >> 12);
-	u32 bits = vma->node->type - 12;
-	u32 num  = length >> vma->node->type;
+	u32 shift = vma->node->type;
+	u32 order = PAGE_SHIFT - shift;
+	u32 num  = length >> PAGE_SHIFT;
 	u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
-	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
-	u32 max  = 1 << (vmm->pgt_bits - bits);
-	unsigned m, sglen;
-	u32 end, len;
+	u32 pte  = offset & ((1 << vmm->pgt_bits) - 1);
+	u32 max  = 1 << vmm->pgt_bits;
+	u32 end, len, cardlen;
 	int i;
 	struct scatterlist *sg;
 
-	for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
-		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
-		sglen = sg_dma_len(sg) >> PAGE_SHIFT;
+	/* We don't handle "big" pages here */
+	if (WARN_ON(shift != vmm->spg_shift || shift > PAGE_SHIFT))
+		return;
 
-		end = pte + sglen;
-		if (unlikely(end >= max))
-			end = max;
-		len = end - pte;
+	/* We dont' handle objects that aren't PAGE_SIZE aligned either */
+	if (WARN_ON((offset << 12) & ~PAGE_MASK))
+		return;
 
-		for (m = 0; m < len; m++) {
-			dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
+	/* Iterate sglist elements */
+	for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
+		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[0];
+		unsigned long m, sglen;
+		dma_addr_t addr;
 
-			vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
-			num--;
-			pte++;
+		/* Number of system pages and base address */
+		sglen = sg_dma_len(sg) >> PAGE_SHIFT;
+		addr = sg_dma_address(sg);
+
+		/* Iterate over system pages in the segment and
+		 * covered PDEs
+		 */
+		while(sglen) {
+			/* Number of card PTEs */
+			cardlen = sglen << order;
+			end = pte + cardlen;
+			if (unlikely(end > max))
+				end = max;
+			cardlen = end - pte;
 
-			if (num == 0)
-				goto finish;
-		}
-		if (unlikely(end >= max)) {
-			pde++;
-			pte = 0;
-		}
-		if (m < sglen) {
-			for (; m < sglen; m++) {
-				dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
+			/* Convert back to system pages after cropping */
+			len = cardlen >> order;
+
+			/* Ensure this fits system pages */
+			if (WARN_ON((len << order) != cardlen))
+				break;
 
+			/* For each system page in the segment */
+			for (m = 0; m < len; m++) {
 				vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
+				addr += PAGE_SIZE;
 				num--;
-				pte++;
+				pte += (1u << order);
 				if (num == 0)
 					goto finish;
 			}
-		}
+			sglen -= len;
 
+			/* Wrap to next PDE ? */
+			if (unlikely(end >= max)) {
+				pde++;
+				pte = 0;
+			}
+		}
 	}
 finish:
 	vmm->flush(vm);
@@ -143,28 +165,44 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
 	struct nouveau_vm *vm = vma->vm;
 	struct nouveau_vmmgr *vmm = vm->vmm;
 	dma_addr_t *list = mem->pages;
-	int big = vma->node->type != vmm->spg_shift;
 	u32 offset = vma->node->offset + (delta >> 12);
-	u32 bits = vma->node->type - 12;
-	u32 num  = length >> vma->node->type;
+	u32 shift = vma->node->type;
+	u32 order = PAGE_SHIFT - shift;
+	u32 num  = length >> PAGE_SHIFT;
 	u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
-	u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
-	u32 max  = 1 << (vmm->pgt_bits - bits);
-	u32 end, len;
+	u32 pte  = offset & ((1 << vmm->pgt_bits) - 1);
+	u32 max  = 1 << vmm->pgt_bits;
+	u32 end, len, cardlen;
+
+	/* We don't handle "big" pages here */
+	if (WARN_ON(shift != vmm->spg_shift || shift > PAGE_SHIFT))
+		return;
+
+	/* We dont' handle objects that aren't PAGE_SIZE aligned either */
+	if (WARN_ON((offset << 12) & ~PAGE_MASK))
+		return;
 
 	while (num) {
-		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
+		struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[0];
 
-		end = (pte + num);
-		if (unlikely(end >= max))
+		/* "num" is remaining system pages, check how many fit
+		 * in the PDE
+		 */
+		end = (pte + (num << order));
+		if (unlikely(end > max))
 			end = max;
-		len = end - pte;
+		cardlen = end - pte;
+		len = cardlen >> order;
+
+		/* Ensure this fits system pages */
+		if (WARN_ON((len << order) != cardlen))
+			break;
 
 		vmm->map_sg(vma, pgt, mem, pte, len, list);
 
 		num  -= len;
-		pte  += len;
 		list += len;
+		pte  += cardlen;
 		if (unlikely(end >= max)) {
 			pde++;
 			pte = 0;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
index ed45437..c1e7c11 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
@@ -38,14 +38,13 @@ nv04_vm_map_sg(struct nouveau_vma *vma, struct nouveau_gpuobj *pgt,
 	       struct nouveau_mem *mem, u32 pte, u32 cnt, dma_addr_t *list)
 {
 	pte = 0x00008 + (pte * 4);
-	while (cnt) {
+	while (cnt--) {
 		u32 page = PAGE_SIZE / NV04_PDMA_PAGE;
 		u32 phys = (u32)*list++;
-		while (cnt && page--) {
+		while (page--) {
 			nv_wo32(pgt, pte, phys | 3);
 			phys += NV04_PDMA_PAGE;
 			pte += 4;
-			cnt -= 1;
 		}
 	}
 }
diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
index 064c762..09570d7 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
@@ -42,14 +42,13 @@ nv41_vm_map_sg(struct nouveau_vma *vma, struct nouveau_gpuobj *pgt,
 	       struct nouveau_mem *mem, u32 pte, u32 cnt, dma_addr_t *list)
 {
 	pte = pte * 4;
-	while (cnt) {
+	while (cnt--) {
 		u32 page = PAGE_SIZE / NV41_GART_PAGE;
 		u64 phys = (u64)*list++;
-		while (cnt && page--) {
+		while (page--) {
 			nv_wo32(pgt, pte, (phys >> 7) | 1);
 			phys += NV41_GART_PAGE;
 			pte += 4;
-			cnt -= 1;
 		}
 	}
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index c0fde6b..16dce89 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -178,7 +178,7 @@ nouveau_bo_fixup_align(struct nouveau_bo *nvbo, u32 flags,
 		*size = roundup(*size, (1 << nvbo->page_shift));
 		*align = max((1 <<  nvbo->page_shift), *align);
 	}
-
+	*align = roundup(*align, PAGE_SIZE);
 	*size = roundup(*size, PAGE_SIZE);
 }
 
@@ -221,7 +221,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
 	nvbo->page_shift = 12;
 	if (drm->client.base.vm) {
 		if (!(flags & TTM_PL_FLAG_TT) && size > 256 * 1024)
-			nvbo->page_shift = drm->client.base.vm->vmm->lpg_shift;
+			nvbo->page_shift = lpg_shift;
 	}
 
 	nouveau_bo_fixup_align(nvbo, flags, &align, &size);
diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
index 0843ebc..f255ff8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
@@ -31,7 +31,7 @@ nv04_sgdma_bind(struct ttm_tt *ttm, struct ttm_mem_reg *mem)
 {
 	struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
 	struct nouveau_mem *node = mem->mm_node;
-	u64 size = mem->num_pages << 12;
+	u64 size = mem->num_pages << PAGE_SHIFT;
 
 	if (ttm->sg) {
 		node->sg = ttm->sg;
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 19e3757..b7fc456 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -252,8 +252,9 @@ nv04_gart_manager_new(struct ttm_mem_type_manager *man,
 
 	node->page_shift = 12;
 
-	ret = nouveau_vm_get(man->priv, mem->num_pages << 12, node->page_shift,
-			     NV_MEM_ACCESS_RW, &node->vma[0]);
+	ret = nouveau_vm_get(man->priv, mem->num_pages << PAGE_SHIFT,
+			     node->page_shift, NV_MEM_ACCESS_RW,
+			     &node->vma[0]);
 	if (ret) {
 		kfree(node);
 		return ret;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Fixing nouveau for >4k PAGE_SIZE
  2013-11-29  6:01                       ` Benjamin Herrenschmidt
@ 2013-12-11  3:19                         ` Ben Skeggs
       [not found]                           ` <CACAvsv7L4qApEDGBLe7gCeivg0jCHE4sg4YUcX-z4vQya5fGyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Skeggs @ 2013-12-11  3:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: nouveau@lists.freedesktop.org, Ben Skeggs,
	dri-devel@lists.freedesktop.org, Stéphane Marchesin

On Fri, Nov 29, 2013 at 4:01 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Thu, 2013-08-29 at 16:49 +1000, Ben Skeggs wrote:
>
>> > Additionally the current code is broken in that the upper layer in
>> > vm/base.c doesn't increment "pte" by the right amount.
>> >
>> > Now, if those two assertions can be made always true:
>> >
>> >  - Those two functions (map_sg and map_sg_table) never deal with the
>> > "big" case.
>> >
>> >  - An object is always mapped at a card address that is a multiple
>> > of PAGE_SIZE (ie, invividual PAGE_SIZE pages don't cross pde boundaries
>> > when mapped)
>
>> I think these two restrictions are reasonable to enforce, and we should do so.
>>
>> >
>> > Then we can probably simplify the code significantly *and* go back to
>> > handling PAGE_SIZE pages in the vmm->map_sg() instead of breaking them
>> > up a level above like I do.
>
>> Sounds good!
>
> Ok so I experimented with that approach a bit. The code could probably
> use further simplifications and cleanups but it seems to work. Note that
> I haven't been able to test much more than the fbdev and the DDX without
> 3d accel since GLX is currently broken on nouveau big endian for other
> reasons (no visuals) since the Gallium BE rework by Ajax (plus the
> nv30/40 merge also introduced artifacts and instabilities on BE which I
> haven't tracked down either).
>
> The basic idea here is that the backend vmm->map_sg operates on system
> PAGE_SIZE, which allows to continue passing a page array down as we do
> today.
>
> That means however that only the nv04 and nv41 backends handle that case
> right now, the other ones will have to be fixed eventually but the fix
> is rather easy.
>
> Now it's possible that I've missed cases where card objects might be
> allocated in vram that isn't system PAGE_SIZE aligned, in which case we
> will have breakage due to having a single system PAGE crossing a PDE
> boundary, you'll have to help me here figure that out though I haven't
> hit any of my WARN_ON's so far :-)
>
> Patch isn't S-O-B yet, first let me know what you think of the approach
> (and maybe check I didn't break x86 :-)
Hey Ben,

I definitely think the approach is the one that makes the most sense,
for sure.  The patch so far looks fine also, minor comments inline,
but nothing exciting to add really.

>
> diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> index ef3133e..44dc050 100644
> --- a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> @@ -82,55 +82,77 @@ void
>  nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
>                         struct nouveau_mem *mem)
>  {
> +       /*
> +        * XXX Should the "12" in a couple of places here be replaced
> +        * with vmm->spg_shift for correctness ? Might help if we ever
> +        * support 64k card pages on 64k PAGE_SIZE systems
> +        */
The only "12"s that were left (yes, i missed some too obviously) were
intended to just be used to represent the smallest allocation unit for
the address space allocator, and not necessarily related to the
small/large page shift or the host page size.  However, it's probably
completely useless to have an allocation unit smaller than the small
page size anyway, so, go ahead.

I didn't review the map_sg_table hunks explicitly, assuming just
changes in similar spirit to map_sg.

>         struct nouveau_vm *vm = vma->vm;
>         struct nouveau_vmmgr *vmm = vm->vmm;
> -       int big = vma->node->type != vmm->spg_shift;
>         u32 offset = vma->node->offset + (delta >> 12);
> -       u32 bits = vma->node->type - 12;
> -       u32 num  = length >> vma->node->type;
> +       u32 shift = vma->node->type;
> +       u32 order = PAGE_SHIFT - shift;
> +       u32 num  = length >> PAGE_SHIFT;
>         u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
> -       u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
> -       u32 max  = 1 << (vmm->pgt_bits - bits);
> -       unsigned m, sglen;
> -       u32 end, len;
> +       u32 pte  = offset & ((1 << vmm->pgt_bits) - 1);
> +       u32 max  = 1 << vmm->pgt_bits;
> +       u32 end, len, cardlen;
>         int i;
>         struct scatterlist *sg;
>
> -       for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
> -               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
> -               sglen = sg_dma_len(sg) >> PAGE_SHIFT;
> +       /* We don't handle "big" pages here */
> +       if (WARN_ON(shift != vmm->spg_shift || shift > PAGE_SHIFT))
> +               return;
>
> -               end = pte + sglen;
> -               if (unlikely(end >= max))
> -                       end = max;
> -               len = end - pte;
> +       /* We dont' handle objects that aren't PAGE_SIZE aligned either */
> +       if (WARN_ON((offset << 12) & ~PAGE_MASK))
> +               return;
>
> -               for (m = 0; m < len; m++) {
> -                       dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
> +       /* Iterate sglist elements */
> +       for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
> +               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[0];
> +               unsigned long m, sglen;
> +               dma_addr_t addr;
>
> -                       vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
> -                       num--;
> -                       pte++;
> +               /* Number of system pages and base address */
> +               sglen = sg_dma_len(sg) >> PAGE_SHIFT;
> +               addr = sg_dma_address(sg);
> +
> +               /* Iterate over system pages in the segment and
> +                * covered PDEs
> +                */
> +               while(sglen) {
> +                       /* Number of card PTEs */
> +                       cardlen = sglen << order;
> +                       end = pte + cardlen;
> +                       if (unlikely(end > max))
> +                               end = max;
> +                       cardlen = end - pte;
>
> -                       if (num == 0)
> -                               goto finish;
> -               }
> -               if (unlikely(end >= max)) {
> -                       pde++;
> -                       pte = 0;
> -               }
> -               if (m < sglen) {
> -                       for (; m < sglen; m++) {
> -                               dma_addr_t addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
> +                       /* Convert back to system pages after cropping */
> +                       len = cardlen >> order;
> +
> +                       /* Ensure this fits system pages */
> +                       if (WARN_ON((len << order) != cardlen))
> +                               break;
>
> +                       /* For each system page in the segment */
> +                       for (m = 0; m < len; m++) {
>                                 vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
> +                               addr += PAGE_SIZE;
>                                 num--;
> -                               pte++;
> +                               pte += (1u << order);
>                                 if (num == 0)
>                                         goto finish;
>                         }
> -               }
> +                       sglen -= len;
>
> +                       /* Wrap to next PDE ? */
> +                       if (unlikely(end >= max)) {
> +                               pde++;
> +                               pte = 0;
> +                       }
> +               }
>         }
>  finish:
>         vmm->flush(vm);
> @@ -143,28 +165,44 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64 delta, u64 length,
>         struct nouveau_vm *vm = vma->vm;
>         struct nouveau_vmmgr *vmm = vm->vmm;
>         dma_addr_t *list = mem->pages;
> -       int big = vma->node->type != vmm->spg_shift;
>         u32 offset = vma->node->offset + (delta >> 12);
> -       u32 bits = vma->node->type - 12;
> -       u32 num  = length >> vma->node->type;
> +       u32 shift = vma->node->type;
> +       u32 order = PAGE_SHIFT - shift;
> +       u32 num  = length >> PAGE_SHIFT;
>         u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
> -       u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
> -       u32 max  = 1 << (vmm->pgt_bits - bits);
> -       u32 end, len;
> +       u32 pte  = offset & ((1 << vmm->pgt_bits) - 1);
> +       u32 max  = 1 << vmm->pgt_bits;
> +       u32 end, len, cardlen;
> +
> +       /* We don't handle "big" pages here */
> +       if (WARN_ON(shift != vmm->spg_shift || shift > PAGE_SHIFT))
> +               return;
> +
> +       /* We dont' handle objects that aren't PAGE_SIZE aligned either */
> +       if (WARN_ON((offset << 12) & ~PAGE_MASK))
> +               return;
>
>         while (num) {
> -               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
> +               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[0];
>
> -               end = (pte + num);
> -               if (unlikely(end >= max))
> +               /* "num" is remaining system pages, check how many fit
> +                * in the PDE
> +                */
> +               end = (pte + (num << order));
> +               if (unlikely(end > max))
>                         end = max;
> -               len = end - pte;
> +               cardlen = end - pte;
> +               len = cardlen >> order;
> +
> +               /* Ensure this fits system pages */
> +               if (WARN_ON((len << order) != cardlen))
> +                       break;
>
>                 vmm->map_sg(vma, pgt, mem, pte, len, list);
>
>                 num  -= len;
> -               pte  += len;
>                 list += len;
> +               pte  += cardlen;
I think this all looks OK too.

>                 if (unlikely(end >= max)) {
>                         pde++;
>                         pte = 0;
> diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> index ed45437..c1e7c11 100644
> --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> @@ -38,14 +38,13 @@ nv04_vm_map_sg(struct nouveau_vma *vma, struct nouveau_gpuobj *pgt,
>                struct nouveau_mem *mem, u32 pte, u32 cnt, dma_addr_t *list)
>  {
>         pte = 0x00008 + (pte * 4);
> -       while (cnt) {
> +       while (cnt--) {
>                 u32 page = PAGE_SIZE / NV04_PDMA_PAGE;
>                 u32 phys = (u32)*list++;
> -               while (cnt && page--) {
> +               while (page--) {
>                         nv_wo32(pgt, pte, phys | 3);
>                         phys += NV04_PDMA_PAGE;
>                         pte += 4;
> -                       cnt -= 1;
>                 }
>         }
>  }
Ack

> diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> index 064c762..09570d7 100644
> --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> @@ -42,14 +42,13 @@ nv41_vm_map_sg(struct nouveau_vma *vma, struct nouveau_gpuobj *pgt,
>                struct nouveau_mem *mem, u32 pte, u32 cnt, dma_addr_t *list)
>  {
>         pte = pte * 4;
> -       while (cnt) {
> +       while (cnt--) {
>                 u32 page = PAGE_SIZE / NV41_GART_PAGE;
>                 u64 phys = (u64)*list++;
> -               while (cnt && page--) {
> +               while (page--) {
>                         nv_wo32(pgt, pte, (phys >> 7) | 1);
>                         phys += NV41_GART_PAGE;
>                         pte += 4;
> -                       cnt -= 1;
>                 }
>         }
>  }
Ack

> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index c0fde6b..16dce89 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -178,7 +178,7 @@ nouveau_bo_fixup_align(struct nouveau_bo *nvbo, u32 flags,
>                 *size = roundup(*size, (1 << nvbo->page_shift));
>                 *align = max((1 <<  nvbo->page_shift), *align);
>         }
> -
> +       *align = roundup(*align, PAGE_SIZE);
>         *size = roundup(*size, PAGE_SIZE);
>  }
>
> @@ -221,7 +221,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
>         nvbo->page_shift = 12;
>         if (drm->client.base.vm) {
>                 if (!(flags & TTM_PL_FLAG_TT) && size > 256 * 1024)
> -                       nvbo->page_shift = drm->client.base.vm->vmm->lpg_shift;
> +                       nvbo->page_shift = lpg_shift;
>         }
Ack both hunks.

>
>         nouveau_bo_fixup_align(nvbo, flags, &align, &size);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> index 0843ebc..f255ff8 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> @@ -31,7 +31,7 @@ nv04_sgdma_bind(struct ttm_tt *ttm, struct ttm_mem_reg *mem)
>  {
>         struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
>         struct nouveau_mem *node = mem->mm_node;
> -       u64 size = mem->num_pages << 12;
> +       u64 size = mem->num_pages << PAGE_SHIFT;
Ack.  However, a patch doing this already exists in the Nouveau kernel
tree (http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=19ab15062b8fde70ff04784bd49abc330e92310d).


>
>         if (ttm->sg) {
>                 node->sg = ttm->sg;
> diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> index 19e3757..b7fc456 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> @@ -252,8 +252,9 @@ nv04_gart_manager_new(struct ttm_mem_type_manager *man,
>
>         node->page_shift = 12;
>
> -       ret = nouveau_vm_get(man->priv, mem->num_pages << 12, node->page_shift,
> -                            NV_MEM_ACCESS_RW, &node->vma[0]);
> +       ret = nouveau_vm_get(man->priv, mem->num_pages << PAGE_SHIFT,
> +                            node->page_shift, NV_MEM_ACCESS_RW,
> +                            &node->vma[0]);
Ack.

>         if (ret) {
>                 kfree(node);
>                 return ret;
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Fixing nouveau for >4k PAGE_SIZE
       [not found]                           ` <CACAvsv7L4qApEDGBLe7gCeivg0jCHE4sg4YUcX-z4vQya5fGyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-12-11 14:44                             ` Patrick Baggett
  0 siblings, 0 replies; 11+ messages in thread
From: Patrick Baggett @ 2013-12-11 14:44 UTC (permalink / raw)
  To: Ben Skeggs
  Cc: Benjamin Herrenschmidt, Stéphane Marchesin, Ben Skeggs,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org


[-- Attachment #1.1: Type: text/plain, Size: 15680 bytes --]

I'm looking forward to something that fixes page size bugs. I still have
the SPARC box that I reported but #58984 [1] on and I'm ready to test
changes. Is there an easy way to apply these patches to nouveau master and
try it out? SPARC has (default) 8KB page size, though, not 64KB.



[1] https://bugs.freedesktop.org/show_bug.cgi?id=58984


On Tue, Dec 10, 2013 at 9:19 PM, Ben Skeggs <skeggsb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> On Fri, Nov 29, 2013 at 4:01 PM, Benjamin Herrenschmidt
> <benh-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org> wrote:
> > On Thu, 2013-08-29 at 16:49 +1000, Ben Skeggs wrote:
> >
> >> > Additionally the current code is broken in that the upper layer in
> >> > vm/base.c doesn't increment "pte" by the right amount.
> >> >
> >> > Now, if those two assertions can be made always true:
> >> >
> >> >  - Those two functions (map_sg and map_sg_table) never deal with the
> >> > "big" case.
> >> >
> >> >  - An object is always mapped at a card address that is a multiple
> >> > of PAGE_SIZE (ie, invividual PAGE_SIZE pages don't cross pde
> boundaries
> >> > when mapped)
> >
> >> I think these two restrictions are reasonable to enforce, and we should
> do so.
> >>
> >> >
> >> > Then we can probably simplify the code significantly *and* go back to
> >> > handling PAGE_SIZE pages in the vmm->map_sg() instead of breaking them
> >> > up a level above like I do.
> >
> >> Sounds good!
> >
> > Ok so I experimented with that approach a bit. The code could probably
> > use further simplifications and cleanups but it seems to work. Note that
> > I haven't been able to test much more than the fbdev and the DDX without
> > 3d accel since GLX is currently broken on nouveau big endian for other
> > reasons (no visuals) since the Gallium BE rework by Ajax (plus the
> > nv30/40 merge also introduced artifacts and instabilities on BE which I
> > haven't tracked down either).
> >
> > The basic idea here is that the backend vmm->map_sg operates on system
> > PAGE_SIZE, which allows to continue passing a page array down as we do
> > today.
> >
> > That means however that only the nv04 and nv41 backends handle that case
> > right now, the other ones will have to be fixed eventually but the fix
> > is rather easy.
> >
> > Now it's possible that I've missed cases where card objects might be
> > allocated in vram that isn't system PAGE_SIZE aligned, in which case we
> > will have breakage due to having a single system PAGE crossing a PDE
> > boundary, you'll have to help me here figure that out though I haven't
> > hit any of my WARN_ON's so far :-)
> >
> > Patch isn't S-O-B yet, first let me know what you think of the approach
> > (and maybe check I didn't break x86 :-)
> Hey Ben,
>
> I definitely think the approach is the one that makes the most sense,
> for sure.  The patch so far looks fine also, minor comments inline,
> but nothing exciting to add really.
>
> >
> > diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> > index ef3133e..44dc050 100644
> > --- a/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> > +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/base.c
> > @@ -82,55 +82,77 @@ void
> >  nouveau_vm_map_sg_table(struct nouveau_vma *vma, u64 delta, u64 length,
> >                         struct nouveau_mem *mem)
> >  {
> > +       /*
> > +        * XXX Should the "12" in a couple of places here be replaced
> > +        * with vmm->spg_shift for correctness ? Might help if we ever
> > +        * support 64k card pages on 64k PAGE_SIZE systems
> > +        */
> The only "12"s that were left (yes, i missed some too obviously) were
> intended to just be used to represent the smallest allocation unit for
> the address space allocator, and not necessarily related to the
> small/large page shift or the host page size.  However, it's probably
> completely useless to have an allocation unit smaller than the small
> page size anyway, so, go ahead.
>
> I didn't review the map_sg_table hunks explicitly, assuming just
> changes in similar spirit to map_sg.
>
> >         struct nouveau_vm *vm = vma->vm;
> >         struct nouveau_vmmgr *vmm = vm->vmm;
> > -       int big = vma->node->type != vmm->spg_shift;
> >         u32 offset = vma->node->offset + (delta >> 12);
> > -       u32 bits = vma->node->type - 12;
> > -       u32 num  = length >> vma->node->type;
> > +       u32 shift = vma->node->type;
> > +       u32 order = PAGE_SHIFT - shift;
> > +       u32 num  = length >> PAGE_SHIFT;
> >         u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
> > -       u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
> > -       u32 max  = 1 << (vmm->pgt_bits - bits);
> > -       unsigned m, sglen;
> > -       u32 end, len;
> > +       u32 pte  = offset & ((1 << vmm->pgt_bits) - 1);
> > +       u32 max  = 1 << vmm->pgt_bits;
> > +       u32 end, len, cardlen;
> >         int i;
> >         struct scatterlist *sg;
> >
> > -       for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
> > -               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
> > -               sglen = sg_dma_len(sg) >> PAGE_SHIFT;
> > +       /* We don't handle "big" pages here */
> > +       if (WARN_ON(shift != vmm->spg_shift || shift > PAGE_SHIFT))
> > +               return;
> >
> > -               end = pte + sglen;
> > -               if (unlikely(end >= max))
> > -                       end = max;
> > -               len = end - pte;
> > +       /* We dont' handle objects that aren't PAGE_SIZE aligned either
> */
> > +       if (WARN_ON((offset << 12) & ~PAGE_MASK))
> > +               return;
> >
> > -               for (m = 0; m < len; m++) {
> > -                       dma_addr_t addr = sg_dma_address(sg) + (m <<
> PAGE_SHIFT);
> > +       /* Iterate sglist elements */
> > +       for_each_sg(mem->sg->sgl, sg, mem->sg->nents, i) {
> > +               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[0];
> > +               unsigned long m, sglen;
> > +               dma_addr_t addr;
> >
> > -                       vmm->map_sg(vma, pgt, mem, pte, 1, &addr);
> > -                       num--;
> > -                       pte++;
> > +               /* Number of system pages and base address */
> > +               sglen = sg_dma_len(sg) >> PAGE_SHIFT;
> > +               addr = sg_dma_address(sg);
> > +
> > +               /* Iterate over system pages in the segment and
> > +                * covered PDEs
> > +                */
> > +               while(sglen) {
> > +                       /* Number of card PTEs */
> > +                       cardlen = sglen << order;
> > +                       end = pte + cardlen;
> > +                       if (unlikely(end > max))
> > +                               end = max;
> > +                       cardlen = end - pte;
> >
> > -                       if (num == 0)
> > -                               goto finish;
> > -               }
> > -               if (unlikely(end >= max)) {
> > -                       pde++;
> > -                       pte = 0;
> > -               }
> > -               if (m < sglen) {
> > -                       for (; m < sglen; m++) {
> > -                               dma_addr_t addr = sg_dma_address(sg) +
> (m << PAGE_SHIFT);
> > +                       /* Convert back to system pages after cropping */
> > +                       len = cardlen >> order;
> > +
> > +                       /* Ensure this fits system pages */
> > +                       if (WARN_ON((len << order) != cardlen))
> > +                               break;
> >
> > +                       /* For each system page in the segment */
> > +                       for (m = 0; m < len; m++) {
> >                                 vmm->map_sg(vma, pgt, mem, pte, 1,
> &addr);
> > +                               addr += PAGE_SIZE;
> >                                 num--;
> > -                               pte++;
> > +                               pte += (1u << order);
> >                                 if (num == 0)
> >                                         goto finish;
> >                         }
> > -               }
> > +                       sglen -= len;
> >
> > +                       /* Wrap to next PDE ? */
> > +                       if (unlikely(end >= max)) {
> > +                               pde++;
> > +                               pte = 0;
> > +                       }
> > +               }
> >         }
> >  finish:
> >         vmm->flush(vm);
> > @@ -143,28 +165,44 @@ nouveau_vm_map_sg(struct nouveau_vma *vma, u64
> delta, u64 length,
> >         struct nouveau_vm *vm = vma->vm;
> >         struct nouveau_vmmgr *vmm = vm->vmm;
> >         dma_addr_t *list = mem->pages;
> > -       int big = vma->node->type != vmm->spg_shift;
> >         u32 offset = vma->node->offset + (delta >> 12);
> > -       u32 bits = vma->node->type - 12;
> > -       u32 num  = length >> vma->node->type;
> > +       u32 shift = vma->node->type;
> > +       u32 order = PAGE_SHIFT - shift;
> > +       u32 num  = length >> PAGE_SHIFT;
> >         u32 pde  = (offset >> vmm->pgt_bits) - vm->fpde;
> > -       u32 pte  = (offset & ((1 << vmm->pgt_bits) - 1)) >> bits;
> > -       u32 max  = 1 << (vmm->pgt_bits - bits);
> > -       u32 end, len;
> > +       u32 pte  = offset & ((1 << vmm->pgt_bits) - 1);
> > +       u32 max  = 1 << vmm->pgt_bits;
> > +       u32 end, len, cardlen;
> > +
> > +       /* We don't handle "big" pages here */
> > +       if (WARN_ON(shift != vmm->spg_shift || shift > PAGE_SHIFT))
> > +               return;
> > +
> > +       /* We dont' handle objects that aren't PAGE_SIZE aligned either
> */
> > +       if (WARN_ON((offset << 12) & ~PAGE_MASK))
> > +               return;
> >
> >         while (num) {
> > -               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[big];
> > +               struct nouveau_gpuobj *pgt = vm->pgt[pde].obj[0];
> >
> > -               end = (pte + num);
> > -               if (unlikely(end >= max))
> > +               /* "num" is remaining system pages, check how many fit
> > +                * in the PDE
> > +                */
> > +               end = (pte + (num << order));
> > +               if (unlikely(end > max))
> >                         end = max;
> > -               len = end - pte;
> > +               cardlen = end - pte;
> > +               len = cardlen >> order;
> > +
> > +               /* Ensure this fits system pages */
> > +               if (WARN_ON((len << order) != cardlen))
> > +                       break;
> >
> >                 vmm->map_sg(vma, pgt, mem, pte, len, list);
> >
> >                 num  -= len;
> > -               pte  += len;
> >                 list += len;
> > +               pte  += cardlen;
> I think this all looks OK too.
>
> >                 if (unlikely(end >= max)) {
> >                         pde++;
> >                         pte = 0;
> > diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> > index ed45437..c1e7c11 100644
> > --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> > +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv04.c
> > @@ -38,14 +38,13 @@ nv04_vm_map_sg(struct nouveau_vma *vma, struct
> nouveau_gpuobj *pgt,
> >                struct nouveau_mem *mem, u32 pte, u32 cnt, dma_addr_t
> *list)
> >  {
> >         pte = 0x00008 + (pte * 4);
> > -       while (cnt) {
> > +       while (cnt--) {
> >                 u32 page = PAGE_SIZE / NV04_PDMA_PAGE;
> >                 u32 phys = (u32)*list++;
> > -               while (cnt && page--) {
> > +               while (page--) {
> >                         nv_wo32(pgt, pte, phys | 3);
> >                         phys += NV04_PDMA_PAGE;
> >                         pte += 4;
> > -                       cnt -= 1;
> >                 }
> >         }
> >  }
> Ack
>
> > diff --git a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> > index 064c762..09570d7 100644
> > --- a/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> > +++ b/drivers/gpu/drm/nouveau/core/subdev/vm/nv41.c
> > @@ -42,14 +42,13 @@ nv41_vm_map_sg(struct nouveau_vma *vma, struct
> nouveau_gpuobj *pgt,
> >                struct nouveau_mem *mem, u32 pte, u32 cnt, dma_addr_t
> *list)
> >  {
> >         pte = pte * 4;
> > -       while (cnt) {
> > +       while (cnt--) {
> >                 u32 page = PAGE_SIZE / NV41_GART_PAGE;
> >                 u64 phys = (u64)*list++;
> > -               while (cnt && page--) {
> > +               while (page--) {
> >                         nv_wo32(pgt, pte, (phys >> 7) | 1);
> >                         phys += NV41_GART_PAGE;
> >                         pte += 4;
> > -                       cnt -= 1;
> >                 }
> >         }
> >  }
> Ack
>
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c
> b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > index c0fde6b..16dce89 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > @@ -178,7 +178,7 @@ nouveau_bo_fixup_align(struct nouveau_bo *nvbo, u32
> flags,
> >                 *size = roundup(*size, (1 << nvbo->page_shift));
> >                 *align = max((1 <<  nvbo->page_shift), *align);
> >         }
> > -
> > +       *align = roundup(*align, PAGE_SIZE);
> >         *size = roundup(*size, PAGE_SIZE);
> >  }
> >
> > @@ -221,7 +221,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int
> align,
> >         nvbo->page_shift = 12;
> >         if (drm->client.base.vm) {
> >                 if (!(flags & TTM_PL_FLAG_TT) && size > 256 * 1024)
> > -                       nvbo->page_shift =
> drm->client.base.vm->vmm->lpg_shift;
> > +                       nvbo->page_shift = lpg_shift;
> >         }
> Ack both hunks.
>
> >
> >         nouveau_bo_fixup_align(nvbo, flags, &align, &size);
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> > index 0843ebc..f255ff8 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
> > @@ -31,7 +31,7 @@ nv04_sgdma_bind(struct ttm_tt *ttm, struct ttm_mem_reg
> *mem)
> >  {
> >         struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)ttm;
> >         struct nouveau_mem *node = mem->mm_node;
> > -       u64 size = mem->num_pages << 12;
> > +       u64 size = mem->num_pages << PAGE_SHIFT;
> Ack.  However, a patch doing this already exists in the Nouveau kernel
> tree (
> http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=19ab15062b8fde70ff04784bd49abc330e92310d
> ).
>
>
> >
> >         if (ttm->sg) {
> >                 node->sg = ttm->sg;
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c
> b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> > index 19e3757..b7fc456 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> > @@ -252,8 +252,9 @@ nv04_gart_manager_new(struct ttm_mem_type_manager
> *man,
> >
> >         node->page_shift = 12;
> >
> > -       ret = nouveau_vm_get(man->priv, mem->num_pages << 12,
> node->page_shift,
> > -                            NV_MEM_ACCESS_RW, &node->vma[0]);
> > +       ret = nouveau_vm_get(man->priv, mem->num_pages << PAGE_SHIFT,
> > +                            node->page_shift, NV_MEM_ACCESS_RW,
> > +                            &node->vma[0]);
> Ack.
>
> >         if (ret) {
> >                 kfree(node);
> >                 return ret;
> >
> >
> _______________________________________________
> Nouveau mailing list
> Nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
> http://lists.freedesktop.org/mailman/listinfo/nouveau
>

[-- Attachment #1.2: Type: text/html, Size: 19300 bytes --]

[-- Attachment #2: Type: text/plain, Size: 181 bytes --]

_______________________________________________
Nouveau mailing list
Nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-12-11 14:44 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1372740099.4820.24.camel@pasglop>
     [not found] ` <CAPM=9txvRW5MM849FVqOoSL28g=+hTDiLWaMtn5Kj-X3h1kJ8g@mail.gmail.com>
     [not found]   ` <1376175111.32100.53.camel@pasglop>
     [not found]     ` <1376179046.32100.60.camel@pasglop>
2013-08-11  0:41       ` Fixing nouveau for >4k PAGE_SIZE Benjamin Herrenschmidt
2013-08-11  5:36         ` Benjamin Herrenschmidt
2013-08-11  6:17           ` Maarten Lankhorst
2013-08-11  7:06             ` Benjamin Herrenschmidt
2013-08-11  8:04               ` Benjamin Herrenschmidt
2013-08-11  9:02                 ` Maarten Lankhorst
2013-08-11  9:35                   ` Benjamin Herrenschmidt
2013-08-29  6:49                     ` Ben Skeggs
2013-11-29  6:01                       ` Benjamin Herrenschmidt
2013-12-11  3:19                         ` Ben Skeggs
     [not found]                           ` <CACAvsv7L4qApEDGBLe7gCeivg0jCHE4sg4YUcX-z4vQya5fGyw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-12-11 14:44                             ` Patrick Baggett

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.