still nfs problems [Was: Linux 2.6.37-rc8]

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* still nfs problems [Was: Linux 2.6.37-rc8]
@ 2011-01-05 19:05 James Bottomley
  2011-01-05 19:18 ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: James Bottomley @ 2011-01-05 19:05 UTC (permalink / raw)
  To: linux-arm-kernel

[sorry for the unthreaded insertion.  We're seeing this on parisc too]

> On Wed, Jan 05, 2011 at 10:14:17AM -0500, Trond Myklebust wrote:
> > OK. So,the new behaviour in 2.6.37 is that we're writing to a series of
> > pages via the usual kmap_atomic()/kunmap_atomic() and kmap()/kunmap()
> > interfaces, but we can end up reading them via a virtual address range
> > that gets set up via vm_map_ram() (that range gets set up before the
> > write occurs).
> 
> kmap of lowmem pages will always reuses the existing kernel direct
> mapping, so there won't be a problem there.
> 
> > Do we perhaps need an invalidate_kernel_vmap_range() before we can read
> > the data on ARM in this kind of scenario?
> 
> Firstly, vm_map_ram() does no cache maintainence of any sort, nor does
> it take care of page colouring - so any architecture where cache aliasing
> can occur will see this problem.  It should not limited to ARM.
> 
> Secondly, no, invalidate_kernel_vmap_range() probably isn't sufficient.
> There's two problems here:
> 
> 	addr = kmap(lowmem_page);
> 	*addr = stuff;
> 	kunmap(lowmem_page);
> 
> Such lowmem pages are accessed through their kernel direct mapping.
> 
> 	ptr = vm_map_ram(lowmem_page);
> 	read = *ptr;
> 
> This creates a new mapping which can alias with the kernel direct mapping.
> Now, as this is a new mapping, there should be no cache lines associated
> with it.  (Looking at vm_unmap_ram(), it calls free_unmap_vmap_area_addr(),
> free_unmap_vmap_area(), which then calls flush_cache_vunmap() on the
> region.  vb_free() also calls flush_cache_vunmap() too.)
> 
> If the write after kmap() hits an already present cache line, the cache
> line will be updated, but it won't be written back to memory.  So, on
> a subsequent vm_map_ram(), with any kind of aliasing cache, there's
> no guarantee that you'll hit that cache line and read the data just
> written there.
> 
> The kernel direct mapping would need to be flushed.
> 
> I'm really getting to the point of hating the poliferation of RAM
> remapping interfaces - it's going to (and is) causing nothing but lots
> of pain on virtual cache architectures, needing more and more cache
> flushing interfaces to be created.
> 
> Is there any other solution to this?

I think the solution for the kernel direct mapping problem is to take
the expected flushes and invalidates into kmap/kunmap[_atomic].  I think
the original reason for not doing this was efficiency:  the user should
know what they did with the data (i.e. if they're only reading it, it
doesn't need to be flushed on unmap).  However, the difficulty of
getting all this right seems to outweigh the efficiency of only using
the necessary flushing.  At least on some architectures, we can look at
the TLB flags to see if the page was dirtied (and only flush if it was).

James

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 19:05 still nfs problems [Was: Linux 2.6.37-rc8] James Bottomley
@ 2011-01-05 19:18 ` Linus Torvalds
  2011-01-05 19:36   ` James Bottomley
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2011-01-05 19:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 5, 2011 at 11:05 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> I think the solution for the kernel direct mapping problem is to take
> the expected flushes and invalidates into kmap/kunmap[_atomic].

No, we really can't do that. Most of the time, the kmap() is the only
way we access the page anyway, so flushing things would just be
stupid. Why waste time and energy on doing something pointless?

In fact, kmap() here is a total non-issue. It's not the kmap() that
introduces any virtual aliases, and never has been. It's the
"vm_map_ram()" that is the problem. Unlike the kmap(), that really
_does_ introduce a virtual alias, and is a problem for any virtual
cache.

So don't blame kmap(). It's innocent and irrelevant - the bug could
happen entirely without it (think a 64-bit address space that doesn't
even _have_ kmap, but has software that mixes vm_map_ram() with
non-mapped accesses).

                           Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 19:18 ` Linus Torvalds
@ 2011-01-05 19:36   ` James Bottomley
  2011-01-05 19:49     ` Linus Torvalds
  2011-01-05 20:00     ` Russell King - ARM Linux
  0 siblings, 2 replies; 79+ messages in thread
From: James Bottomley @ 2011-01-05 19:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 11:18 -0800, Linus Torvalds wrote:
> On Wed, Jan 5, 2011 at 11:05 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> >
> > I think the solution for the kernel direct mapping problem is to take
> > the expected flushes and invalidates into kmap/kunmap[_atomic].
> 
> No, we really can't do that. Most of the time, the kmap() is the only
> way we access the page anyway, so flushing things would just be
> stupid. Why waste time and energy on doing something pointless?

It's hardly pointless.  The kmap sets up an inequivalent alias in the
cache.  When you write to the kmap region, you dirty the CPU caches for
that alias.  If you tear down the mapping without flushing, the CPU will
write out the cache lines at its leisure.  If you access the line via
the other mapping *before* the CPU does writeout, you see stale data.

When the kernel dirties a kmap region, it always has to flush somehow
before kunmap.  One of the problems here is that that flush isn't in the
NFS code.

> In fact, kmap() here is a total non-issue. It's not the kmap() that
> introduces any virtual aliases, and never has been. It's the
> "vm_map_ram()" that is the problem. Unlike the kmap(), that really
> _does_ introduce a virtual alias, and is a problem for any virtual
> cache.
> 
> So don't blame kmap(). It's innocent and irrelevant - the bug could
> happen entirely without it (think a 64-bit address space that doesn't
> even _have_ kmap, but has software that mixes vm_map_ram() with
> non-mapped accesses).

I didn't say it was kmap's entire problem ... I just said, can't we
simplify some of this by consolidating the flushing into the interfaces.

James

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 19:36   ` James Bottomley
@ 2011-01-05 19:49     ` Linus Torvalds
  2011-01-05 20:35       ` James Bottomley
  2011-01-05 20:00     ` Russell King - ARM Linux
  1 sibling, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2011-01-05 19:49 UTC (permalink / raw)
  To: linux-arm-kernel

2011/1/5 James Bottomley <James.Bottomley@hansenpartnership.com>:
>>
>> No, we really can't do that. Most of the time, the kmap() is the only
>> way we access the page anyway, so flushing things would just be
>> stupid. Why waste time and energy on doing something pointless?
>
> It's hardly pointless. ?The kmap sets up an inequivalent alias in the
> cache.

NO IT DOES NOT.

Stop arguing, when you are so wrong.

kmap() does not create any aliases. For low-memory, it just returns
the physical address. No alias. And for high memory, there is no
equivalent low memory address to alias _with_.

Now, when you actually mix multiple kmap's and you have a virtually
based cache, then the kmap's obviously need to flush that particular
page when switching between each other. But that has nothing to do
with the actual page being kmap'ed, it's entirely an internal issue
about the particular virtual memory area being re-used. And ARM (and
any other virtually based CPU) already does that in __kunmap_atomic().

But notice the case of the low-mem. And understand that you are WRONG
about the "inequivalent alias" thing.

So I repeat: this has absolutely *NOTHING* to do with kmap(). Stop blathering.

It's _purely_ an issue of vm_map_ram(). Nothing else.

                          Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 19:49     ` Linus Torvalds
@ 2011-01-05 20:35       ` James Bottomley
  0 siblings, 0 replies; 79+ messages in thread
From: James Bottomley @ 2011-01-05 20:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 11:49 -0800, Linus Torvalds wrote:
> 2011/1/5 James Bottomley <James.Bottomley@hansenpartnership.com>:
> >>
> >> No, we really can't do that. Most of the time, the kmap() is the only
> >> way we access the page anyway, so flushing things would just be
> >> stupid. Why waste time and energy on doing something pointless?
> >
> > It's hardly pointless.  The kmap sets up an inequivalent alias in the
> > cache.
> 
> NO IT DOES NOT.

Well, it does ... but not in this case because the page is freshly
allocated (which I missed before) so it has no use cache colour yet.

James

> Stop arguing, when you are so wrong.
> 
> kmap() does not create any aliases. For low-memory, it just returns
> the physical address. No alias. And for high memory, there is no
> equivalent low memory address to alias _with_.
> 
> Now, when you actually mix multiple kmap's and you have a virtually
> based cache, then the kmap's obviously need to flush that particular
> page when switching between each other. But that has nothing to do
> with the actual page being kmap'ed, it's entirely an internal issue
> about the particular virtual memory area being re-used. And ARM (and
> any other virtually based CPU) already does that in __kunmap_atomic().
> 
> But notice the case of the low-mem. And understand that you are WRONG
> about the "inequivalent alias" thing.
> 
> So I repeat: this has absolutely *NOTHING* to do with kmap(). Stop blathering.
> 
> It's _purely_ an issue of vm_map_ram(). Nothing else.
> 
>                           Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 19:36   ` James Bottomley
  2011-01-05 19:49     ` Linus Torvalds
@ 2011-01-05 20:00     ` Russell King - ARM Linux
  2011-01-05 20:33       ` James Bottomley
  1 sibling, 1 reply; 79+ messages in thread
From: Russell King - ARM Linux @ 2011-01-05 20:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 05, 2011 at 01:36:09PM -0600, James Bottomley wrote:
> On Wed, 2011-01-05 at 11:18 -0800, Linus Torvalds wrote:
> > On Wed, Jan 5, 2011 at 11:05 AM, James Bottomley
> > <James.Bottomley@hansenpartnership.com> wrote:
> > >
> > > I think the solution for the kernel direct mapping problem is to take
> > > the expected flushes and invalidates into kmap/kunmap[_atomic].
> > 
> > No, we really can't do that. Most of the time, the kmap() is the only
> > way we access the page anyway, so flushing things would just be
> > stupid. Why waste time and energy on doing something pointless?
> 
> It's hardly pointless.  The kmap sets up an inequivalent alias in the
> cache.

No it doesn't.  For pages which are inaccessible, it sets up a mapping
for those pages.  On aliasing cache architectures, when you tear down
such a mapping, you have to flush the cache before you do so (otherwise
you can end up with cache lines existing in the cache for inaccessible
mappings.)

For lowmem pages, kmap() (should always) bypass the 'setup mapping' stage
because all lowmem pages are already accessible.  So kunmap() doesn't
do anything - just like the !HIGHMEM implementation for these macros.

So, for highmem-enabled systems:

	low_addr = kmap_atomic(lowmem_page);
	high_addr = kmap_atomic(highmem_page);

results in low_addr in the kernel direct-mapped region, and high_addr
in the kmap_atomic region.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 20:00     ` Russell King - ARM Linux
@ 2011-01-05 20:33       ` James Bottomley
  2011-01-05 20:48         ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: James Bottomley @ 2011-01-05 20:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 20:00 +0000, Russell King - ARM Linux wrote:
> On Wed, Jan 05, 2011 at 01:36:09PM -0600, James Bottomley wrote:
> > On Wed, 2011-01-05 at 11:18 -0800, Linus Torvalds wrote:
> > > On Wed, Jan 5, 2011 at 11:05 AM, James Bottomley
> > > <James.Bottomley@hansenpartnership.com> wrote:
> > > >
> > > > I think the solution for the kernel direct mapping problem is to take
> > > > the expected flushes and invalidates into kmap/kunmap[_atomic].
> > > 
> > > No, we really can't do that. Most of the time, the kmap() is the only
> > > way we access the page anyway, so flushing things would just be
> > > stupid. Why waste time and energy on doing something pointless?
> > 
> > It's hardly pointless.  The kmap sets up an inequivalent alias in the
> > cache.
> 
> No it doesn't.  For pages which are inaccessible, it sets up a mapping
> for those pages.  On aliasing cache architectures, when you tear down
> such a mapping, you have to flush the cache before you do so (otherwise
> you can end up with cache lines existing in the cache for inaccessible
> mappings.)
> 
> For lowmem pages, kmap() (should always) bypass the 'setup mapping' stage
> because all lowmem pages are already accessible.  So kunmap() doesn't
> do anything - just like the !HIGHMEM implementation for these macros.

well, that depends.  For us on parisc, kmap of a user page in !HIGHMEM
sets up an inequivalent aliase still ... because the cache colour of the
user and kernel virtual addresses are different.  Depending on the
return path to userspace, we still usually have to flush to get the user
to see the changes the kernel has made.

James

> So, for highmem-enabled systems:
> 
> 	low_addr = kmap_atomic(lowmem_page);
> 	high_addr = kmap_atomic(highmem_page);
> 
> results in low_addr in the kernel direct-mapped region, and high_addr
> in the kmap_atomic region.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 20:33       ` James Bottomley
@ 2011-01-05 20:48         ` Linus Torvalds
  2011-01-05 21:04           ` Russell King - ARM Linux
  2011-01-05 21:16           ` James Bottomley
  0 siblings, 2 replies; 79+ messages in thread
From: Linus Torvalds @ 2011-01-05 20:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 5, 2011 at 12:33 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> well, that depends. ?For us on parisc, kmap of a user page in !HIGHMEM
> sets up an inequivalent aliase still ... because the cache colour of the
> user and kernel virtual addresses are different. ?Depending on the
> return path to userspace, we still usually have to flush to get the user
> to see the changes the kernel has made.

Umm. Again, that has nothing to do with kmap().

This time it's about the user space mapping.

Repeat after me: even without the kmap(), the kernel access to that
mapping would have caused cache aliases.

See? Once more, the kmap() is entirely innocent. You can have a
non-highmem mapping that you never use kmap for, and that you map into
user space, and you'd see exactly the same aliases. Notice? Look ma,
no kmap().

So clearly kmap() is not the issue. The issue continues to be a
totally separate virtual mapping. Whether it's a user mapping or
vm_map_ram() is obviously immaterial - as far as the CPU is concerned,
there is no difference between the two (apart from the trivial
differences of virtual location and permissions).

(You can also force the problem with vmalloc() an then following the
kernel page tables, but I hope nobody does that any more. I suspect
I'm wrong, though, there's probably code that mixes vmalloc and
physical page accesses in various drivers)

                    Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 20:48         ` Linus Torvalds
@ 2011-01-05 21:04           ` Russell King - ARM Linux
  2011-01-05 21:08             ` Linus Torvalds
  2011-01-05 21:16           ` James Bottomley
  1 sibling, 1 reply; 79+ messages in thread
From: Russell King - ARM Linux @ 2011-01-05 21:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 05, 2011 at 12:48:32PM -0800, Linus Torvalds wrote:
> (You can also force the problem with vmalloc() an then following the
> kernel page tables, but I hope nobody does that any more. I suspect
> I'm wrong, though, there's probably code that mixes vmalloc and
> physical page accesses in various drivers)

Should vmalloc_to_page() (84 users)/vmalloc_to_pfn() (17 users) be
deprecated then? ;)

However, we seem have new ways of doing this - rather than asking
vmalloc() for some memory, and then getting at the pages by following
the page tables, we now have ways to create mappings using arrays of
struct pages and access them via their already known mappings:

- vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t prot)
- map_kernel_range_noflush(unsigned long addr, unsigned long size, pgprot_t prot, struct page **pages)
- map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
- vmap(struct page **pages, unsigned int count, unsigned long flags, pgprot_t prot)

So really it's the same problem, just created by some other easier
to use methods.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 21:04           ` Russell King - ARM Linux
@ 2011-01-05 21:08             ` Linus Torvalds
  2011-01-05 21:16               ` Trond Myklebust
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2011-01-05 21:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 5, 2011 at 1:04 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Jan 05, 2011 at 12:48:32PM -0800, Linus Torvalds wrote:
>> (You can also force the problem with vmalloc() an then following the
>> kernel page tables, but I hope nobody does that any more. I suspect
>> I'm wrong, though, there's probably code that mixes vmalloc and
>> physical page accesses in various drivers)
>
> Should vmalloc_to_page() (84 users)/vmalloc_to_pfn() (17 users) be
> deprecated then? ;)

I do think that the "modern" way of doing it is
"vmap()"/"vm_map_ram()" and friends, and it should be preferred over
using vmalloc() and then looking up the pages.

But in the end, the two approaches really are equivalent, so it's not
like it really matters. So I don't think we need to deprecate things
officially, but obviously we should make people more aware of the
whole virtual alias thing that crops up whenever you use any of these
approaches.

                           Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 21:08             ` Linus Torvalds
@ 2011-01-05 21:16               ` Trond Myklebust
  2011-01-05 21:30                 ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Trond Myklebust @ 2011-01-05 21:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 13:08 -0800, Linus Torvalds wrote: 
> On Wed, Jan 5, 2011 at 1:04 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Wed, Jan 05, 2011 at 12:48:32PM -0800, Linus Torvalds wrote:
> >> (You can also force the problem with vmalloc() an then following the
> >> kernel page tables, but I hope nobody does that any more. I suspect
> >> I'm wrong, though, there's probably code that mixes vmalloc and
> >> physical page accesses in various drivers)
> >
> > Should vmalloc_to_page() (84 users)/vmalloc_to_pfn() (17 users) be
> > deprecated then? ;)
> 
> I do think that the "modern" way of doing it is
> "vmap()"/"vm_map_ram()" and friends, and it should be preferred over
> using vmalloc() and then looking up the pages.
> 
> But in the end, the two approaches really are equivalent, so it's not
> like it really matters. So I don't think we need to deprecate things
> officially, but obviously we should make people more aware of the
> whole virtual alias thing that crops up whenever you use any of these
> approaches.

So what should be the preferred way to ensure data gets flushed when
you've written directly to a page, and then want to read through the
vm_map_ram() virtual range? Should we be adding new semantics to
flush_kernel_dcache_page()?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 21:16               ` Trond Myklebust
@ 2011-01-05 21:30                 ` Linus Torvalds
  2011-01-05 23:06                   ` Trond Myklebust
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2011-01-05 21:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 5, 2011 at 1:16 PM, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
>
> So what should be the preferred way to ensure data gets flushed when
> you've written directly to a page, and then want to read through the
> vm_map_ram() virtual range? Should we be adding new semantics to
> flush_kernel_dcache_page()?

The "preferred way" is actually simple: "don't do that". IOW, if some
page is accessed through a virtual mapping you've set up, then
_always_ access it through that virtual mapping.

Now, when that is impossible (and yes, it sometimes is), then you
should flush after doing all writes. And if you do the write through
the regular kernel mapping, you should use flush_dcache_page(). And if
you did it through the virtual mapping, you should use
"flush_kernel_vmap_range()" or whatever.

NOTE! I really didn't look those up very closely, and if the accesses
can happen concurrently you are basically screwed, so you do need to
do locking or something else to guarantee that there is some nice
sequential order.  And maybe I forgot something.  Which is why I do
suggest "don't do that" as a primary approach to the problem if at all
possible.

Oh, and you may need to flush before reading too (and many writes do
end up being "read-modify-write" cycles) in case it's possible that
you have stale data from a previous read that was then invalidated by
a write to the aliasing address. Even if that write was flushed out,
the stale read data may exist at the virtual address. I forget what
all we required - in the end the only sane model is "virtual caches
suck so bad that anybody who does them should be laughed at for being
a retard".

                            Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 21:30                 ` Linus Torvalds
@ 2011-01-05 23:06                   ` Trond Myklebust
  2011-01-05 23:28                     ` James Bottomley
  2011-01-05 23:28                     ` Linus Torvalds
  0 siblings, 2 replies; 79+ messages in thread
From: Trond Myklebust @ 2011-01-05 23:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 13:30 -0800, Linus Torvalds wrote: 
> On Wed, Jan 5, 2011 at 1:16 PM, Trond Myklebust
> <Trond.Myklebust@netapp.com> wrote:
> >
> > So what should be the preferred way to ensure data gets flushed when
> > you've written directly to a page, and then want to read through the
> > vm_map_ram() virtual range? Should we be adding new semantics to
> > flush_kernel_dcache_page()?
> 
> The "preferred way" is actually simple: "don't do that". IOW, if some
> page is accessed through a virtual mapping you've set up, then
> _always_ access it through that virtual mapping.
> 
> Now, when that is impossible (and yes, it sometimes is), then you
> should flush after doing all writes. And if you do the write through
> the regular kernel mapping, you should use flush_dcache_page(). And if
> you did it through the virtual mapping, you should use
> "flush_kernel_vmap_range()" or whatever.
> 
> NOTE! I really didn't look those up very closely, and if the accesses
> can happen concurrently you are basically screwed, so you do need to
> do locking or something else to guarantee that there is some nice
> sequential order.  And maybe I forgot something.  Which is why I do
> suggest "don't do that" as a primary approach to the problem if at all
> possible.
> 
> Oh, and you may need to flush before reading too (and many writes do
> end up being "read-modify-write" cycles) in case it's possible that
> you have stale data from a previous read that was then invalidated by
> a write to the aliasing address. Even if that write was flushed out,
> the stale read data may exist at the virtual address. I forget what
> all we required - in the end the only sane model is "virtual caches
> suck so bad that anybody who does them should be laughed at for being
> a retard".

Yes. The fix I sent out was a call to invalidate_kernel_vmap_range(),
which takes care of invalidating the cache prior to a virtual address
read.

My question was specifically about the write through the regular kernel
mapping: according to Russell and my reading of the cachetlb.txt
documentation, flush_dcache_page() is only guaranteed to have an effect
on page cache pages.
flush_kernel_dcache_page() (not to be confused with flush_dcache_page)
would appear to be the closest fit according to my reading of the
documentation, however the ARM implementation appears to be a no-op...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 23:06                   ` Trond Myklebust
@ 2011-01-05 23:28                     ` James Bottomley
  2011-01-06 17:40                       ` James Bottomley
  2011-01-05 23:28                     ` Linus Torvalds
  1 sibling, 1 reply; 79+ messages in thread
From: James Bottomley @ 2011-01-05 23:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 18:06 -0500, Trond Myklebust wrote:
> On Wed, 2011-01-05 at 13:30 -0800, Linus Torvalds wrote: 
> > On Wed, Jan 5, 2011 at 1:16 PM, Trond Myklebust
> > <Trond.Myklebust@netapp.com> wrote:
> > >
> > > So what should be the preferred way to ensure data gets flushed when
> > > you've written directly to a page, and then want to read through the
> > > vm_map_ram() virtual range? Should we be adding new semantics to
> > > flush_kernel_dcache_page()?
> > 
> > The "preferred way" is actually simple: "don't do that". IOW, if some
> > page is accessed through a virtual mapping you've set up, then
> > _always_ access it through that virtual mapping.
> > 
> > Now, when that is impossible (and yes, it sometimes is), then you
> > should flush after doing all writes. And if you do the write through
> > the regular kernel mapping, you should use flush_dcache_page(). And if
> > you did it through the virtual mapping, you should use
> > "flush_kernel_vmap_range()" or whatever.
> > 
> > NOTE! I really didn't look those up very closely, and if the accesses
> > can happen concurrently you are basically screwed, so you do need to
> > do locking or something else to guarantee that there is some nice
> > sequential order.  And maybe I forgot something.  Which is why I do
> > suggest "don't do that" as a primary approach to the problem if at all
> > possible.
> > 
> > Oh, and you may need to flush before reading too (and many writes do
> > end up being "read-modify-write" cycles) in case it's possible that
> > you have stale data from a previous read that was then invalidated by
> > a write to the aliasing address. Even if that write was flushed out,
> > the stale read data may exist at the virtual address. I forget what
> > all we required - in the end the only sane model is "virtual caches
> > suck so bad that anybody who does them should be laughed at for being
> > a retard".
> 
> Yes. The fix I sent out was a call to invalidate_kernel_vmap_range(),
> which takes care of invalidating the cache prior to a virtual address
> read.
> 
> My question was specifically about the write through the regular kernel
> mapping: according to Russell and my reading of the cachetlb.txt
> documentation, flush_dcache_page() is only guaranteed to have an effect
> on page cache pages.
> flush_kernel_dcache_page() (not to be confused with flush_dcache_page)
> would appear to be the closest fit according to my reading of the
> documentation, however the ARM implementation appears to be a no-op...

It depends on exactly what you're doing.  In the worst case, (ping pong
reads and writes through both aliases) you have to flush and invalidate
both alias 1 alias 2 each time you access on one and then another.

Can you explain how the code works? it looks to me like you read the xdr
stuff through the vmap region then write it out directly to the pages?
*if* this is just a conversion, *and* you never need to read the new
data through the vmap alias, you might be able to get away with a
flush_dcache_page in nfs_readdir_release_array().  If the access pattern
is more complex, you'll need more stuff splashed through the loop
(including vmap invalidation/flushing).

 Is there any way you could just rewrite nfs_readdir_add_to_array() to
use the vmap address instead of doing a kmap?  That way everything will
go through a single alias and not end up with this incoherency.

James

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 23:28                     ` James Bottomley
@ 2011-01-06 17:40                       ` James Bottomley
  2011-01-06 17:47                         ` Trond Myklebust
                                           ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: James Bottomley @ 2011-01-06 17:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 23:28 +0000, James Bottomley wrote:
> Can you explain how the code works? it looks to me like you read the xdr
> stuff through the vmap region then write it out directly to the pages? 

OK, I think I see how this is supposed to work:  It's a sequential loop
of reading in via the pages (i.e. through the kernel mapping) and then
updating those pages via the vmap.  In which case, I think this patch is
what you need.

The theory of operation is that the readdir on pages actually uses the
network DMA operations to perform, so when it's finished, the underlying
page is up to date.  After this you invalidate the vmap range, so we
have no cache lines above it (so it picks up the values from the
uptodate page).  Finally, after the operation on the vmap region has
finished, you flush it so that any updated contents go back to the pages
themselves before the next iteration begins.

Does this look right to people?  I've verified it fixes the issues on
parisc.

James

---

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 996dd89..bde1911 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -587,12 +587,16 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
 		if (status < 0)
 			break;
 		pglen = status;
+
+		invalidate_kernel_vmap_range(pages_ptr, pglen);
+
 		status = nfs_readdir_page_filler(desc, &entry, pages_ptr, page, pglen);
 		if (status < 0) {
 			if (status == -ENOSPC)
 				status = 0;
 			break;
 		}
+		flush_kernel_vmap_range(pages_ptr, pglen);
 	} while (array->eof_index < 0);

 	nfs_readdir_free_large_page(pages_ptr, pages, array_size);

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-06 17:40                       ` James Bottomley
@ 2011-01-06 17:47                         ` Trond Myklebust
  2011-01-06 17:51                           ` James Bottomley
  2011-01-06 17:55                           ` Linus Torvalds
  2011-01-06 18:05                         ` Russell King - ARM Linux
  2011-01-06 20:19                         ` John Stoffel
  2 siblings, 2 replies; 79+ messages in thread
From: Trond Myklebust @ 2011-01-06 17:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2011-01-06 at 11:40 -0600, James Bottomley wrote: 
> On Wed, 2011-01-05 at 23:28 +0000, James Bottomley wrote:
> > Can you explain how the code works? it looks to me like you read the xdr
> > stuff through the vmap region then write it out directly to the pages? 
> 
> OK, I think I see how this is supposed to work:  It's a sequential loop
> of reading in via the pages (i.e. through the kernel mapping) and then
> updating those pages via the vmap.  In which case, I think this patch is
> what you need.
> 
> The theory of operation is that the readdir on pages actually uses the
> network DMA operations to perform, so when it's finished, the underlying
> page is up to date.  After this you invalidate the vmap range, so we
> have no cache lines above it (so it picks up the values from the
> uptodate page).  Finally, after the operation on the vmap region has
> finished, you flush it so that any updated contents go back to the pages
> themselves before the next iteration begins.
> 
> Does this look right to people?  I've verified it fixes the issues on
> parisc.
> 
> James
> 
> ---
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 996dd89..bde1911 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -587,12 +587,16 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
>  		if (status < 0)
>  			break;
>  		pglen = status;
> +
> +		invalidate_kernel_vmap_range(pages_ptr, pglen);
> +
>  		status = nfs_readdir_page_filler(desc, &entry, pages_ptr, page, pglen);
>  		if (status < 0) {
>  			if (status == -ENOSPC)
>  				status = 0;
>  			break;
>  		}
> +		flush_kernel_vmap_range(pages_ptr, pglen);

Why is this line needed? We're not writing through the virtual mapping.

We checked using just the invalidate_kernel_vmap_range(), and that
appeared to suffice to fix the problem on ARM.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-06 17:47                         ` Trond Myklebust
@ 2011-01-06 17:51                           ` James Bottomley
  2011-01-06 17:55                           ` Linus Torvalds
  1 sibling, 0 replies; 79+ messages in thread
From: James Bottomley @ 2011-01-06 17:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2011-01-06 at 12:47 -0500, Trond Myklebust wrote:
> On Thu, 2011-01-06 at 11:40 -0600, James Bottomley wrote: 
> > On Wed, 2011-01-05 at 23:28 +0000, James Bottomley wrote:
> > > Can you explain how the code works? it looks to me like you read the xdr
> > > stuff through the vmap region then write it out directly to the pages? 
> > 
> > OK, I think I see how this is supposed to work:  It's a sequential loop
> > of reading in via the pages (i.e. through the kernel mapping) and then
> > updating those pages via the vmap.  In which case, I think this patch is
> > what you need.
> > 
> > The theory of operation is that the readdir on pages actually uses the
> > network DMA operations to perform, so when it's finished, the underlying
> > page is up to date.  After this you invalidate the vmap range, so we
> > have no cache lines above it (so it picks up the values from the
> > uptodate page).  Finally, after the operation on the vmap region has
> > finished, you flush it so that any updated contents go back to the pages
> > themselves before the next iteration begins.
> > 
> > Does this look right to people?  I've verified it fixes the issues on
> > parisc.
> > 
> > James
> > 
> > ---
> > 
> > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > index 996dd89..bde1911 100644
> > --- a/fs/nfs/dir.c
> > +++ b/fs/nfs/dir.c
> > @@ -587,12 +587,16 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
> >  		if (status < 0)
> >  			break;
> >  		pglen = status;
> > +
> > +		invalidate_kernel_vmap_range(pages_ptr, pglen);
> > +
> >  		status = nfs_readdir_page_filler(desc, &entry, pages_ptr, page, pglen);
> >  		if (status < 0) {
> >  			if (status == -ENOSPC)
> >  				status = 0;
> >  			break;
> >  		}
> > +		flush_kernel_vmap_range(pages_ptr, pglen);
> 
> Why is this line needed? We're not writing through the virtual mapping.

If you're not altering it, it isn't ... the problem on parisc is that
invalidate is a nop for us because flush does it all, but I can fix
that.

James

> We checked using just the invalidate_kernel_vmap_range(), and that
> appeared to suffice to fix the problem on ARM.
> 
> Cheers
>   Trond

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-06 17:47                         ` Trond Myklebust
  2011-01-06 17:51                           ` James Bottomley
@ 2011-01-06 17:55                           ` Linus Torvalds
  2011-01-07 18:53                             ` Trond Myklebust
  1 sibling, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2011-01-06 17:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 6, 2011 at 9:47 AM, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
>
> Why is this line needed? We're not writing through the virtual mapping.

I haven't looked at the sequence of accesses, but you need to be
_very_ aware that "write-through" is absolutely NOT sufficient for
cache coherency.

In cache coherency, you have three options:

 - true coherency (eg physically indexed/tagged caches)

 - exclusion (eg virtual caches, but with an exclusion guarantee that
guarantees that aliases cannot happen: either by using physical
tagging or by not allowing cases that could cause virtual aliases)

 - write-through AND non-cached reads (ie "no caching at all").

You seem to be forgetting the "no cached reads" part. It's not
sufficient to flush after a write - you need to make sure that you
also don't have a cached copy of the alias for the read.

So "We're not writing through the virtual mapping" is NOT a sufficient
excuse. If you're reading through the virtual mapping, you need to
make sure that the virtual mapping is flushed _after_ any writes
through any other mapping and _before_ any reads through the virtual
one.

This is why you really really really generally don't want to have
aliasing. Purely virtual caches are pure crap. Really.

                       Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-06 17:55                           ` Linus Torvalds
@ 2011-01-07 18:53                             ` Trond Myklebust
  2011-01-07 19:02                               ` Russell King - ARM Linux
  2011-01-07 19:05                               ` James Bottomley
  0 siblings, 2 replies; 79+ messages in thread
From: Trond Myklebust @ 2011-01-07 18:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2011-01-06 at 09:55 -0800, Linus Torvalds wrote: 
> On Thu, Jan 6, 2011 at 9:47 AM, Trond Myklebust
> <Trond.Myklebust@netapp.com> wrote:
> >
> > Why is this line needed? We're not writing through the virtual mapping.
> 
> I haven't looked at the sequence of accesses, but you need to be
> _very_ aware that "write-through" is absolutely NOT sufficient for
> cache coherency.
> 
> In cache coherency, you have three options:
> 
>  - true coherency (eg physically indexed/tagged caches)
> 
>  - exclusion (eg virtual caches, but with an exclusion guarantee that
> guarantees that aliases cannot happen: either by using physical
> tagging or by not allowing cases that could cause virtual aliases)
> 
>  - write-through AND non-cached reads (ie "no caching at all").
> 
> You seem to be forgetting the "no cached reads" part. It's not
> sufficient to flush after a write - you need to make sure that you
> also don't have a cached copy of the alias for the read.
> 
> So "We're not writing through the virtual mapping" is NOT a sufficient
> excuse. If you're reading through the virtual mapping, you need to
> make sure that the virtual mapping is flushed _after_ any writes
> through any other mapping and _before_ any reads through the virtual
> one.

I'm aware of that. That part should be taken care of by the call to
invalidate_kernel_vmap_range() which was in both James and my patch.

There is already code in the SUNRPC layer that calls flush_dcache_page()
after writing (although as Russell pointed out earlier, that is
apparently a no-op for non-page cache pages such as these).

> This is why you really really really generally don't want to have
> aliasing. Purely virtual caches are pure crap. Really.

Well, it looks as if NOMMU is giving us problems due to the lack of a
vm_map_ram() (see https://bugzilla.kernel.org/show_bug.cgi?id=26262).

I'd still like to keep the existing code for those architectures that
don't have problems, since that allows us to send 32k READDIR requests
instead of being limited to 4k. For large directories, that is a clear
win.
For the NOMMU case we will just go back to using a single page for
storage (and 4k READDIR requests only). Should I just do the same for
architectures like ARM and PARISC?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-07 18:53                             ` Trond Myklebust
@ 2011-01-07 19:02                               ` Russell King - ARM Linux
  2011-01-07 19:11                                 ` James Bottomley
  2011-01-07 19:13                                 ` Trond Myklebust
  2011-01-07 19:05                               ` James Bottomley
  1 sibling, 2 replies; 79+ messages in thread
From: Russell King - ARM Linux @ 2011-01-07 19:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 07, 2011 at 01:53:25PM -0500, Trond Myklebust wrote:
> I'd still like to keep the existing code for those architectures that
> don't have problems, since that allows us to send 32k READDIR requests
> instead of being limited to 4k. For large directories, that is a clear
> win.
> For the NOMMU case we will just go back to using a single page for
> storage (and 4k READDIR requests only). Should I just do the same for
> architectures like ARM and PARISC?

I think you said that readdir reads via the vmalloc mapping of the
group of pages, but XDR writes to the individual pages.

As I understand NFS, you receive a packet, you then have to use XDR
to unpack the data, which you presumably write into the set of
struct page *'s using kmap?

Isn't a solution to have XDR write directly into the vmalloc mapping
rather than using struct page * and kmap?

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-07 19:02                               ` Russell King - ARM Linux
@ 2011-01-07 19:11                                 ` James Bottomley
  2011-01-08 16:49                                   ` Trond Myklebust
  2011-01-07 19:13                                 ` Trond Myklebust
  1 sibling, 1 reply; 79+ messages in thread
From: James Bottomley @ 2011-01-07 19:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2011-01-07 at 19:02 +0000, Russell King - ARM Linux wrote:
> On Fri, Jan 07, 2011 at 01:53:25PM -0500, Trond Myklebust wrote:
> > I'd still like to keep the existing code for those architectures that
> > don't have problems, since that allows us to send 32k READDIR requests
> > instead of being limited to 4k. For large directories, that is a clear
> > win.
> > For the NOMMU case we will just go back to using a single page for
> > storage (and 4k READDIR requests only). Should I just do the same for
> > architectures like ARM and PARISC?
> 
> I think you said that readdir reads via the vmalloc mapping of the
> group of pages, but XDR writes to the individual pages.

Actually it's the other way around, but the point still stands.

> As I understand NFS, you receive a packet, you then have to use XDR
> to unpack the data, which you presumably write into the set of
> struct page *'s using kmap?
> 
> Isn't a solution to have XDR write directly into the vmalloc mapping
> rather than using struct page * and kmap?

So, unfortuantely, I looked at doing this and we can't.  the ->readdir()
call takes an array of pages, not a kernel virtual address of the pages,
so there's no way to tell it to use a different mapping from the usual
kernel one on them.

On the other hand, the xdr routines, since they take the pages anyway,
could use a scatterlist approach to writing through the kernel mapping
instead of using vmap ... we have all the machinery for this in
lib/scatterlist.c ... it's not designed for this case, since it's
designed to allow arbitrary linear reads and writes on a block
scatterlist, but the principle is the same ... it looks like it would be
rather a big patch, though ... 

James

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-07 19:11                                 ` James Bottomley
@ 2011-01-08 16:49                                   ` Trond Myklebust
  2011-01-08 23:15                                     ` Trond Myklebust
  0 siblings, 1 reply; 79+ messages in thread
From: Trond Myklebust @ 2011-01-08 16:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2011-01-07 at 13:11 -0600, James Bottomley wrote: 
> On the other hand, the xdr routines, since they take the pages anyway,
> could use a scatterlist approach to writing through the kernel mapping
> instead of using vmap ... we have all the machinery for this in
> lib/scatterlist.c ... it's not designed for this case, since it's
> designed to allow arbitrary linear reads and writes on a block
> scatterlist, but the principle is the same ... it looks like it would be
> rather a big patch, though ... 

The following alternative seems to work for me, but has only been
lightly tested so far. It's a bit large for a stable patch, but not too
ungainly.

It modifies xdr_stream, adding the ability to iterate through page data.
To avoid kmap()/kunmap(), it does require that pages be allocated in
lowmem, but since the only use case here is when using page arrays as
temporary buffers, that seems like an acceptable compromise.

Cheers
  Trond

--------------------------------------------------------------------------------- 
>From f87f13e3198ec536c1b9cfe19ad47df4fb922382 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Fri, 7 Jan 2011 18:51:33 -0500
Subject: [PATCH] NFS: Don't use vm_map_ram() in readdir

vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherrent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.

The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/dir.c               |   45 ++++++++---------
 fs/nfs/nfs2xdr.c           |    6 --
 fs/nfs/nfs3xdr.c           |    6 --
 fs/nfs/nfs4xdr.c           |    6 --
 include/linux/sunrpc/xdr.h |    4 +-
 net/sunrpc/xdr.c           |  115 ++++++++++++++++++++++++++++++++++++--------
 6 files changed, 120 insertions(+), 62 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 996dd89..ad9e5e0 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -33,7 +33,6 @@
 #include <linux/namei.h>
 #include <linux/mount.h>
 #include <linux/sched.h>
-#include <linux/vmalloc.h>
 #include <linux/kmemleak.h>
 
 #include "delegation.h"
@@ -459,25 +458,27 @@ out:
 /* Perform conversion from xdr to cache array */
 static
 int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *entry,
-				void *xdr_page, struct page *page, unsigned int buflen)
+				struct page **xdr_pages, struct page *page, unsigned int buflen)
 {
 	struct xdr_stream stream;
-	struct xdr_buf buf;
-	__be32 *ptr = xdr_page;
+	struct xdr_buf buf = {
+		.pages = xdr_pages,
+		.page_len = buflen,
+		.buflen = buflen,
+		.len = buflen,
+	};
 	struct nfs_cache_array *array;
 	unsigned int count = 0;
+	struct page *scratch;
 	int status;
 
-	buf.head->iov_base = xdr_page;
-	buf.head->iov_len = buflen;
-	buf.tail->iov_len = 0;
-	buf.page_base = 0;
-	buf.page_len = 0;
-	buf.buflen = buf.head->iov_len;
-	buf.len = buf.head->iov_len;
-
-	xdr_init_decode(&stream, &buf, ptr);
+	scratch = alloc_page(GFP_KERNEL);
+	if (scratch == NULL)
+		return -ENOMEM;
 
+	xdr_init_decode(&stream, &buf, NULL);
+	xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
+	xdr_enter_page(&stream, buflen);
 
 	do {
 		status = xdr_decode(desc, entry, &stream);
@@ -506,6 +507,8 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
 		} else
 			status = PTR_ERR(array);
 	}
+
+	put_page(scratch);
 	return status;
 }
 
@@ -521,7 +524,6 @@ static
 void nfs_readdir_free_large_page(void *ptr, struct page **pages,
 		unsigned int npages)
 {
-	vm_unmap_ram(ptr, npages);
 	nfs_readdir_free_pagearray(pages, npages);
 }
 
@@ -530,9 +532,8 @@ void nfs_readdir_free_large_page(void *ptr, struct page **pages,
  * to nfs_readdir_free_large_page
  */
 static
-void *nfs_readdir_large_page(struct page **pages, unsigned int npages)
+int nfs_readdir_large_page(struct page **pages, unsigned int npages)
 {
-	void *ptr;
 	unsigned int i;
 
 	for (i = 0; i < npages; i++) {
@@ -541,13 +542,11 @@ void *nfs_readdir_large_page(struct page **pages, unsigned int npages)
 			goto out_freepages;
 		pages[i] = page;
 	}
+	return 0;
 
-	ptr = vm_map_ram(pages, npages, 0, PAGE_KERNEL);
-	if (!IS_ERR_OR_NULL(ptr))
-		return ptr;
 out_freepages:
 	nfs_readdir_free_pagearray(pages, i);
-	return NULL;
+	return -ENOMEM;
 }
 
 static
@@ -577,8 +576,8 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
 	memset(array, 0, sizeof(struct nfs_cache_array));
 	array->eof_index = -1;
 
-	pages_ptr = nfs_readdir_large_page(pages, array_size);
-	if (!pages_ptr)
+	status = nfs_readdir_large_page(pages, array_size);
+	if (status < 0)
 		goto out_release_array;
 	do {
 		unsigned int pglen;
@@ -587,7 +586,7 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
 		if (status < 0)
 			break;
 		pglen = status;
-		status = nfs_readdir_page_filler(desc, &entry, pages_ptr, page, pglen);
+		status = nfs_readdir_page_filler(desc, &entry, pages, page, pglen);
 		if (status < 0) {
 			if (status == -ENOSPC)
 				status = 0;
diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index 5914a19..b382a1b 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -487,12 +487,6 @@ nfs_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry, struct nfs_se
 
 	entry->d_type = DT_UNKNOWN;
 
-	p = xdr_inline_peek(xdr, 8);
-	if (p != NULL)
-		entry->eof = !p[0] && p[1];
-	else
-		entry->eof = 0;
-
 	return p;
 
 out_overflow:
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index f6cc60f..ba91236 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -647,12 +647,6 @@ nfs3_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry, struct nfs_s
 			memset((u8*)(entry->fh), 0, sizeof(*entry->fh));
 	}
 
-	p = xdr_inline_peek(xdr, 8);
-	if (p != NULL)
-		entry->eof = !p[0] && p[1];
-	else
-		entry->eof = 0;
-
 	return p;
 
 out_overflow:
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 9f1826b..0662a98 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -6215,12 +6215,6 @@ __be32 *nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 	if (verify_attr_len(xdr, p, len) < 0)
 		goto out_overflow;
 
-	p = xdr_inline_peek(xdr, 8);
-	if (p != NULL)
-		entry->eof = !p[0] && p[1];
-	else
-		entry->eof = 0;
-
 	return p;
 
 out_overflow:
diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 498ab93..7783c68 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -201,6 +201,8 @@ struct xdr_stream {
 
 	__be32 *end;		/* end of available buffer space */
 	struct kvec *iov;	/* pointer to the current kvec */
+	struct kvec scratch;	/* Scratch buffer */
+	struct page **page_ptr;	/* pointer to the current page */
 };
 
 extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p);
@@ -208,7 +210,7 @@ extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
 extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages,
 		unsigned int base, unsigned int len);
 extern void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p);
-extern __be32 *xdr_inline_peek(struct xdr_stream *xdr, size_t nbytes);
+extern void xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen);
 extern __be32 *xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes);
 extern void xdr_read_pages(struct xdr_stream *xdr, unsigned int len);
 extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len);
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index cd9e841..cff0974 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -569,29 +569,112 @@ void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p)
 	xdr->iov = iov;
 	xdr->p = p;
 	xdr->end = (__be32 *)((char *)iov->iov_base + len);
+	xdr->page_ptr = NULL;
+	xdr->scratch.iov_base = NULL;
+	xdr->scratch.iov_len = 0;
 }
 EXPORT_SYMBOL_GPL(xdr_init_decode);
 
 /**
- * xdr_inline_peek - Allow read-ahead in the XDR data stream
+ * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data.
  * @xdr: pointer to xdr_stream struct
- * @nbytes: number of bytes of data to decode
+ * @buf: pointer to an empty buffer
+ * @buflen: size of 'buf'
  *
- * Check if the input buffer is long enough to enable us to decode
- * 'nbytes' more bytes of data starting at the current position.
- * If so return the current pointer without updating the current
- * pointer position.
+ * The scratch buffer is used when decoding from an array of pages.
+ * If an xdr_inline_decode() call spans across page boundaries, then
+ * we copy the data into the scratch buffer in order to allow linear
+ * access.
  */
-__be32 * xdr_inline_peek(struct xdr_stream *xdr, size_t nbytes)
+void xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen)
+{
+	xdr->scratch.iov_base = buf;
+	xdr->scratch.iov_len = buflen;
+}
+EXPORT_SYMBOL_GPL(xdr_set_scratch_buffer);
+
+static int xdr_set_page_base(struct xdr_stream *xdr,
+		unsigned int base, unsigned int len)
+{
+	unsigned int pgnr;
+	unsigned int maxlen;
+	unsigned int pgoff;
+	unsigned int pgend;
+	void *kaddr;
+
+	maxlen = xdr->buf->page_len;
+	if (base >= maxlen)
+		return -EINVAL;
+	maxlen -= base;
+
+	if (len > maxlen)
+		len = maxlen;
+
+	base += xdr->buf->page_base;
+
+	pgnr = base >> PAGE_SHIFT;
+	xdr->page_ptr = &xdr->buf->pages[pgnr];
+	kaddr = page_address(*xdr->page_ptr);
+
+	pgoff = base & ~PAGE_MASK;
+	xdr->p = (__be32*)(kaddr + pgoff);
+
+	pgend = pgoff + len;
+	if (pgend > PAGE_SIZE)
+		pgend = PAGE_SIZE;
+	xdr->end = (__be32*)(kaddr + pgend);
+	return 0;
+}
+
+static int xdr_set_next_page(struct xdr_stream *xdr)
+{
+	unsigned int newbase;
+
+	newbase = (1 + xdr->page_ptr - xdr->buf->pages) << PAGE_SHIFT;
+	newbase -= xdr->buf->page_base;
+
+	return xdr_set_page_base(xdr, newbase, PAGE_SIZE);
+}
+
+static __be32 * __xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes)
 {
 	__be32 *p = xdr->p;
 	__be32 *q = p + XDR_QUADLEN(nbytes);
 
 	if (unlikely(q > xdr->end || q < p))
 		return NULL;
+	xdr->p = q;
 	return p;
 }
-EXPORT_SYMBOL_GPL(xdr_inline_peek);
+
+static __be32 *xdr_page_inline_decode(struct xdr_stream *xdr, size_t nbytes)
+{
+	size_t cplen;
+	void *cpdest;
+	__be32 *p;
+
+	if (xdr->p == xdr->end && xdr_set_next_page(xdr) < 0)
+		return NULL;
+
+	p = __xdr_inline_decode(xdr, nbytes);
+	if (p != NULL)
+		return p;
+
+	if (nbytes > xdr->scratch.iov_len)
+		return NULL;
+	cplen = (char *)xdr->end - (char *)xdr->p;
+	cpdest = xdr->scratch.iov_base;
+	memcpy(cpdest, xdr->p, cplen);
+	cpdest += cplen;
+	nbytes -= cplen;
+	if (xdr_set_next_page(xdr) < 0)
+		return NULL;
+	p = __xdr_inline_decode(xdr, nbytes);
+	if (p == NULL)
+		return NULL;
+	memcpy(cpdest, p, nbytes);
+	return xdr->scratch.iov_base;
+}
 
 /**
  * xdr_inline_decode - Retrieve non-page XDR data to decode
@@ -605,13 +688,9 @@ EXPORT_SYMBOL_GPL(xdr_inline_peek);
  */
 __be32 * xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes)
 {
-	__be32 *p = xdr->p;
-	__be32 *q = p + XDR_QUADLEN(nbytes);
-
-	if (unlikely(q > xdr->end || q < p))
-		return NULL;
-	xdr->p = q;
-	return p;
+	if (xdr->page_ptr == NULL)
+		return __xdr_inline_decode(xdr, nbytes);
+	return xdr_page_inline_decode(xdr, nbytes);
 }
 EXPORT_SYMBOL_GPL(xdr_inline_decode);
 
@@ -671,16 +750,12 @@ EXPORT_SYMBOL_GPL(xdr_read_pages);
  */
 void xdr_enter_page(struct xdr_stream *xdr, unsigned int len)
 {
-	char * kaddr = page_address(xdr->buf->pages[0]);
 	xdr_read_pages(xdr, len);
 	/*
 	 * Position current pointer at beginning of tail, and
 	 * set remaining message length.
 	 */
-	if (len > PAGE_CACHE_SIZE - xdr->buf->page_base)
-		len = PAGE_CACHE_SIZE - xdr->buf->page_base;
-	xdr->p = (__be32 *)(kaddr + xdr->buf->page_base);
-	xdr->end = (__be32 *)((char *)xdr->p + len);
+	xdr_set_page_base(xdr, 0, len);
 }
 EXPORT_SYMBOL_GPL(xdr_enter_page);
 
-- 
1.7.3.4



-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-08 16:49                                   ` Trond Myklebust
@ 2011-01-08 23:15                                     ` Trond Myklebust
  2011-01-10 10:50                                       ` Uwe Kleine-König
  2011-01-10 12:44                                       ` Marc Kleine-Budde
  0 siblings, 2 replies; 79+ messages in thread
From: Trond Myklebust @ 2011-01-08 23:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2011-01-08 at 11:49 -0500, Trond Myklebust wrote: 
> On Fri, 2011-01-07 at 13:11 -0600, James Bottomley wrote: 
> > On the other hand, the xdr routines, since they take the pages anyway,
> > could use a scatterlist approach to writing through the kernel mapping
> > instead of using vmap ... we have all the machinery for this in
> > lib/scatterlist.c ... it's not designed for this case, since it's
> > designed to allow arbitrary linear reads and writes on a block
> > scatterlist, but the principle is the same ... it looks like it would be
> > rather a big patch, though ... 
> 
> The following alternative seems to work for me, but has only been
> lightly tested so far. It's a bit large for a stable patch, but not too
> ungainly.
> 
> It modifies xdr_stream, adding the ability to iterate through page data.
> To avoid kmap()/kunmap(), it does require that pages be allocated in
> lowmem, but since the only use case here is when using page arrays as
> temporary buffers, that seems like an acceptable compromise.

...and here is an update which makes the whole process transparent to
the decoder. It basically teaches xdr_inline_decode() how to switch
buffers when it hits the end of the current iovec and/or page.

Cheers
  Trond
----------------------------------------------------------------------------------- 
>From 8b2e60cef5c65eef41ab61286f62dec6bfb1ac27 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Sat, 8 Jan 2011 17:45:38 -0500
Subject: [PATCH] NFS: Don't use vm_map_ram() in readdir

vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherrent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.

The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/dir.c               |   44 ++++++-------
 fs/nfs/nfs2xdr.c           |    6 --
 fs/nfs/nfs3xdr.c           |    6 --
 fs/nfs/nfs4xdr.c           |    6 --
 include/linux/sunrpc/xdr.h |    4 +-
 net/sunrpc/xdr.c           |  155 +++++++++++++++++++++++++++++++++++---------
 6 files changed, 148 insertions(+), 73 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 996dd89..0108cf4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -33,7 +33,6 @@
 #include <linux/namei.h>
 #include <linux/mount.h>
 #include <linux/sched.h>
-#include <linux/vmalloc.h>
 #include <linux/kmemleak.h>
 
 #include "delegation.h"
@@ -459,25 +458,26 @@ out:
 /* Perform conversion from xdr to cache array */
 static
 int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *entry,
-				void *xdr_page, struct page *page, unsigned int buflen)
+				struct page **xdr_pages, struct page *page, unsigned int buflen)
 {
 	struct xdr_stream stream;
-	struct xdr_buf buf;
-	__be32 *ptr = xdr_page;
+	struct xdr_buf buf = {
+		.pages = xdr_pages,
+		.page_len = buflen,
+		.buflen = buflen,
+		.len = buflen,
+	};
+	struct page *scratch;
 	struct nfs_cache_array *array;
 	unsigned int count = 0;
 	int status;
 
-	buf.head->iov_base = xdr_page;
-	buf.head->iov_len = buflen;
-	buf.tail->iov_len = 0;
-	buf.page_base = 0;
-	buf.page_len = 0;
-	buf.buflen = buf.head->iov_len;
-	buf.len = buf.head->iov_len;
-
-	xdr_init_decode(&stream, &buf, ptr);
+	scratch = alloc_page(GFP_KERNEL);
+	if (scratch == NULL)
+		return -ENOMEM;
 
+	xdr_init_decode(&stream, &buf, NULL);
+	xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
 
 	do {
 		status = xdr_decode(desc, entry, &stream);
@@ -506,6 +506,8 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
 		} else
 			status = PTR_ERR(array);
 	}
+
+	put_page(scratch);
 	return status;
 }
 
@@ -521,7 +523,6 @@ static
 void nfs_readdir_free_large_page(void *ptr, struct page **pages,
 		unsigned int npages)
 {
-	vm_unmap_ram(ptr, npages);
 	nfs_readdir_free_pagearray(pages, npages);
 }
 
@@ -530,9 +531,8 @@ void nfs_readdir_free_large_page(void *ptr, struct page **pages,
  * to nfs_readdir_free_large_page
  */
 static
-void *nfs_readdir_large_page(struct page **pages, unsigned int npages)
+int nfs_readdir_large_page(struct page **pages, unsigned int npages)
 {
-	void *ptr;
 	unsigned int i;
 
 	for (i = 0; i < npages; i++) {
@@ -541,13 +541,11 @@ void *nfs_readdir_large_page(struct page **pages, unsigned int npages)
 			goto out_freepages;
 		pages[i] = page;
 	}
+	return 0;
 
-	ptr = vm_map_ram(pages, npages, 0, PAGE_KERNEL);
-	if (!IS_ERR_OR_NULL(ptr))
-		return ptr;
 out_freepages:
 	nfs_readdir_free_pagearray(pages, i);
-	return NULL;
+	return -ENOMEM;
 }
 
 static
@@ -577,8 +575,8 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
 	memset(array, 0, sizeof(struct nfs_cache_array));
 	array->eof_index = -1;
 
-	pages_ptr = nfs_readdir_large_page(pages, array_size);
-	if (!pages_ptr)
+	status = nfs_readdir_large_page(pages, array_size);
+	if (status < 0)
 		goto out_release_array;
 	do {
 		unsigned int pglen;
@@ -587,7 +585,7 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
 		if (status < 0)
 			break;
 		pglen = status;
-		status = nfs_readdir_page_filler(desc, &entry, pages_ptr, page, pglen);
+		status = nfs_readdir_page_filler(desc, &entry, pages, page, pglen);
 		if (status < 0) {
 			if (status == -ENOSPC)
 				status = 0;
diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index 5914a19..b382a1b 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -487,12 +487,6 @@ nfs_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry, struct nfs_se
 
 	entry->d_type = DT_UNKNOWN;
 
-	p = xdr_inline_peek(xdr, 8);
-	if (p != NULL)
-		entry->eof = !p[0] && p[1];
-	else
-		entry->eof = 0;
-
 	return p;
 
 out_overflow:
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index f6cc60f..ba91236 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -647,12 +647,6 @@ nfs3_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry, struct nfs_s
 			memset((u8*)(entry->fh), 0, sizeof(*entry->fh));
 	}
 
-	p = xdr_inline_peek(xdr, 8);
-	if (p != NULL)
-		entry->eof = !p[0] && p[1];
-	else
-		entry->eof = 0;
-
 	return p;
 
 out_overflow:
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 9f1826b..0662a98 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -6215,12 +6215,6 @@ __be32 *nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 	if (verify_attr_len(xdr, p, len) < 0)
 		goto out_overflow;
 
-	p = xdr_inline_peek(xdr, 8);
-	if (p != NULL)
-		entry->eof = !p[0] && p[1];
-	else
-		entry->eof = 0;
-
 	return p;
 
 out_overflow:
diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 498ab93..7783c68 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -201,6 +201,8 @@ struct xdr_stream {
 
 	__be32 *end;		/* end of available buffer space */
 	struct kvec *iov;	/* pointer to the current kvec */
+	struct kvec scratch;	/* Scratch buffer */
+	struct page **page_ptr;	/* pointer to the current page */
 };
 
 extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p);
@@ -208,7 +210,7 @@ extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
 extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages,
 		unsigned int base, unsigned int len);
 extern void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p);
-extern __be32 *xdr_inline_peek(struct xdr_stream *xdr, size_t nbytes);
+extern void xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen);
 extern __be32 *xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes);
 extern void xdr_read_pages(struct xdr_stream *xdr, unsigned int len);
 extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len);
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index cd9e841..679cd67 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -552,6 +552,74 @@ void xdr_write_pages(struct xdr_stream *xdr, struct page **pages, unsigned int b
 }
 EXPORT_SYMBOL_GPL(xdr_write_pages);
 
+static void xdr_set_iov(struct xdr_stream *xdr, struct kvec *iov,
+		__be32 *p, unsigned int len)
+{
+	if (len > iov->iov_len)
+		len = iov->iov_len;
+	if (p == NULL)
+		p = (__be32*)iov->iov_base;
+	xdr->p = p;
+	xdr->end = (__be32*)(iov->iov_base + len);
+	xdr->iov = iov;
+	xdr->page_ptr = NULL;
+}
+
+static int xdr_set_page_base(struct xdr_stream *xdr,
+		unsigned int base, unsigned int len)
+{
+	unsigned int pgnr;
+	unsigned int maxlen;
+	unsigned int pgoff;
+	unsigned int pgend;
+	void *kaddr;
+
+	maxlen = xdr->buf->page_len;
+	if (base >= maxlen)
+		return -EINVAL;
+	maxlen -= base;
+	if (len > maxlen)
+		len = maxlen;
+
+	base += xdr->buf->page_base;
+
+	pgnr = base >> PAGE_SHIFT;
+	xdr->page_ptr = &xdr->buf->pages[pgnr];
+	kaddr = page_address(*xdr->page_ptr);
+
+	pgoff = base & ~PAGE_MASK;
+	xdr->p = (__be32*)(kaddr + pgoff);
+
+	pgend = pgoff + len;
+	if (pgend > PAGE_SIZE)
+		pgend = PAGE_SIZE;
+	xdr->end = (__be32*)(kaddr + pgend);
+	xdr->iov = NULL;
+	return 0;
+}
+
+static void xdr_set_next_page(struct xdr_stream *xdr)
+{
+	unsigned int newbase;
+
+	newbase = (1 + xdr->page_ptr - xdr->buf->pages) << PAGE_SHIFT;
+	newbase -= xdr->buf->page_base;
+
+	if (xdr_set_page_base(xdr, newbase, PAGE_SIZE) < 0)
+		xdr_set_iov(xdr, xdr->buf->tail, NULL, xdr->buf->len);
+}
+
+static bool xdr_set_next_buffer(struct xdr_stream *xdr)
+{
+	if (xdr->page_ptr != NULL)
+		xdr_set_next_page(xdr);
+	else if (xdr->iov == xdr->buf->head) {
+		if (xdr_set_page_base(xdr, 0, PAGE_SIZE) < 0)
+			xdr_set_iov(xdr, xdr->buf->tail, NULL, xdr->buf->len);
+	}
+	return xdr->p != xdr->end;
+}
+
 /**
  * xdr_init_decode - Initialize an xdr_stream for decoding data.
  * @xdr: pointer to xdr_stream struct
@@ -560,41 +628,67 @@ EXPORT_SYMBOL_GPL(xdr_write_pages);
  */
 void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p)
 {
-	struct kvec *iov = buf->head;
-	unsigned int len = iov->iov_len;
-
-	if (len > buf->len)
-		len = buf->len;
 	xdr->buf = buf;
-	xdr->iov = iov;
-	xdr->p = p;
-	xdr->end = (__be32 *)((char *)iov->iov_base + len);
+	xdr->scratch.iov_base = NULL;
+	xdr->scratch.iov_len = 0;
+	if (buf->head[0].iov_len != 0)
+		xdr_set_iov(xdr, buf->head, p, buf->len);
+	else if (buf->page_len != 0)
+		xdr_set_page_base(xdr, 0, buf->len);
 }
 EXPORT_SYMBOL_GPL(xdr_init_decode);
 
-/**
- * xdr_inline_peek - Allow read-ahead in the XDR data stream
- * @xdr: pointer to xdr_stream struct
- * @nbytes: number of bytes of data to decode
- *
- * Check if the input buffer is long enough to enable us to decode
- * 'nbytes' more bytes of data starting at the current position.
- * If so return the current pointer without updating the current
- * pointer position.
- */
-__be32 * xdr_inline_peek(struct xdr_stream *xdr, size_t nbytes)
+static __be32 * __xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes)
 {
 	__be32 *p = xdr->p;
 	__be32 *q = p + XDR_QUADLEN(nbytes);
 
 	if (unlikely(q > xdr->end || q < p))
 		return NULL;
+	xdr->p = q;
 	return p;
 }
-EXPORT_SYMBOL_GPL(xdr_inline_peek);
 
 /**
- * xdr_inline_decode - Retrieve non-page XDR data to decode
+ * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data.
+ * @xdr: pointer to xdr_stream struct
+ * @buf: pointer to an empty buffer
+ * @buflen: size of 'buf'
+ *
+ * The scratch buffer is used when decoding from an array of pages.
+ * If an xdr_inline_decode() call spans across page boundaries, then
+ * we copy the data into the scratch buffer in order to allow linear
+ * access.
+ */
+void xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen)
+{
+	xdr->scratch.iov_base = buf;
+	xdr->scratch.iov_len = buflen;
+}
+EXPORT_SYMBOL_GPL(xdr_set_scratch_buffer);
+
+static __be32 *xdr_copy_to_scratch(struct xdr_stream *xdr, size_t nbytes)
+{
+	__be32 *p;
+	void *cpdest = xdr->scratch.iov_base;
+	size_t cplen = (char *)xdr->end - (char *)xdr->p;
+
+	if (nbytes > xdr->scratch.iov_len)
+		return NULL;
+	memcpy(cpdest, xdr->p, cplen);
+	cpdest += cplen;
+	nbytes -= cplen;
+	if (!xdr_set_next_buffer(xdr))
+		return NULL;
+	p = __xdr_inline_decode(xdr, nbytes);
+	if (p == NULL)
+		return NULL;
+	memcpy(cpdest, p, nbytes);
+	return xdr->scratch.iov_base;
+}
+
+/**
+ * xdr_inline_decode - Retrieve XDR data to decode
  * @xdr: pointer to xdr_stream struct
  * @nbytes: number of bytes of data to decode
  *
@@ -605,13 +699,16 @@ EXPORT_SYMBOL_GPL(xdr_inline_peek);
  */
 __be32 * xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes)
 {
-	__be32 *p = xdr->p;
-	__be32 *q = p + XDR_QUADLEN(nbytes);
+	__be32 *p;
 
-	if (unlikely(q > xdr->end || q < p))
+	if (nbytes == 0)
+		return xdr->p;
+	if (xdr->p == xdr->end && !xdr_set_next_buffer(xdr))
 		return NULL;
-	xdr->p = q;
-	return p;
+	p = __xdr_inline_decode(xdr, nbytes);
+	if (p != NULL)
+		return p;
+	return xdr_copy_to_scratch(xdr, nbytes);
 }
 EXPORT_SYMBOL_GPL(xdr_inline_decode);
 
@@ -671,16 +768,12 @@ EXPORT_SYMBOL_GPL(xdr_read_pages);
  */
 void xdr_enter_page(struct xdr_stream *xdr, unsigned int len)
 {
-	char * kaddr = page_address(xdr->buf->pages[0]);
 	xdr_read_pages(xdr, len);
 	/*
 	 * Position current pointer at beginning of tail, and
 	 * set remaining message length.
 	 */
-	if (len > PAGE_CACHE_SIZE - xdr->buf->page_base)
-		len = PAGE_CACHE_SIZE - xdr->buf->page_base;
-	xdr->p = (__be32 *)(kaddr + xdr->buf->page_base);
-	xdr->end = (__be32 *)((char *)xdr->p + len);
+	xdr_set_page_base(xdr, 0, len);
 }
 EXPORT_SYMBOL_GPL(xdr_enter_page);
 
-- 
1.7.3.4



-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-08 23:15                                     ` Trond Myklebust
@ 2011-01-10 10:50                                       ` Uwe Kleine-König
  2011-01-10 16:25                                         ` Trond Myklebust
  2011-01-10 12:44                                       ` Marc Kleine-Budde
  1 sibling, 1 reply; 79+ messages in thread
From: Uwe Kleine-König @ 2011-01-10 10:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Trond,

On Sat, Jan 08, 2011 at 06:15:51PM -0500, Trond Myklebust wrote:
> ----------------------------------------------------------------------------------- 
It would be great if you could add a "8<" in the line above next time.
Then git-am -c does the right thing (at least I think so).

> From 8b2e60cef5c65eef41ab61286f62dec6bfb1ac27 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Sat, 8 Jan 2011 17:45:38 -0500
> Subject: [PATCH] NFS: Don't use vm_map_ram() in readdir
> 
> vm_map_ram() is not available on NOMMU platforms, and causes trouble
> on incoherrent architectures such as ARM when we access the page data
> through both the direct and the virtual mapping.
> 
> The alternative is to use the direct mapping to access page data
> for the case when we are not crossing a page boundary, but to copy
> the data into a linear scratch buffer when we are accessing data
> that spans page boundaries.
> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Uwe Kleine-K?nig <u.kleine-koenig@pengutronix.de>

on tx28.

Thanks
Uwe

> ---
>  fs/nfs/dir.c               |   44 ++++++-------
>  fs/nfs/nfs2xdr.c           |    6 --
>  fs/nfs/nfs3xdr.c           |    6 --
>  fs/nfs/nfs4xdr.c           |    6 --
>  include/linux/sunrpc/xdr.h |    4 +-
>  net/sunrpc/xdr.c           |  155 +++++++++++++++++++++++++++++++++++---------
>  6 files changed, 148 insertions(+), 73 deletions(-)
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 996dd89..0108cf4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -33,7 +33,6 @@
>  #include <linux/namei.h>
>  #include <linux/mount.h>
>  #include <linux/sched.h>
> -#include <linux/vmalloc.h>
>  #include <linux/kmemleak.h>
>  
>  #include "delegation.h"
> @@ -459,25 +458,26 @@ out:
>  /* Perform conversion from xdr to cache array */
>  static
>  int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *entry,
> -				void *xdr_page, struct page *page, unsigned int buflen)
> +				struct page **xdr_pages, struct page *page, unsigned int buflen)
>  {
>  	struct xdr_stream stream;
> -	struct xdr_buf buf;
> -	__be32 *ptr = xdr_page;
> +	struct xdr_buf buf = {
> +		.pages = xdr_pages,
> +		.page_len = buflen,
> +		.buflen = buflen,
> +		.len = buflen,
> +	};
> +	struct page *scratch;
>  	struct nfs_cache_array *array;
>  	unsigned int count = 0;
>  	int status;
>  
> -	buf.head->iov_base = xdr_page;
> -	buf.head->iov_len = buflen;
> -	buf.tail->iov_len = 0;
> -	buf.page_base = 0;
> -	buf.page_len = 0;
> -	buf.buflen = buf.head->iov_len;
> -	buf.len = buf.head->iov_len;
> -
> -	xdr_init_decode(&stream, &buf, ptr);
> +	scratch = alloc_page(GFP_KERNEL);
> +	if (scratch == NULL)
> +		return -ENOMEM;
>  
> +	xdr_init_decode(&stream, &buf, NULL);
> +	xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE);
>  
>  	do {
>  		status = xdr_decode(desc, entry, &stream);
> @@ -506,6 +506,8 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
>  		} else
>  			status = PTR_ERR(array);
>  	}
> +
> +	put_page(scratch);
>  	return status;
>  }
>  
> @@ -521,7 +523,6 @@ static
>  void nfs_readdir_free_large_page(void *ptr, struct page **pages,
>  		unsigned int npages)
>  {
> -	vm_unmap_ram(ptr, npages);
>  	nfs_readdir_free_pagearray(pages, npages);
>  }
>  
> @@ -530,9 +531,8 @@ void nfs_readdir_free_large_page(void *ptr, struct page **pages,
>   * to nfs_readdir_free_large_page
>   */
>  static
> -void *nfs_readdir_large_page(struct page **pages, unsigned int npages)
> +int nfs_readdir_large_page(struct page **pages, unsigned int npages)
>  {
> -	void *ptr;
>  	unsigned int i;
>  
>  	for (i = 0; i < npages; i++) {
> @@ -541,13 +541,11 @@ void *nfs_readdir_large_page(struct page **pages, unsigned int npages)
>  			goto out_freepages;
>  		pages[i] = page;
>  	}
> +	return 0;
>  
> -	ptr = vm_map_ram(pages, npages, 0, PAGE_KERNEL);
> -	if (!IS_ERR_OR_NULL(ptr))
> -		return ptr;
>  out_freepages:
>  	nfs_readdir_free_pagearray(pages, i);
> -	return NULL;
> +	return -ENOMEM;
>  }
>  
>  static
> @@ -577,8 +575,8 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
>  	memset(array, 0, sizeof(struct nfs_cache_array));
>  	array->eof_index = -1;
>  
> -	pages_ptr = nfs_readdir_large_page(pages, array_size);
> -	if (!pages_ptr)
> +	status = nfs_readdir_large_page(pages, array_size);
> +	if (status < 0)
>  		goto out_release_array;
>  	do {
>  		unsigned int pglen;
> @@ -587,7 +585,7 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
>  		if (status < 0)
>  			break;
>  		pglen = status;
> -		status = nfs_readdir_page_filler(desc, &entry, pages_ptr, page, pglen);
> +		status = nfs_readdir_page_filler(desc, &entry, pages, page, pglen);
>  		if (status < 0) {
>  			if (status == -ENOSPC)
>  				status = 0;
> diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
> index 5914a19..b382a1b 100644
> --- a/fs/nfs/nfs2xdr.c
> +++ b/fs/nfs/nfs2xdr.c
> @@ -487,12 +487,6 @@ nfs_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry, struct nfs_se
>  
>  	entry->d_type = DT_UNKNOWN;
>  
> -	p = xdr_inline_peek(xdr, 8);
> -	if (p != NULL)
> -		entry->eof = !p[0] && p[1];
> -	else
> -		entry->eof = 0;
> -
>  	return p;
>  
>  out_overflow:
> diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
> index f6cc60f..ba91236 100644
> --- a/fs/nfs/nfs3xdr.c
> +++ b/fs/nfs/nfs3xdr.c
> @@ -647,12 +647,6 @@ nfs3_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry, struct nfs_s
>  			memset((u8*)(entry->fh), 0, sizeof(*entry->fh));
>  	}
>  
> -	p = xdr_inline_peek(xdr, 8);
> -	if (p != NULL)
> -		entry->eof = !p[0] && p[1];
> -	else
> -		entry->eof = 0;
> -
>  	return p;
>  
>  out_overflow:
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index 9f1826b..0662a98 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -6215,12 +6215,6 @@ __be32 *nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
>  	if (verify_attr_len(xdr, p, len) < 0)
>  		goto out_overflow;
>  
> -	p = xdr_inline_peek(xdr, 8);
> -	if (p != NULL)
> -		entry->eof = !p[0] && p[1];
> -	else
> -		entry->eof = 0;
> -
>  	return p;
>  
>  out_overflow:
> diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
> index 498ab93..7783c68 100644
> --- a/include/linux/sunrpc/xdr.h
> +++ b/include/linux/sunrpc/xdr.h
> @@ -201,6 +201,8 @@ struct xdr_stream {
>  
>  	__be32 *end;		/* end of available buffer space */
>  	struct kvec *iov;	/* pointer to the current kvec */
> +	struct kvec scratch;	/* Scratch buffer */
> +	struct page **page_ptr;	/* pointer to the current page */
>  };
>  
>  extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p);
> @@ -208,7 +210,7 @@ extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
>  extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages,
>  		unsigned int base, unsigned int len);
>  extern void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p);
> -extern __be32 *xdr_inline_peek(struct xdr_stream *xdr, size_t nbytes);
> +extern void xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen);
>  extern __be32 *xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes);
>  extern void xdr_read_pages(struct xdr_stream *xdr, unsigned int len);
>  extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len);
> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> index cd9e841..679cd67 100644
> --- a/net/sunrpc/xdr.c
> +++ b/net/sunrpc/xdr.c
> @@ -552,6 +552,74 @@ void xdr_write_pages(struct xdr_stream *xdr, struct page **pages, unsigned int b
>  }
>  EXPORT_SYMBOL_GPL(xdr_write_pages);
>  
> +static void xdr_set_iov(struct xdr_stream *xdr, struct kvec *iov,
> +		__be32 *p, unsigned int len)
> +{
> +	if (len > iov->iov_len)
> +		len = iov->iov_len;
> +	if (p == NULL)
> +		p = (__be32*)iov->iov_base;
> +	xdr->p = p;
> +	xdr->end = (__be32*)(iov->iov_base + len);
> +	xdr->iov = iov;
> +	xdr->page_ptr = NULL;
> +}
> +
> +static int xdr_set_page_base(struct xdr_stream *xdr,
> +		unsigned int base, unsigned int len)
> +{
> +	unsigned int pgnr;
> +	unsigned int maxlen;
> +	unsigned int pgoff;
> +	unsigned int pgend;
> +	void *kaddr;
> +
> +	maxlen = xdr->buf->page_len;
> +	if (base >= maxlen)
> +		return -EINVAL;
> +	maxlen -= base;
> +	if (len > maxlen)
> +		len = maxlen;
> +
> +	base += xdr->buf->page_base;
> +
> +	pgnr = base >> PAGE_SHIFT;
> +	xdr->page_ptr = &xdr->buf->pages[pgnr];
> +	kaddr = page_address(*xdr->page_ptr);
> +
> +	pgoff = base & ~PAGE_MASK;
> +	xdr->p = (__be32*)(kaddr + pgoff);
> +
> +	pgend = pgoff + len;
> +	if (pgend > PAGE_SIZE)
> +		pgend = PAGE_SIZE;
> +	xdr->end = (__be32*)(kaddr + pgend);
> +	xdr->iov = NULL;
> +	return 0;
> +}
> +
> +static void xdr_set_next_page(struct xdr_stream *xdr)
> +{
> +	unsigned int newbase;
> +
> +	newbase = (1 + xdr->page_ptr - xdr->buf->pages) << PAGE_SHIFT;
> +	newbase -= xdr->buf->page_base;
> +
> +	if (xdr_set_page_base(xdr, newbase, PAGE_SIZE) < 0)
> +		xdr_set_iov(xdr, xdr->buf->tail, NULL, xdr->buf->len);
> +}
> +
> +static bool xdr_set_next_buffer(struct xdr_stream *xdr)
> +{
> +	if (xdr->page_ptr != NULL)
> +		xdr_set_next_page(xdr);
> +	else if (xdr->iov == xdr->buf->head) {
> +		if (xdr_set_page_base(xdr, 0, PAGE_SIZE) < 0)
> +			xdr_set_iov(xdr, xdr->buf->tail, NULL, xdr->buf->len);
> +	}
> +	return xdr->p != xdr->end;
> +}
> +
>  /**
>   * xdr_init_decode - Initialize an xdr_stream for decoding data.
>   * @xdr: pointer to xdr_stream struct
> @@ -560,41 +628,67 @@ EXPORT_SYMBOL_GPL(xdr_write_pages);
>   */
>  void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p)
>  {
> -	struct kvec *iov = buf->head;
> -	unsigned int len = iov->iov_len;
> -
> -	if (len > buf->len)
> -		len = buf->len;
>  	xdr->buf = buf;
> -	xdr->iov = iov;
> -	xdr->p = p;
> -	xdr->end = (__be32 *)((char *)iov->iov_base + len);
> +	xdr->scratch.iov_base = NULL;
> +	xdr->scratch.iov_len = 0;
> +	if (buf->head[0].iov_len != 0)
> +		xdr_set_iov(xdr, buf->head, p, buf->len);
> +	else if (buf->page_len != 0)
> +		xdr_set_page_base(xdr, 0, buf->len);
>  }
>  EXPORT_SYMBOL_GPL(xdr_init_decode);
>  
> -/**
> - * xdr_inline_peek - Allow read-ahead in the XDR data stream
> - * @xdr: pointer to xdr_stream struct
> - * @nbytes: number of bytes of data to decode
> - *
> - * Check if the input buffer is long enough to enable us to decode
> - * 'nbytes' more bytes of data starting at the current position.
> - * If so return the current pointer without updating the current
> - * pointer position.
> - */
> -__be32 * xdr_inline_peek(struct xdr_stream *xdr, size_t nbytes)
> +static __be32 * __xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes)
>  {
>  	__be32 *p = xdr->p;
>  	__be32 *q = p + XDR_QUADLEN(nbytes);
>  
>  	if (unlikely(q > xdr->end || q < p))
>  		return NULL;
> +	xdr->p = q;
>  	return p;
>  }
> -EXPORT_SYMBOL_GPL(xdr_inline_peek);
>  
>  /**
> - * xdr_inline_decode - Retrieve non-page XDR data to decode
> + * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data.
> + * @xdr: pointer to xdr_stream struct
> + * @buf: pointer to an empty buffer
> + * @buflen: size of 'buf'
> + *
> + * The scratch buffer is used when decoding from an array of pages.
> + * If an xdr_inline_decode() call spans across page boundaries, then
> + * we copy the data into the scratch buffer in order to allow linear
> + * access.
> + */
> +void xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen)
> +{
> +	xdr->scratch.iov_base = buf;
> +	xdr->scratch.iov_len = buflen;
> +}
> +EXPORT_SYMBOL_GPL(xdr_set_scratch_buffer);
> +
> +static __be32 *xdr_copy_to_scratch(struct xdr_stream *xdr, size_t nbytes)
> +{
> +	__be32 *p;
> +	void *cpdest = xdr->scratch.iov_base;
> +	size_t cplen = (char *)xdr->end - (char *)xdr->p;
> +
> +	if (nbytes > xdr->scratch.iov_len)
> +		return NULL;
> +	memcpy(cpdest, xdr->p, cplen);
> +	cpdest += cplen;
> +	nbytes -= cplen;
> +	if (!xdr_set_next_buffer(xdr))
> +		return NULL;
> +	p = __xdr_inline_decode(xdr, nbytes);
> +	if (p == NULL)
> +		return NULL;
> +	memcpy(cpdest, p, nbytes);
> +	return xdr->scratch.iov_base;
> +}
> +
> +/**
> + * xdr_inline_decode - Retrieve XDR data to decode
>   * @xdr: pointer to xdr_stream struct
>   * @nbytes: number of bytes of data to decode
>   *
> @@ -605,13 +699,16 @@ EXPORT_SYMBOL_GPL(xdr_inline_peek);
>   */
>  __be32 * xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes)
>  {
> -	__be32 *p = xdr->p;
> -	__be32 *q = p + XDR_QUADLEN(nbytes);
> +	__be32 *p;
>  
> -	if (unlikely(q > xdr->end || q < p))
> +	if (nbytes == 0)
> +		return xdr->p;
> +	if (xdr->p == xdr->end && !xdr_set_next_buffer(xdr))
>  		return NULL;
> -	xdr->p = q;
> -	return p;
> +	p = __xdr_inline_decode(xdr, nbytes);
> +	if (p != NULL)
> +		return p;
> +	return xdr_copy_to_scratch(xdr, nbytes);
>  }
>  EXPORT_SYMBOL_GPL(xdr_inline_decode);
>  
> @@ -671,16 +768,12 @@ EXPORT_SYMBOL_GPL(xdr_read_pages);
>   */
>  void xdr_enter_page(struct xdr_stream *xdr, unsigned int len)
>  {
> -	char * kaddr = page_address(xdr->buf->pages[0]);
>  	xdr_read_pages(xdr, len);
>  	/*
>  	 * Position current pointer at beginning of tail, and
>  	 * set remaining message length.
>  	 */
> -	if (len > PAGE_CACHE_SIZE - xdr->buf->page_base)
> -		len = PAGE_CACHE_SIZE - xdr->buf->page_base;
> -	xdr->p = (__be32 *)(kaddr + xdr->buf->page_base);
> -	xdr->end = (__be32 *)((char *)xdr->p + len);
> +	xdr_set_page_base(xdr, 0, len);
>  }
>  EXPORT_SYMBOL_GPL(xdr_enter_page);
>  
> -- 
> 1.7.3.4
> 
> 
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> Trond.Myklebust at netapp.com
> www.netapp.com
> 
> 

-- 
Pengutronix e.K.                           | Uwe Kleine-K?nig            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-10 10:50                                       ` Uwe Kleine-König
@ 2011-01-10 16:25                                         ` Trond Myklebust
  2011-01-10 17:08                                           ` Marc Kleine-Budde
  0 siblings, 1 reply; 79+ messages in thread
From: Trond Myklebust @ 2011-01-10 16:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2011-01-10 at 11:50 +0100, Uwe Kleine-K?nig wrote: 
> Hi Trond,
> 
> On Sat, Jan 08, 2011 at 06:15:51PM -0500, Trond Myklebust wrote:
> > ----------------------------------------------------------------------------------- 
> It would be great if you could add a "8<" in the line above next time.
> Then git-am -c does the right thing (at least I think so).

Sorry. I wasn't aware of that particular idiom. So something like

8<------------------------------------------------------------
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: .....


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-10 16:25                                         ` Trond Myklebust
@ 2011-01-10 17:08                                           ` Marc Kleine-Budde
  2011-01-10 17:20                                             ` Trond Myklebust
  0 siblings, 1 reply; 79+ messages in thread
From: Marc Kleine-Budde @ 2011-01-10 17:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/10/2011 05:25 PM, Trond Myklebust wrote:
> On Mon, 2011-01-10 at 11:50 +0100, Uwe Kleine-K?nig wrote: 
>> Hi Trond,
>>
>> On Sat, Jan 08, 2011 at 06:15:51PM -0500, Trond Myklebust wrote:
>>> ----------------------------------------------------------------------------------- 
>> It would be great if you could add a "8<" in the line above next time.
>> Then git-am -c does the right thing (at least I think so).
> 
> Sorry. I wasn't aware of that particular idiom. So something like
> 
> 8<------------------------------------------------------------
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> Subject: .....

Yes.

>From "man git-mailinfo":
"A line that mainly consists of scissors (either ">8" or "8<") and
perforation (dash "-")"

BTW:
Is this patch a candidate for stable?

regards, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 262 bytes
Desc: OpenPGP digital signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110110/db4d5898/attachment.sig>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-10 17:08                                           ` Marc Kleine-Budde
@ 2011-01-10 17:20                                             ` Trond Myklebust
  2011-01-10 17:26                                               ` Marc Kleine-Budde
  2011-01-10 19:25                                               ` Uwe Kleine-König
  0 siblings, 2 replies; 79+ messages in thread
From: Trond Myklebust @ 2011-01-10 17:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2011-01-10 at 18:08 +0100, Marc Kleine-Budde wrote: 
> On 01/10/2011 05:25 PM, Trond Myklebust wrote:
> > On Mon, 2011-01-10 at 11:50 +0100, Uwe Kleine-K?nig wrote: 
> >> Hi Trond,
> >>
> >> On Sat, Jan 08, 2011 at 06:15:51PM -0500, Trond Myklebust wrote:
> >>> ----------------------------------------------------------------------------------- 
> >> It would be great if you could add a "8<" in the line above next time.
> >> Then git-am -c does the right thing (at least I think so).
> > 
> > Sorry. I wasn't aware of that particular idiom. So something like
> > 
> > 8<------------------------------------------------------------
> > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> > Subject: .....
> 
> Yes.
> 
> From "man git-mailinfo":
> "A line that mainly consists of scissors (either ">8" or "8<") and
> perforation (dash "-")"
> 
> BTW:
> Is this patch a candidate for stable?

Yes. I'm planning on sending it to the stable list after Linus merges it
into mainline.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-10 17:20                                             ` Trond Myklebust
@ 2011-01-10 17:26                                               ` Marc Kleine-Budde
  2011-01-10 19:25                                               ` Uwe Kleine-König
  1 sibling, 0 replies; 79+ messages in thread
From: Marc Kleine-Budde @ 2011-01-10 17:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/10/2011 06:20 PM, Trond Myklebust wrote:
>> Is this patch a candidate for stable?
> 
> Yes. I'm planning on sending it to the stable list after Linus merges it
> into mainline.

Fine!

regards, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 262 bytes
Desc: OpenPGP digital signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110110/110762ce/attachment.sig>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-10 17:20                                             ` Trond Myklebust
  2011-01-10 17:26                                               ` Marc Kleine-Budde
@ 2011-01-10 19:25                                               ` Uwe Kleine-König
  2011-01-10 19:29                                                 ` Trond Myklebust
  1 sibling, 1 reply; 79+ messages in thread
From: Uwe Kleine-König @ 2011-01-10 19:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Trond,

On Mon, Jan 10, 2011 at 12:20:35PM -0500, Trond Myklebust wrote:
> On Mon, 2011-01-10 at 18:08 +0100, Marc Kleine-Budde wrote: 
> > On 01/10/2011 05:25 PM, Trond Myklebust wrote:
> > > On Mon, 2011-01-10 at 11:50 +0100, Uwe Kleine-K?nig wrote: 
> > >> Hi Trond,
> > >>
> > >> On Sat, Jan 08, 2011 at 06:15:51PM -0500, Trond Myklebust wrote:
> > >>> ----------------------------------------------------------------------------------- 
> > >> It would be great if you could add a "8<" in the line above next time.
> > >> Then git-am -c does the right thing (at least I think so).
> > > 
> > > Sorry. I wasn't aware of that particular idiom. So something like
> > > 
> > > 8<------------------------------------------------------------
> > > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> > > Subject: .....
> > 
> > Yes.
> > 
> > From "man git-mailinfo":
> > "A line that mainly consists of scissors (either ">8" or "8<") and
> > perforation (dash "-")"
> > 
> > BTW:
> > Is this patch a candidate for stable?
> 
> Yes. I'm planning on sending it to the stable list after Linus merges it
> into mainline.
So there is another idiom for you:  just put

	Cc: stable at kernel.org

in the S-o-b block and Greg will pick it off "automatically".  (Just in
case you don't know, and if you do, maybe someone else learned
something.)

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-K?nig            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-10 19:25                                               ` Uwe Kleine-König
@ 2011-01-10 19:29                                                 ` Trond Myklebust
  2011-01-10 19:31                                                   ` James Bottomley
  2011-01-10 19:34                                                   ` Linus Torvalds
  0 siblings, 2 replies; 79+ messages in thread
From: Trond Myklebust @ 2011-01-10 19:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2011-01-10 at 20:25 +0100, Uwe Kleine-K?nig wrote: 
> Hello Trond,
> 
> On Mon, Jan 10, 2011 at 12:20:35PM -0500, Trond Myklebust wrote:
> > On Mon, 2011-01-10 at 18:08 +0100, Marc Kleine-Budde wrote: 
> > > On 01/10/2011 05:25 PM, Trond Myklebust wrote:
> > > > On Mon, 2011-01-10 at 11:50 +0100, Uwe Kleine-K?nig wrote: 
> > > >> Hi Trond,
> > > >>
> > > >> On Sat, Jan 08, 2011 at 06:15:51PM -0500, Trond Myklebust wrote:
> > > >>> ----------------------------------------------------------------------------------- 
> > > >> It would be great if you could add a "8<" in the line above next time.
> > > >> Then git-am -c does the right thing (at least I think so).
> > > > 
> > > > Sorry. I wasn't aware of that particular idiom. So something like
> > > > 
> > > > 8<------------------------------------------------------------
> > > > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> > > > Subject: .....
> > > 
> > > Yes.
> > > 
> > > From "man git-mailinfo":
> > > "A line that mainly consists of scissors (either ">8" or "8<") and
> > > perforation (dash "-")"
> > > 
> > > BTW:
> > > Is this patch a candidate for stable?
> > 
> > Yes. I'm planning on sending it to the stable list after Linus merges it
> > into mainline.
> So there is another idiom for you:  just put
> 
> 	Cc: stable at kernel.org
> 
> in the S-o-b block and Greg will pick it off "automatically".  (Just in
> case you don't know, and if you do, maybe someone else learned
> something.)

I usually do this, but there is a slight problem with that approach:
Greg gets to do all the work of figuring out to which stable kernels
this particular patch applies. In this case, since we're only talking
about the 2.6.37 kernel, I prefer to use the mailing lists.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-10 19:29                                                 ` Trond Myklebust
@ 2011-01-10 19:31                                                   ` James Bottomley
  2011-01-10 19:34                                                   ` Linus Torvalds
  1 sibling, 0 replies; 79+ messages in thread
From: James Bottomley @ 2011-01-10 19:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2011-01-10 at 14:29 -0500, Trond Myklebust wrote:
> I usually do this, but there is a slight problem with that approach:
> Greg gets to do all the work of figuring out to which stable kernels
> this particular patch applies. In this case, since we're only talking
> about the 2.6.37 kernel, I prefer to use the mailing lists.

So the non-standard, but accepted way of doing this is

Cc: Stable Tree <stable@kernel.org>	[2.6.37]

James

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-10 19:29                                                 ` Trond Myklebust
  2011-01-10 19:31                                                   ` James Bottomley
@ 2011-01-10 19:34                                                   ` Linus Torvalds
  2011-01-10 20:15                                                     ` Trond Myklebust
  1 sibling, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2011-01-10 19:34 UTC (permalink / raw)
  To: linux-arm-kernel

2011/1/10 Trond Myklebust <Trond.Myklebust@netapp.com>:
>
> I usually do this, but there is a slight problem with that approach:
> Greg gets to do all the work of figuring out to which stable kernels
> this particular patch applies. In this case, since we're only talking
> about the 2.6.37 kernel, I prefer to use the mailing lists.

Just do

  Cc: stable at kernel.org  [2.6.37]

or similar. It's quite common.

So EVEN IF you want to email people around about the patch separately,
do add the "Cc: stable" marker. It's worthwhile information about the
patch for everybody involved.

                     Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-10 19:34                                                   ` Linus Torvalds
@ 2011-01-10 20:15                                                     ` Trond Myklebust
  0 siblings, 0 replies; 79+ messages in thread
From: Trond Myklebust @ 2011-01-10 20:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2011-01-10 at 11:34 -0800, Linus Torvalds wrote: 
> 2011/1/10 Trond Myklebust <Trond.Myklebust@netapp.com>:
> >
> > I usually do this, but there is a slight problem with that approach:
> > Greg gets to do all the work of figuring out to which stable kernels
> > this particular patch applies. In this case, since we're only talking
> > about the 2.6.37 kernel, I prefer to use the mailing lists.
> 
> Just do
> 
>   Cc: stable at kernel.org  [2.6.37]
> 
> or similar. It's quite common.
> 
> So EVEN IF you want to email people around about the patch separately,
> do add the "Cc: stable" marker. It's worthwhile information about the
> patch for everybody involved.

OK. Patch description amended and recommitted in git. Thanks to all for
the tips...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-08 23:15                                     ` Trond Myklebust
  2011-01-10 10:50                                       ` Uwe Kleine-König
@ 2011-01-10 12:44                                       ` Marc Kleine-Budde
  1 sibling, 0 replies; 79+ messages in thread
From: Marc Kleine-Budde @ 2011-01-10 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/09/2011 12:15 AM, Trond Myklebust wrote:
> From 8b2e60cef5c65eef41ab61286f62dec6bfb1ac27 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Sat, 8 Jan 2011 17:45:38 -0500
> Subject: [PATCH] NFS: Don't use vm_map_ram() in readdir
> 
> vm_map_ram() is not available on NOMMU platforms, and causes trouble
> on incoherrent architectures such as ARM when we access the page data
> through both the direct and the virtual mapping.
> 
> The alternative is to use the direct mapping to access page data
> for the case when we are not crossing a page boundary, but to copy
> the data into a linear scratch buffer when we are accessing data
> that spans page boundaries.
> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>

..on AT91 (armv5)

regards, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 262 bytes
Desc: OpenPGP digital signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110110/f1fdce67/attachment.sig>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-07 19:02                               ` Russell King - ARM Linux
  2011-01-07 19:11                                 ` James Bottomley
@ 2011-01-07 19:13                                 ` Trond Myklebust
  1 sibling, 0 replies; 79+ messages in thread
From: Trond Myklebust @ 2011-01-07 19:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2011-01-07 at 19:02 +0000, Russell King - ARM Linux wrote: 
> On Fri, Jan 07, 2011 at 01:53:25PM -0500, Trond Myklebust wrote:
> > I'd still like to keep the existing code for those architectures that
> > don't have problems, since that allows us to send 32k READDIR requests
> > instead of being limited to 4k. For large directories, that is a clear
> > win.
> > For the NOMMU case we will just go back to using a single page for
> > storage (and 4k READDIR requests only). Should I just do the same for
> > architectures like ARM and PARISC?
> 
> I think you said that readdir reads via the vmalloc mapping of the
> group of pages, but XDR writes to the individual pages.
> 
> As I understand NFS, you receive a packet, you then have to use XDR
> to unpack the data, which you presumably write into the set of
> struct page *'s using kmap?

No. The socket or RDMA layers place the data directly into the struct
pages. We then unpack them in the XDR layer using the vmalloc mapping
and place the resulting processed readdir data into the page cache.

> Isn't a solution to have XDR write directly into the vmalloc mapping
> rather than using struct page * and kmap?

Unfortunately that isn't possible. :-(

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-07 18:53                             ` Trond Myklebust
  2011-01-07 19:02                               ` Russell King - ARM Linux
@ 2011-01-07 19:05                               ` James Bottomley
  1 sibling, 0 replies; 79+ messages in thread
From: James Bottomley @ 2011-01-07 19:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2011-01-07 at 13:53 -0500, Trond Myklebust wrote:
> There is already code in the SUNRPC layer that calls flush_dcache_page()
> after writing (although as Russell pointed out earlier, that is
> apparently a no-op for non-page cache pages such as these).

Actually (and possibly fortunately) none of our flush_dcache_page()
implementations do this (check for an actual non page cache page and nop
if they find one).  Although, they may according to the docs which say
that flush_dcache_page() is only called on page cache pages.

But it's definitely using the API outside its documented scope.  We have
lots of places in the VFS where we don't call flush_dcache_page() even
after altering a kernel page (even in the page cache) if we know the
page will never be mapped to userspace.  The assumption here is that the
kernel never sets up non-user aliases of these pages, so not doing the
flushing is an optimisation since we only access them through the kernel
address space.  Of course, setting up vmap areas of these pages within
the kernel violates this assumption.

> > This is why you really really really generally don't want to have
> > aliasing. Purely virtual caches are pure crap. Really.
> 
> Well, it looks as if NOMMU is giving us problems due to the lack of a
> vm_map_ram() (see https://bugzilla.kernel.org/show_bug.cgi?id=26262).
> 
> I'd still like to keep the existing code for those architectures that
> don't have problems, since that allows us to send 32k READDIR requests
> instead of being limited to 4k. For large directories, that is a clear
> win.
> For the NOMMU case we will just go back to using a single page for
> storage (and 4k READDIR requests only). Should I just do the same for
> architectures like ARM and PARISC?

Well, that would include any VI architecture (like SPARC and others) as
well.  However, I think we can just make the
invalidate_kernel_vmap_range() work.

James

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-06 17:40                       ` James Bottomley
  2011-01-06 17:47                         ` Trond Myklebust
@ 2011-01-06 18:05                         ` Russell King - ARM Linux
  2011-01-06 18:14                           ` James Bottomley
  2011-01-06 20:19                         ` John Stoffel
  2 siblings, 1 reply; 79+ messages in thread
From: Russell King - ARM Linux @ 2011-01-06 18:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 06, 2011 at 11:40:13AM -0600, James Bottomley wrote:
> On Wed, 2011-01-05 at 23:28 +0000, James Bottomley wrote:
> > Can you explain how the code works? it looks to me like you read the xdr
> > stuff through the vmap region then write it out directly to the pages? 
> 
> OK, I think I see how this is supposed to work:  It's a sequential loop
> of reading in via the pages (i.e. through the kernel mapping) and then
> updating those pages via the vmap.  In which case, I think this patch is
> what you need.
> 
> The theory of operation is that the readdir on pages actually uses the
> network DMA operations to perform, so when it's finished, the underlying

What network DMA operations - what if your NIC doesn't do DMA because
it's an SMSC device?

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-06 18:05                         ` Russell King - ARM Linux
@ 2011-01-06 18:14                           ` James Bottomley
  2011-01-06 18:25                             ` James Bottomley
  0 siblings, 1 reply; 79+ messages in thread
From: James Bottomley @ 2011-01-06 18:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2011-01-06 at 18:05 +0000, Russell King - ARM Linux wrote:
> On Thu, Jan 06, 2011 at 11:40:13AM -0600, James Bottomley wrote:
> > On Wed, 2011-01-05 at 23:28 +0000, James Bottomley wrote:
> > > Can you explain how the code works? it looks to me like you read the xdr
> > > stuff through the vmap region then write it out directly to the pages? 
> > 
> > OK, I think I see how this is supposed to work:  It's a sequential loop
> > of reading in via the pages (i.e. through the kernel mapping) and then
> > updating those pages via the vmap.  In which case, I think this patch is
> > what you need.
> > 
> > The theory of operation is that the readdir on pages actually uses the
> > network DMA operations to perform, so when it's finished, the underlying
> 
> What network DMA operations - what if your NIC doesn't do DMA because
> it's an SMSC device?

So this is the danger area ... we might be caught by our own flushing
tricks.  I can't test this on parisc since all my network drivers use
DMA (which automatically coheres the kernel mapping by
flush/invalidate).

What should happen is that the kernel mapping pages go through the
->readdir() path.  Any return from this has to be ready to map the pages
back to user space, so the kernel alias has to be flushed to make the
underlying page up to date.

The exception is pages we haven't yet mapped to userspace.  Here we set
the PG_dcache_dirty bit (sparc trick) but don't flush the page, since we
expect the addition of a userspace mapping will detect this case and do
the flush and clear the bit before the mapping goes live.  I assume
you're thinking that because this page is allocated and freed internally
to NFS, it never gets a userspace mapping and therefore, we can return
from ->readdir() with a dirty kernel cache (and the corresponding flag
set)?  I think that is a possible hypothesis in certain cases.

James

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-06 18:14                           ` James Bottomley
@ 2011-01-06 18:25                             ` James Bottomley
  2011-01-06 21:07                               ` James Bottomley
  0 siblings, 1 reply; 79+ messages in thread
From: James Bottomley @ 2011-01-06 18:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2011-01-06 at 12:14 -0600, James Bottomley wrote:
> On Thu, 2011-01-06 at 18:05 +0000, Russell King - ARM Linux wrote:
> > What network DMA operations - what if your NIC doesn't do DMA because
> > it's an SMSC device?
> 
> So this is the danger area ... we might be caught by our own flushing
> tricks.  I can't test this on parisc since all my network drivers use
> DMA (which automatically coheres the kernel mapping by
> flush/invalidate).
> 
> What should happen is that the kernel mapping pages go through the
> ->readdir() path.  Any return from this has to be ready to map the pages
> back to user space, so the kernel alias has to be flushed to make the
> underlying page up to date.
> 
> The exception is pages we haven't yet mapped to userspace.  Here we set
> the PG_dcache_dirty bit (sparc trick) but don't flush the page, since we
> expect the addition of a userspace mapping will detect this case and do
> the flush and clear the bit before the mapping goes live.  I assume
> you're thinking that because this page is allocated and freed internally
> to NFS, it never gets a userspace mapping and therefore, we can return
> from ->readdir() with a dirty kernel cache (and the corresponding flag
> set)?  I think that is a possible hypothesis in certain cases.

OK, so thinking about this, it seems that the only danger is actually
what NFS is doing: reading cache pages via a vmap.  In that case, since
the requirement is to invalidate the vmap range to prepare for read, we
could have invalidate_kernel_vmap_range loop over the underlying pages
and flush them through the kernel alias if the architecture specific
flag indicates their contents might be dirty.

The loop adds expense that is probably largely unnecessary to
invalidate_kernel_vmap_range() but the alternative is adding to the API
proliferation with something that only flushes the kernel pages if the
arch specific flag says they're dirty.

James

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-06 18:25                             ` James Bottomley
@ 2011-01-06 21:07                               ` James Bottomley
  0 siblings, 0 replies; 79+ messages in thread
From: James Bottomley @ 2011-01-06 21:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2011-01-06 at 12:25 -0600, James Bottomley wrote:
> OK, so thinking about this, it seems that the only danger is actually
> what NFS is doing: reading cache pages via a vmap.  In that case, since
> the requirement is to invalidate the vmap range to prepare for read, we
> could have invalidate_kernel_vmap_range loop over the underlying pages
> and flush them through the kernel alias if the architecture specific
> flag indicates their contents might be dirty.
> 
> The loop adds expense that is probably largely unnecessary to
> invalidate_kernel_vmap_range() but the alternative is adding to the API
> proliferation with something that only flushes the kernel pages if the
> arch specific flag says they're dirty.

This is what I think the arm patch would look like (example only: I
can't compile it).  Is something like this too expensive? the loop can't
be optimised away because of the need to check the pages (and
vmalloc_to_page is a three level page table lookup).

James

---

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 3acd8fa..34469ca 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -414,8 +414,17 @@ static inline void flush_kernel_vmap_range(void *addr, int size)
 }
 static inline void invalidate_kernel_vmap_range(void *addr, int size)
 {
-	if ((cache_is_vivt() || cache_is_vipt_aliasing()))
-	  __cpuc_flush_dcache_area(addr, (size_t)size);
+	if ((cache_is_vivt() || cache_is_vipt_aliasing())) {
+		void *cursor = addr;
+
+		for ( ; cursor < addr + size; cursor += PAGE_SIZE) {
+			struct page *page = vmalloc_to_page(cursor);
+
+			if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+				__flush_dcache_page(page_mapping(page), page);
+		}
+		__cpuc_flush_dcache_area(addr, (size_t)size);
+	}
 }
 
 #define ARCH_HAS_FLUSH_ANON_PAGE

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-06 17:40                       ` James Bottomley
  2011-01-06 17:47                         ` Trond Myklebust
  2011-01-06 18:05                         ` Russell King - ARM Linux
@ 2011-01-06 20:19                         ` John Stoffel
  2 siblings, 0 replies; 79+ messages in thread
From: John Stoffel @ 2011-01-06 20:19 UTC (permalink / raw)
  To: linux-arm-kernel

>>>>> "James" == James Bottomley <James.Bottomley@HansenPartnership.com> writes:

James> On Wed, 2011-01-05 at 23:28 +0000, James Bottomley wrote:
>> Can you explain how the code works? it looks to me like you read the xdr
>> stuff through the vmap region then write it out directly to the pages? 

James> OK, I think I see how this is supposed to work: It's a
James> sequential loop of reading in via the pages (i.e. through the
James> kernel mapping) and then updating those pages via the vmap.  In
James> which case, I think this patch is what you need.

James> The theory of operation is that the readdir on pages actually
James> uses the network DMA operations to perform, so when it's
James> finished, the underlying page is up to date.  After this you
James> invalidate the vmap range, so we have no cache lines above it
James> (so it picks up the values from the uptodate page).  Finally,
James> after the operation on the vmap region has finished, you flush
James> it so that any updated contents go back to the pages themselves
James> before the next iteration begins.

You need to re-spin this patch to include the above description into
the magic steps your taking here, or at least document it more clearly
somewhere why you need to make these funky steps.  

John

James> James

James> ---

James> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
James> index 996dd89..bde1911 100644
James> --- a/fs/nfs/dir.c
James> +++ b/fs/nfs/dir.c
James> @@ -587,12 +587,16 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
James>  		if (status < 0)
James>  			break;
James>  		pglen = status;
James> +
James> +		invalidate_kernel_vmap_range(pages_ptr, pglen);
James> +
James>  		status = nfs_readdir_page_filler(desc, &entry, pages_ptr, page, pglen);
James>  		if (status < 0) {
James>  			if (status == -ENOSPC)
James>  				status = 0;
James>  			break;
James>  		}
James> +		flush_kernel_vmap_range(pages_ptr, pglen);
James>  	} while (array->eof_index < 0);

James>  	nfs_readdir_free_large_page(pages_ptr, pages, array_size);

James> --
James> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
James> the body of a message to majordomo at vger.kernel.org
James> More majordomo info at  http://vger.kernel.org/majordomo-info.html
James> Please read the FAQ at  http://www.tux.org/lkml/

-- 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 23:06                   ` Trond Myklebust
  2011-01-05 23:28                     ` James Bottomley
@ 2011-01-05 23:28                     ` Linus Torvalds
  2011-01-05 23:59                       ` Russell King - ARM Linux
  1 sibling, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2011-01-05 23:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 5, 2011 at 3:06 PM, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
>
> Yes. The fix I sent out was a call to invalidate_kernel_vmap_range(),
> which takes care of invalidating the cache prior to a virtual address
> read.
>
> My question was specifically about the write through the regular kernel
> mapping: according to Russell and my reading of the cachetlb.txt
> documentation, flush_dcache_page() is only guaranteed to have an effect
> on page cache pages.

I don't think that should ever matter. It's not like the hardware can
know whether it's a dcache page or not.

And if the sw implementation cares, it's doing something really odd.
But who knows - there's a lot of crap out there, and people sometimes
do really odd things to work around the brokenness of a VIVT cache
with aliases.

                       Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 23:28                     ` Linus Torvalds
@ 2011-01-05 23:59                       ` Russell King - ARM Linux
  0 siblings, 0 replies; 79+ messages in thread
From: Russell King - ARM Linux @ 2011-01-05 23:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 05, 2011 at 03:28:53PM -0800, Linus Torvalds wrote:
> On Wed, Jan 5, 2011 at 3:06 PM, Trond Myklebust
> <Trond.Myklebust@netapp.com> wrote:
> >
> > Yes. The fix I sent out was a call to invalidate_kernel_vmap_range(),
> > which takes care of invalidating the cache prior to a virtual address
> > read.
> >
> > My question was specifically about the write through the regular kernel
> > mapping: according to Russell and my reading of the cachetlb.txt
> > documentation, flush_dcache_page() is only guaranteed to have an effect
> > on page cache pages.
> 
> I don't think that should ever matter. It's not like the hardware can
> know whether it's a dcache page or not.
> 
> And if the sw implementation cares, it's doing something really odd.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 20:48         ` Linus Torvalds
  2011-01-05 21:04           ` Russell King - ARM Linux
@ 2011-01-05 21:16           ` James Bottomley
  1 sibling, 0 replies; 79+ messages in thread
From: James Bottomley @ 2011-01-05 21:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 12:48 -0800, Linus Torvalds wrote:
> On Wed, Jan 5, 2011 at 12:33 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> >
> > well, that depends.  For us on parisc, kmap of a user page in !HIGHMEM
> > sets up an inequivalent aliase still ... because the cache colour of the
> > user and kernel virtual addresses are different.  Depending on the
> > return path to userspace, we still usually have to flush to get the user
> > to see the changes the kernel has made.
> 
> Umm. Again, that has nothing to do with kmap().
> 
> This time it's about the user space mapping.
> 
> Repeat after me: even without the kmap(), the kernel access to that
> mapping would have caused cache aliases.
> 
> See? Once more, the kmap() is entirely innocent. You can have a
> non-highmem mapping that you never use kmap for, and that you map into
> user space, and you'd see exactly the same aliases. Notice? Look ma,
> no kmap().

Yes, I understand that (we have no highmem on parisc, so kmap is a nop).
The problem (at least as I see it) is that once something within the
kernel (well, OK, mostly within drivers) touches a user page via its
kernel mapping, the flush often gets forgotten (mainly because it always
works on x86). What I was thinking about is that every time the kernel
touches a user space page, it has to be within a kmap/kunmap pair
(because the page might be highmem) ... so it's possible to make
kmap/kunmap do the flushing for this case so the driver writer can't
ever forget it.

I think the problem case is only really touching scatter/gather elements
outside of the DMA API (i.e. the driver pio case), so this may be
overkill.  Russell also pointed out that a lot of the PIO iterators do
excessive kmap_atomic/kunmap_atomic on the same page, so adding a flush
could damage performance to the point where the flash root devices on
arm might not work.  Plus the pio iterators already contain the
appropriate flush, so perhaps just using them in every case fixes the
problem.

> So clearly kmap() is not the issue. The issue continues to be a
> totally separate virtual mapping. Whether it's a user mapping or
> vm_map_ram() is obviously immaterial - as far as the CPU is concerned,
> there is no difference between the two (apart from the trivial
> differences of virtual location and permissions).
> 
> (You can also force the problem with vmalloc() an then following the
> kernel page tables, but I hope nobody does that any more. I suspect
> I'm wrong, though, there's probably code that mixes vmalloc and
> physical page accesses in various drivers)

Yes, unfortunately, we have seen this quite a bit; mainly to get large
buffers.  Its not just confined to drivers:  xfs used to fail on both
arm and parisc because it used a vmalloc region for its log buffer which
it then had to do I/O on.

James

^ permalink raw reply	[flat|nested] 79+ messages in thread

[parent not found: <AANLkTi=-dNeeDjcSoznKtwcaNyw1mMXSqepFY89R2i+2@mail.gmail.com>]

* still nfs problems [Was: Linux 2.6.37-rc8]
       [not found] <AANLkTi=-dNeeDjcSoznKtwcaNyw1mMXSqepFY89R2i+2@mail.gmail.com>
@ 2010-12-30 17:14 ` Uwe Kleine-König
  2010-12-30 17:57   ` Linus Torvalds
  2010-12-30 17:59   ` Trond Myklebust
  0 siblings, 2 replies; 79+ messages in thread
From: Uwe Kleine-König @ 2010-12-30 17:14 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

I wonder if the nfs-stuff is considered to be solved, because I still
see strange things.

During boot my machine sometimes (approx one out of two times) hangs with
the output pasted below on Sysctl-l.  The irq 

I'm not 100% sure it's related, but at least it seems to hang in
nfs_readdir.  (When the serial irq happend that triggered the sysrq the
program counter was at 0xc014601c, which is fs/nfs/dir.c:647 for me.)

This is on 2.6.37-rc8 plus some patches for machine support on an ARM
machine.

Best regards
Uwe

[ 2700.100000] SysRq : Show State
[ 2700.100000]   task                PC stack   pid father
[ 2700.100000] init          S c0285d80     0     1      0 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c004f268>] (do_wait+0x1a4/0x20c)
[ 2700.100000] [<c004f0c4>] (do_wait+0x0/0x20c) from [<c004f378>] (sys_wait4+0xa8/0xc0)
[ 2700.100000] [<c004f2d0>] (sys_wait4+0x0/0xc0) from [<c0033e80>] (ret_fast_syscall+0x0/0x38)
[ 2700.100000]  r8:c0034088 r7:00000072 r6:00000001 r5:0000001b r4:0140b228
[ 2700.100000] kthreadd      S c0285d80     0     2      0 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c006a30c>] (kthreadd+0x70/0xfc)
[ 2700.100000] [<c006a29c>] (kthreadd+0x0/0xfc) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000] ksoftirqd/0   S c0285d80     0     3      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c0052714>] (run_ksoftirqd+0x5c/0x110)
[ 2700.100000] [<c00526b8>] (run_ksoftirqd+0x0/0x110) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000]  r8:00000000 r7:c00526b8 r6:00000000 r5:c7843f1c r4:c7859fac
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843f1c
[ 2700.100000] kworker/0:0   S c0285d80     0     4      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c006480c>] (worker_thread+0x41c/0x444)
[ 2700.100000] [<c00643f0>] (worker_thread+0x0/0x444) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843edc
[ 2700.100000] kworker/u:0   S c0285d80     0     5      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c006480c>] (worker_thread+0x41c/0x444)
[ 2700.100000] [<c00643f0>] (worker_thread+0x0/0x444) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843edc
[ 2700.100000] watchdog/0    S c0285d80     0     6      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c008b418>] (watchdog+0xc0/0x110)
[ 2700.100000] [<c008b358>] (watchdog+0x0/0x110) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000]  r6:00000000 r5:c7843efc r4:c785ffac
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843efc
[ 2700.100000] khelper       S c0285d80     0     7      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
[ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843f1c
[ 2700.100000] sync_supers   S c0285d80     0     8      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00cd114>] (bdi_sync_supers+0x38/0x50)
[ 2700.100000] [<c00cd0dc>] (bdi_sync_supers+0x0/0x50) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000]  r5:c7843f2c r4:c7895fac
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843f2c
[ 2700.100000] bdi-default   S c0285d80     0     9      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c02864b4>] (schedule_timeout+0x22c/0x27c)
[ 2700.100000] [<c0286288>] (schedule_timeout+0x0/0x27c) from [<c00ce014>] (bdi_forker_thread+0x3a8/0x41c)
[ 2700.100000]  r8:c0363f80 r7:00000000 r6:00000000 r5:c03641e8 r4:00000000
[ 2700.100000] [<c00cdc6c>] (bdi_forker_thread+0x0/0x41c) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843efc
[ 2700.100000] kintegrityd   S c0285d80     0    10      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
[ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843e9c
[ 2700.100000] kblockd       S c0285d80     0    11      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
[ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843ec4
[ 2700.100000] rpciod        S c0285d80     0    12      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
[ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843ebc
[ 2700.100000] kworker/0:1   S c0285d80     0    13      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c006480c>] (worker_thread+0x41c/0x444)
[ 2700.100000] [<c00643f0>] (worker_thread+0x0/0x444) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c785be94
[ 2700.100000] khungtaskd    S c0285d80     0    14      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c02864b4>] (schedule_timeout+0x22c/0x27c)
[ 2700.100000] [<c0286288>] (schedule_timeout+0x0/0x27c) from [<c0286584>] (schedule_timeout_interruptible+0x28/0x2c)
[ 2700.100000]  r8:00000078 r7:00007fe9 r6:000003e9 r5:c034eef0 r4:00000064
[ 2700.100000] [<c028655c>] (schedule_timeout_interruptible+0x0/0x2c) from [<c008ada8>] (watchdog+0x54/0x2e8)
[ 2700.100000] [<c008ad54>] (watchdog+0x0/0x2e8) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843f2c
[ 2700.100000] kswapd0       S c0285d80     0    15      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00c5ea4>] (kswapd+0x210/0x74c)
[ 2700.100000] [<c00c5c94>] (kswapd+0x0/0x74c) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843f1c
[ 2700.100000] fsnotify_mark S c0285d80     0    16      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c011f884>] (fsnotify_mark_destroy+0x11c/0x144)
[ 2700.100000] [<c011f768>] (fsnotify_mark_destroy+0x0/0x144) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843f34
[ 2700.100000] aio           S c0285d80     0    17      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
[ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843edc
[ 2700.100000] nfsiod        S c0285d80     0    18      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
[ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843edc
[ 2700.100000] crypto        S c0285d80     0    19      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
[ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843ee4
[ 2700.100000] kworker/u:1   S c0285d80     0    24      2 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c006480c>] (worker_thread+0x41c/0x444)
[ 2700.100000] [<c00643f0>] (worker_thread+0x0/0x444) from [<c006a294>] (kthread+0x8c/0x94)
[ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
[ 2700.100000]  r7:00000013 r6:c004f4d8 r5:c006a208 r4:c785de94
[ 2700.100000] rcS           S c0285d80     0    27      1 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c004f268>] (do_wait+0x1a4/0x20c)
[ 2700.100000] [<c004f0c4>] (do_wait+0x0/0x20c) from [<c004f378>] (sys_wait4+0xa8/0xc0)
[ 2700.100000] [<c004f2d0>] (sys_wait4+0x0/0xc0) from [<c0033e80>] (ret_fast_syscall+0x0/0x38)
[ 2700.100000]  r8:c0034088 r7:00000072 r6:ffffffff r5:bee7880c r4:00000000
[ 2700.100000] run-parts     S c0285d80     0    35     27 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c004f268>] (do_wait+0x1a4/0x20c)
[ 2700.100000] [<c004f0c4>] (do_wait+0x0/0x20c) from [<c004f378>] (sys_wait4+0xa8/0xc0)
[ 2700.100000] [<c004f2d0>] (sys_wait4+0x0/0xc0) from [<c0033e80>] (ret_fast_syscall+0x0/0x38)
[ 2700.100000]  r8:c0034088 r7:00000072 r6:00000024 r5:bef7dcc4 r4:00000000
[ 2700.100000] S00splashutil R running      0    36     35 0x00000000
[ 2700.100000] Backtrace: 
[ 2700.100000] [<c0037b54>] (dump_backtrace+0x0/0x110) from [<c0037c80>] (show_stack+0x1c/0x20)
[ 2700.100000]  r7:c79cfd64 r6:00000000 r5:c7954600 r4:00000000
[ 2700.100000] [<c0037c64>] (show_stack+0x0/0x20) from [<c0046b78>] (sched_show_task+0xb0/0xcc)
[ 2700.100000] [<c0046ac8>] (sched_show_task+0x0/0xcc) from [<c0046bf0>] (show_state_filter+0x5c/0xc8)
[ 2700.100000]  r5:c7954600 r4:c7954600
[ 2700.100000] [<c0046b94>] (show_state_filter+0x0/0xc8) from [<c01c5c40>] (sysrq_handle_showstate+0x18/0x1c)
[ 2700.100000]  r8:20000093 r7:00000007 r6:00000001 r5:00000074 r4:c036ec5c
[ 2700.100000] [<c01c5c28>] (sysrq_handle_showstate+0x0/0x1c) from [<c01c6040>] (__handle_sysrq+0xe0/0x190)
[ 2700.100000] [<c01c5f60>] (__handle_sysrq+0x0/0x190) from [<c01c62d8>] (handle_sysrq+0x38/0x44)
[ 2700.100000]  r8:c7999000 r7:00000100 r6:c7973640 r5:00010074 r4:c7864300
[ 2700.100000] [<c01c62a0>] (handle_sysrq+0x0/0x44) from [<c01da100>] (pl011_int+0x18c/0x5a4)
[ 2700.100000] [<c01d9f74>] (pl011_int+0x0/0x5a4) from [<c008b8b0>] (handle_IRQ_event+0x7c/0x1a8)
[ 2700.100000] [<c008b834>] (handle_IRQ_event+0x0/0x1a8) from [<c008de5c>] (handle_level_irq+0xc8/0x148)
[ 2700.100000] [<c008dd94>] (handle_level_irq+0x0/0x148) from [<c002d080>] (asm_do_IRQ+0x80/0xa4)
[ 2700.100000]  r7:c74a05a4 r6:c74a0508 r5:00000000 r4:0000002f
[ 2700.100000] [<c002d000>] (asm_do_IRQ+0x0/0xa4) from [<c0033ab8>] (__irq_svc+0x38/0x80)
[ 2700.100000] Exception stack(0xc79cfe88 to 0xc79cfed0)
[ 2700.100000] fe80:                   c74a0508 00000000 c0145d24 c7487e60 00000000 c79cfee8
[ 2700.100000] fea0: c74a0508 c74a05a4 c7487e60 c79cfee8 c74a0508 c79cff4c c79ce000 c79cfed0
[ 2700.100000] fec0: c016ff10 c014601c 60000013 ffffffff
[ 2700.100000]  r5:f5000000 r4:ffffffff
[ 2700.100000] [<c0145f0c>] (nfs_readdir+0x0/0x458) from [<c00fa298>] (vfs_readdir+0x7c/0xb0)
[ 2700.100000] [<c00fa21c>] (vfs_readdir+0x0/0xb0) from [<c00fa3fc>] (sys_getdents+0x70/0xb8)
[ 2700.100000] [<c00fa38c>] (sys_getdents+0x0/0xb8) from [<c0033e80>] (ret_fast_syscall+0x0/0x38)
[ 2700.100000]  r7:0000008d r6:00000000 r5:402ed00c r4:402ed020
[ 2700.100000] Sched Debug Version: v0.09, 2.6.37-rc8-00065-g1cd48e3-dirty #35
[ 2700.100000] now at 2701202.749966 msecs
[ 2700.100000]   .jiffies                                 : 240010
[ 2700.100000]   .sysctl_sched_latency                    : 6.000000
[ 2700.100000]   .sysctl_sched_min_granularity            : 0.750000
[ 2700.100000]   .sysctl_sched_wakeup_granularity         : 1.000000
[ 2700.100000]   .sysctl_sched_child_runs_first           : 0
[ 2700.100000]   .sysctl_sched_features                   : 31855
[ 2700.100000]   .sysctl_sched_tunable_scaling            : 1 (logaritmic)
[ 2700.100000] 
[ 2700.100000] cpu#0
[ 2700.100000]   .nr_running                    : 1
[ 2700.100000]   .load                          : 1024
[ 2700.100000]   .nr_switches                   : 11875
[ 2700.100000]   .nr_load_updates               : 269696
[ 2700.100000]   .nr_uninterruptible            : 0
[ 2700.100000]   .next_balance                  : 0.000000
[ 2700.100000]   .curr->pid                     : 36
[ 2700.100000]   .clock                         : 2700100.000000
[ 2700.100000]   .cpu_load[0]                   : 1024
[ 2700.100000]   .cpu_load[1]                   : 1024
[ 2700.100000]   .cpu_load[2]                   : 1024
[ 2700.100000]   .cpu_load[3]                   : 1024
[ 2700.100000]   .cpu_load[4]                   : 1024
[ 2700.100000] 
[ 2700.100000] cfs_rq[0]:
[ 2700.100000]   .exec_clock                    : 0.000000
[ 2700.100000]   .MIN_vruntime                  : 0.000001
[ 2700.100000]   .min_vruntime                  : 2695651.938408
[ 2700.100000]   .max_vruntime                  : 0.000001
[ 2700.100000]   .spread                        : 0.000000
[ 2700.100000]   .spread0                       : 0.000000
[ 2700.100000]   .nr_running                    : 1
[ 2700.100000]   .load                          : 1024
[ 2700.100000]   .nr_spread_over                : 0
[ 2700.100000] 
[ 2700.100000] rt_rq[0]:
[ 2700.100000]   .rt_nr_running                 : 0
[ 2700.100000]   .rt_throttled                  : 0
[ 2700.100000]   .rt_time                       : 0.000000
[ 2700.100000]   .rt_runtime                    : 950.000000
[ 2700.100000] 
[ 2700.100000] runnable tasks:
[ 2700.100000]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[ 2700.100000] ----------------------------------------------------------------------------------------------------------
[ 2700.100000] R S00splashutils    36   2695651.938408      5397   120               0               0               0.000000               0.000000               0.000000
[ 2700.100000] 
[ 2700.100000] 
[ 2700.100000] Showing all locks held in the system:
[ 2700.100000] 4 locks held by S00splashutils/36:
[ 2700.100000]  #0:  (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<c00fa268>] vfs_readdir+0x4c/0xb0
[ 2700.100000]  #1:  (&port_lock_key){-.-...}, at: [<c01d9f94>] pl011_int+0x20/0x5a4
[ 2700.100000]  #2:  (sysrq_key_table_lock){-.....}, at: [<c01c5f84>] __handle_sysrq+0x24/0x190
[ 2700.100000]  #3:  (tasklist_lock){.?.+..}, at: [<c007c404>] debug_show_all_locks+0x40/0x1a4
[ 2700.100000] 
[ 2700.100000] =============================================
[ 2700.100000] 


-- 
Pengutronix e.K.                           | Uwe Kleine-K?nig            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2010-12-30 17:14 ` Uwe Kleine-König
@ 2010-12-30 17:57   ` Linus Torvalds
  2010-12-30 18:24     ` Trond Myklebust
  2010-12-30 17:59   ` Trond Myklebust
  1 sibling, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2010-12-30 17:57 UTC (permalink / raw)
  To: linux-arm-kernel

Please cc the poor hapless NFS people too, who probably otherwise
wouldn't see it. And Arnd just in case it might be locking-related.

Trond, any ideas? The sysrq thing does imply that it's stuck in some
busy-loop in fs/nfs/dir.c, and line 647 is get_cache_page(), which in
turn implies that the endless loop is either the loop in
readdir_search_pagecache() _or_ in a caller. In particular, the
EBADCOOKIE case in the caller (nfs_readdir) looks suspicious. What
protects us from endless streams of EBADCOOKIE and a successful
uncached_readdir?

                     Linus

2010/12/30 Uwe Kleine-K?nig <u.kleine-koenig@pengutronix.de>:
> Hello,
>
> I wonder if the nfs-stuff is considered to be solved, because I still
> see strange things.
>
> During boot my machine sometimes (approx one out of two times) hangs with
> the output pasted below on Sysctl-l. ?The irq
>
> I'm not 100% sure it's related, but at least it seems to hang in
> nfs_readdir. ?(When the serial irq happend that triggered the sysrq the
> program counter was at 0xc014601c, which is fs/nfs/dir.c:647 for me.)
>
> This is on 2.6.37-rc8 plus some patches for machine support on an ARM
> machine.
>
> Best regards
> Uwe
>
> [ 2700.100000] SysRq : Show State
> [ 2700.100000] ? task ? ? ? ? ? ? ? ?PC stack ? pid father
> [ 2700.100000] init ? ? ? ? ?S c0285d80 ? ? 0 ? ? 1 ? ? ?0 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c004f268>] (do_wait+0x1a4/0x20c)
> [ 2700.100000] [<c004f0c4>] (do_wait+0x0/0x20c) from [<c004f378>] (sys_wait4+0xa8/0xc0)
> [ 2700.100000] [<c004f2d0>] (sys_wait4+0x0/0xc0) from [<c0033e80>] (ret_fast_syscall+0x0/0x38)
> [ 2700.100000] ?r8:c0034088 r7:00000072 r6:00000001 r5:0000001b r4:0140b228
> [ 2700.100000] kthreadd ? ? ?S c0285d80 ? ? 0 ? ? 2 ? ? ?0 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c006a30c>] (kthreadd+0x70/0xfc)
> [ 2700.100000] [<c006a29c>] (kthreadd+0x0/0xfc) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ksoftirqd/0 ? S c0285d80 ? ? 0 ? ? 3 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c0052714>] (run_ksoftirqd+0x5c/0x110)
> [ 2700.100000] [<c00526b8>] (run_ksoftirqd+0x0/0x110) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] ?r8:00000000 r7:c00526b8 r6:00000000 r5:c7843f1c r4:c7859fac
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843f1c
> [ 2700.100000] kworker/0:0 ? S c0285d80 ? ? 0 ? ? 4 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c006480c>] (worker_thread+0x41c/0x444)
> [ 2700.100000] [<c00643f0>] (worker_thread+0x0/0x444) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843edc
> [ 2700.100000] kworker/u:0 ? S c0285d80 ? ? 0 ? ? 5 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c006480c>] (worker_thread+0x41c/0x444)
> [ 2700.100000] [<c00643f0>] (worker_thread+0x0/0x444) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843edc
> [ 2700.100000] watchdog/0 ? ?S c0285d80 ? ? 0 ? ? 6 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c008b418>] (watchdog+0xc0/0x110)
> [ 2700.100000] [<c008b358>] (watchdog+0x0/0x110) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] ?r6:00000000 r5:c7843efc r4:c785ffac
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843efc
> [ 2700.100000] khelper ? ? ? S c0285d80 ? ? 0 ? ? 7 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
> [ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843f1c
> [ 2700.100000] sync_supers ? S c0285d80 ? ? 0 ? ? 8 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00cd114>] (bdi_sync_supers+0x38/0x50)
> [ 2700.100000] [<c00cd0dc>] (bdi_sync_supers+0x0/0x50) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] ?r5:c7843f2c r4:c7895fac
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843f2c
> [ 2700.100000] bdi-default ? S c0285d80 ? ? 0 ? ? 9 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c02864b4>] (schedule_timeout+0x22c/0x27c)
> [ 2700.100000] [<c0286288>] (schedule_timeout+0x0/0x27c) from [<c00ce014>] (bdi_forker_thread+0x3a8/0x41c)
> [ 2700.100000] ?r8:c0363f80 r7:00000000 r6:00000000 r5:c03641e8 r4:00000000
> [ 2700.100000] [<c00cdc6c>] (bdi_forker_thread+0x0/0x41c) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843efc
> [ 2700.100000] kintegrityd ? S c0285d80 ? ? 0 ? ?10 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
> [ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843e9c
> [ 2700.100000] kblockd ? ? ? S c0285d80 ? ? 0 ? ?11 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
> [ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843ec4
> [ 2700.100000] rpciod ? ? ? ?S c0285d80 ? ? 0 ? ?12 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
> [ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843ebc
> [ 2700.100000] kworker/0:1 ? S c0285d80 ? ? 0 ? ?13 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c006480c>] (worker_thread+0x41c/0x444)
> [ 2700.100000] [<c00643f0>] (worker_thread+0x0/0x444) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c785be94
> [ 2700.100000] khungtaskd ? ?S c0285d80 ? ? 0 ? ?14 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c02864b4>] (schedule_timeout+0x22c/0x27c)
> [ 2700.100000] [<c0286288>] (schedule_timeout+0x0/0x27c) from [<c0286584>] (schedule_timeout_interruptible+0x28/0x2c)
> [ 2700.100000] ?r8:00000078 r7:00007fe9 r6:000003e9 r5:c034eef0 r4:00000064
> [ 2700.100000] [<c028655c>] (schedule_timeout_interruptible+0x0/0x2c) from [<c008ada8>] (watchdog+0x54/0x2e8)
> [ 2700.100000] [<c008ad54>] (watchdog+0x0/0x2e8) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843f2c
> [ 2700.100000] kswapd0 ? ? ? S c0285d80 ? ? 0 ? ?15 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00c5ea4>] (kswapd+0x210/0x74c)
> [ 2700.100000] [<c00c5c94>] (kswapd+0x0/0x74c) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843f1c
> [ 2700.100000] fsnotify_mark S c0285d80 ? ? 0 ? ?16 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c011f884>] (fsnotify_mark_destroy+0x11c/0x144)
> [ 2700.100000] [<c011f768>] (fsnotify_mark_destroy+0x0/0x144) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843f34
> [ 2700.100000] aio ? ? ? ? ? S c0285d80 ? ? 0 ? ?17 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
> [ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843edc
> [ 2700.100000] nfsiod ? ? ? ?S c0285d80 ? ? 0 ? ?18 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
> [ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843edc
> [ 2700.100000] crypto ? ? ? ?S c0285d80 ? ? 0 ? ?19 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c00657d4>] (rescuer_thread+0x1b8/0x1c4)
> [ 2700.100000] [<c006561c>] (rescuer_thread+0x0/0x1c4) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c7843ee4
> [ 2700.100000] kworker/u:1 ? S c0285d80 ? ? 0 ? ?24 ? ? ?2 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c006480c>] (worker_thread+0x41c/0x444)
> [ 2700.100000] [<c00643f0>] (worker_thread+0x0/0x444) from [<c006a294>] (kthread+0x8c/0x94)
> [ 2700.100000] [<c006a208>] (kthread+0x0/0x94) from [<c004f4d8>] (do_exit+0x0/0x658)
> [ 2700.100000] ?r7:00000013 r6:c004f4d8 r5:c006a208 r4:c785de94
> [ 2700.100000] rcS ? ? ? ? ? S c0285d80 ? ? 0 ? ?27 ? ? ?1 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c004f268>] (do_wait+0x1a4/0x20c)
> [ 2700.100000] [<c004f0c4>] (do_wait+0x0/0x20c) from [<c004f378>] (sys_wait4+0xa8/0xc0)
> [ 2700.100000] [<c004f2d0>] (sys_wait4+0x0/0xc0) from [<c0033e80>] (ret_fast_syscall+0x0/0x38)
> [ 2700.100000] ?r8:c0034088 r7:00000072 r6:ffffffff r5:bee7880c r4:00000000
> [ 2700.100000] run-parts ? ? S c0285d80 ? ? 0 ? ?35 ? ? 27 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c02858c8>] (schedule+0x0/0x534) from [<c004f268>] (do_wait+0x1a4/0x20c)
> [ 2700.100000] [<c004f0c4>] (do_wait+0x0/0x20c) from [<c004f378>] (sys_wait4+0xa8/0xc0)
> [ 2700.100000] [<c004f2d0>] (sys_wait4+0x0/0xc0) from [<c0033e80>] (ret_fast_syscall+0x0/0x38)
> [ 2700.100000] ?r8:c0034088 r7:00000072 r6:00000024 r5:bef7dcc4 r4:00000000
> [ 2700.100000] S00splashutil R running ? ? ?0 ? ?36 ? ? 35 0x00000000
> [ 2700.100000] Backtrace:
> [ 2700.100000] [<c0037b54>] (dump_backtrace+0x0/0x110) from [<c0037c80>] (show_stack+0x1c/0x20)
> [ 2700.100000] ?r7:c79cfd64 r6:00000000 r5:c7954600 r4:00000000
> [ 2700.100000] [<c0037c64>] (show_stack+0x0/0x20) from [<c0046b78>] (sched_show_task+0xb0/0xcc)
> [ 2700.100000] [<c0046ac8>] (sched_show_task+0x0/0xcc) from [<c0046bf0>] (show_state_filter+0x5c/0xc8)
> [ 2700.100000] ?r5:c7954600 r4:c7954600
> [ 2700.100000] [<c0046b94>] (show_state_filter+0x0/0xc8) from [<c01c5c40>] (sysrq_handle_showstate+0x18/0x1c)
> [ 2700.100000] ?r8:20000093 r7:00000007 r6:00000001 r5:00000074 r4:c036ec5c
> [ 2700.100000] [<c01c5c28>] (sysrq_handle_showstate+0x0/0x1c) from [<c01c6040>] (__handle_sysrq+0xe0/0x190)
> [ 2700.100000] [<c01c5f60>] (__handle_sysrq+0x0/0x190) from [<c01c62d8>] (handle_sysrq+0x38/0x44)
> [ 2700.100000] ?r8:c7999000 r7:00000100 r6:c7973640 r5:00010074 r4:c7864300
> [ 2700.100000] [<c01c62a0>] (handle_sysrq+0x0/0x44) from [<c01da100>] (pl011_int+0x18c/0x5a4)
> [ 2700.100000] [<c01d9f74>] (pl011_int+0x0/0x5a4) from [<c008b8b0>] (handle_IRQ_event+0x7c/0x1a8)
> [ 2700.100000] [<c008b834>] (handle_IRQ_event+0x0/0x1a8) from [<c008de5c>] (handle_level_irq+0xc8/0x148)
> [ 2700.100000] [<c008dd94>] (handle_level_irq+0x0/0x148) from [<c002d080>] (asm_do_IRQ+0x80/0xa4)
> [ 2700.100000] ?r7:c74a05a4 r6:c74a0508 r5:00000000 r4:0000002f
> [ 2700.100000] [<c002d000>] (asm_do_IRQ+0x0/0xa4) from [<c0033ab8>] (__irq_svc+0x38/0x80)
> [ 2700.100000] Exception stack(0xc79cfe88 to 0xc79cfed0)
> [ 2700.100000] fe80: ? ? ? ? ? ? ? ? ? c74a0508 00000000 c0145d24 c7487e60 00000000 c79cfee8
> [ 2700.100000] fea0: c74a0508 c74a05a4 c7487e60 c79cfee8 c74a0508 c79cff4c c79ce000 c79cfed0
> [ 2700.100000] fec0: c016ff10 c014601c 60000013 ffffffff
> [ 2700.100000] ?r5:f5000000 r4:ffffffff
> [ 2700.100000] [<c0145f0c>] (nfs_readdir+0x0/0x458) from [<c00fa298>] (vfs_readdir+0x7c/0xb0)
> [ 2700.100000] [<c00fa21c>] (vfs_readdir+0x0/0xb0) from [<c00fa3fc>] (sys_getdents+0x70/0xb8)
> [ 2700.100000] [<c00fa38c>] (sys_getdents+0x0/0xb8) from [<c0033e80>] (ret_fast_syscall+0x0/0x38)
> [ 2700.100000] ?r7:0000008d r6:00000000 r5:402ed00c r4:402ed020
> [ 2700.100000] Sched Debug Version: v0.09, 2.6.37-rc8-00065-g1cd48e3-dirty #35
> [ 2700.100000] now at 2701202.749966 msecs
> [ 2700.100000] ? .jiffies ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? : 240010
> [ 2700.100000] ? .sysctl_sched_latency ? ? ? ? ? ? ? ? ? ?: 6.000000
> [ 2700.100000] ? .sysctl_sched_min_granularity ? ? ? ? ? ?: 0.750000
> [ 2700.100000] ? .sysctl_sched_wakeup_granularity ? ? ? ? : 1.000000
> [ 2700.100000] ? .sysctl_sched_child_runs_first ? ? ? ? ? : 0
> [ 2700.100000] ? .sysctl_sched_features ? ? ? ? ? ? ? ? ? : 31855
> [ 2700.100000] ? .sysctl_sched_tunable_scaling ? ? ? ? ? ?: 1 (logaritmic)
> [ 2700.100000]
> [ 2700.100000] cpu#0
> [ 2700.100000] ? .nr_running ? ? ? ? ? ? ? ? ? ?: 1
> [ 2700.100000] ? .load ? ? ? ? ? ? ? ? ? ? ? ? ?: 1024
> [ 2700.100000] ? .nr_switches ? ? ? ? ? ? ? ? ? : 11875
> [ 2700.100000] ? .nr_load_updates ? ? ? ? ? ? ? : 269696
> [ 2700.100000] ? .nr_uninterruptible ? ? ? ? ? ?: 0
> [ 2700.100000] ? .next_balance ? ? ? ? ? ? ? ? ?: 0.000000
> [ 2700.100000] ? .curr->pid ? ? ? ? ? ? ? ? ? ? : 36
> [ 2700.100000] ? .clock ? ? ? ? ? ? ? ? ? ? ? ? : 2700100.000000
> [ 2700.100000] ? .cpu_load[0] ? ? ? ? ? ? ? ? ? : 1024
> [ 2700.100000] ? .cpu_load[1] ? ? ? ? ? ? ? ? ? : 1024
> [ 2700.100000] ? .cpu_load[2] ? ? ? ? ? ? ? ? ? : 1024
> [ 2700.100000] ? .cpu_load[3] ? ? ? ? ? ? ? ? ? : 1024
> [ 2700.100000] ? .cpu_load[4] ? ? ? ? ? ? ? ? ? : 1024
> [ 2700.100000]
> [ 2700.100000] cfs_rq[0]:
> [ 2700.100000] ? .exec_clock ? ? ? ? ? ? ? ? ? ?: 0.000000
> [ 2700.100000] ? .MIN_vruntime ? ? ? ? ? ? ? ? ?: 0.000001
> [ 2700.100000] ? .min_vruntime ? ? ? ? ? ? ? ? ?: 2695651.938408
> [ 2700.100000] ? .max_vruntime ? ? ? ? ? ? ? ? ?: 0.000001
> [ 2700.100000] ? .spread ? ? ? ? ? ? ? ? ? ? ? ?: 0.000000
> [ 2700.100000] ? .spread0 ? ? ? ? ? ? ? ? ? ? ? : 0.000000
> [ 2700.100000] ? .nr_running ? ? ? ? ? ? ? ? ? ?: 1
> [ 2700.100000] ? .load ? ? ? ? ? ? ? ? ? ? ? ? ?: 1024
> [ 2700.100000] ? .nr_spread_over ? ? ? ? ? ? ? ?: 0
> [ 2700.100000]
> [ 2700.100000] rt_rq[0]:
> [ 2700.100000] ? .rt_nr_running ? ? ? ? ? ? ? ? : 0
> [ 2700.100000] ? .rt_throttled ? ? ? ? ? ? ? ? ?: 0
> [ 2700.100000] ? .rt_time ? ? ? ? ? ? ? ? ? ? ? : 0.000000
> [ 2700.100000] ? .rt_runtime ? ? ? ? ? ? ? ? ? ?: 950.000000
> [ 2700.100000]
> [ 2700.100000] runnable tasks:
> [ 2700.100000] ? ? ? ? ? ? task ? PID ? ? ? ? tree-key ?switches ?prio ? ? exec-runtime ? ? ? ? sum-exec ? ? ? ?sum-sleep
> [ 2700.100000] ----------------------------------------------------------------------------------------------------------
> [ 2700.100000] R S00splashutils ? ?36 ? 2695651.938408 ? ? ?5397 ? 120 ? ? ? ? ? ? ? 0 ? ? ? ? ? ? ? 0 ? ? ? ? ? ? ? 0.000000 ? ? ? ? ? ? ? 0.000000 ? ? ? ? ? ? ? 0.000000
> [ 2700.100000]
> [ 2700.100000]
> [ 2700.100000] Showing all locks held in the system:
> [ 2700.100000] 4 locks held by S00splashutils/36:
> [ 2700.100000] ?#0: ?(&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<c00fa268>] vfs_readdir+0x4c/0xb0
> [ 2700.100000] ?#1: ?(&port_lock_key){-.-...}, at: [<c01d9f94>] pl011_int+0x20/0x5a4
> [ 2700.100000] ?#2: ?(sysrq_key_table_lock){-.....}, at: [<c01c5f84>] __handle_sysrq+0x24/0x190
> [ 2700.100000] ?#3: ?(tasklist_lock){.?.+..}, at: [<c007c404>] debug_show_all_locks+0x40/0x1a4
> [ 2700.100000]
> [ 2700.100000] =============================================
> [ 2700.100000]
>
>
> --
> Pengutronix e.K. ? ? ? ? ? ? ? ? ? ? ? ? ? | Uwe Kleine-K?nig ? ? ? ? ? ?|
> Industrial Linux Solutions ? ? ? ? ? ? ? ? | http://www.pengutronix.de/ ?|
>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2010-12-30 17:57   ` Linus Torvalds
@ 2010-12-30 18:24     ` Trond Myklebust
  2010-12-30 18:50       ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Trond Myklebust @ 2010-12-30 18:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2010-12-30 at 09:57 -0800, Linus Torvalds wrote: 
> Please cc the poor hapless NFS people too, who probably otherwise
> wouldn't see it. And Arnd just in case it might be locking-related.
> 
> Trond, any ideas? The sysrq thing does imply that it's stuck in some
> busy-loop in fs/nfs/dir.c, and line 647 is get_cache_page(), which in
> turn implies that the endless loop is either the loop in
> readdir_search_pagecache() _or_ in a caller. In particular, the
> EBADCOOKIE case in the caller (nfs_readdir) looks suspicious. What
> protects us from endless streams of EBADCOOKIE and a successful
> uncached_readdir?

There is nothing we can do to protect ourselves against an infinite loop
if the server (or underlying filesystem) is breaking the rules w.r.t.
cookie generation. It should be possible to recover from all other
situations.
IOW: if the server generates non-unique cookies, then we're screwed.
Fixing that particular problem is impossible since it is basically a
variant of the halting problem.
That was why I asked which filesystem is being exported in my previous
reply.

The point of 'uncached_readdir' is to resolve a cookie that was
previously valid, but has since been invalidated; usually that is due to
the file having been unlinked. If it succeeds, it should result in a new
set of valid entries being posted to the 'filldir' callback, and a new
cookie being set in the filp->private (i.e. we should have made
progress). If it fails, we exit, as you can see.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2010-12-30 18:24     ` Trond Myklebust
@ 2010-12-30 18:50       ` Linus Torvalds
  2010-12-30 19:25         ` Trond Myklebust
  0 siblings, 1 reply; 79+ messages in thread
From: Linus Torvalds @ 2010-12-30 18:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 30, 2010 at 10:24 AM, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
>
> There is nothing we can do to protect ourselves against an infinite loop
> if the server (or underlying filesystem) is breaking the rules w.r.t.
> cookie generation. It should be possible to recover from all other
> situations.

Umm. Sure there is. Just make sure that you return the uncached entry
to user space, rather than loop forever.

Looping forever in kernel space is a bad idea. How about just changing
the "continue" into a "break" for the "uncached readdir returned
success".

No halting problems, no excuses. There is absolutely _no_ excuse for
an endless loop in kernel mode. Certainly not "the other end is
incompetent".

EVERYBODY is incompetent sometimes. That just means that you must
never trust the other end too much. You can't say "we require the
server to be sane in order not to lock up".

                    Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2010-12-30 18:50       ` Linus Torvalds
@ 2010-12-30 19:25         ` Trond Myklebust
  2010-12-30 20:02           ` Linus Torvalds
  0 siblings, 1 reply; 79+ messages in thread
From: Trond Myklebust @ 2010-12-30 19:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2010-12-30 at 10:50 -0800, Linus Torvalds wrote: 
> On Thu, Dec 30, 2010 at 10:24 AM, Trond Myklebust
> <Trond.Myklebust@netapp.com> wrote:
> >
> > There is nothing we can do to protect ourselves against an infinite loop
> > if the server (or underlying filesystem) is breaking the rules w.r.t.
> > cookie generation. It should be possible to recover from all other
> > situations.
> 
> Umm. Sure there is. Just make sure that you return the uncached entry
> to user space, rather than loop forever.
> 
> Looping forever in kernel space is a bad idea. How about just changing
> the "continue" into a "break" for the "uncached readdir returned
> success".

uncached_readdir is not really a problem. The real problem is
filesystems that generate "infinite directories" by producing looping
combinations of cookies.

IOW: I've seen servers that generate cookies in a sequence of a form
vaguely resembling

1 2 3 4 5 6 7 8 9 10 11 12 3...

(with possibly a thousand or so entries between the first and second
copy of '3')

The kernel won't loop forever with something like that (because
eventually filldir() will declare it is out of buffer space), but
userland has a halting problem: it needs to detect that every
sys_getdents() call it is making is generating another copy of the
sequence associated with '4 5 6 7 8 9 10 11 12 3'...

> No halting problems, no excuses. There is absolutely _no_ excuse for
> an endless loop in kernel mode. Certainly not "the other end is
> incompetent".

We should never get an endless loop in _kernel mode_ with the current
scheme, and I can't see that anyone has demonstrated that yet.

> EVERYBODY is incompetent sometimes. That just means that you must
> never trust the other end too much. You can't say "we require the
> server to be sane in order not to lock up".

Unfortunately we must. Call it an NFS protocol failure, but it really
boils down to the fact that POSIX readdir() generates a data stream with
no well-defined concept of an offset. As a result, each and every
filesystem has their own interesting ways of generating cookies to
represent that 'offset'.

Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2010-12-30 19:25         ` Trond Myklebust
@ 2010-12-30 20:02           ` Linus Torvalds
  0 siblings, 0 replies; 79+ messages in thread
From: Linus Torvalds @ 2010-12-30 20:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 30, 2010 at 11:25 AM, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
>
> uncached_readdir is not really a problem. The real problem is
> filesystems that generate "infinite directories" by producing looping
> combinations of cookies.

But if we don't have any lseek's, the readdir cache should trivially
take care of this by just incrementing the page_index, and we should
return to user space the (eventually ending) sequence, even if there
are duplicate numbers.

(Also, I suspect that "page_index" should not be a page index, but a
position, and then you the "search_for_pos()" should use that instead
of the file_pos/current_index thing, but that's a detail that would
show up only when you have duplicate cookies within one page worth of
directory caches)

And if the server really sends us an infinite stream of entries, then
that's fine - at least we give to user space the infinite entries that
were given to us, instead of _generating_ an infinite stream from what
was a finite - but broken - stream).

So it seems wrong that the directory caching code resets page_index to
the start when it then does an uncached readdir. That seems wrong.

I'm sure there's some reason for it, but wouldn't it be nice if the
rule for page_index was that it starts off at zero, and only gets
reset by lseek?

> IOW: I've seen servers that generate cookies in a sequence of a form
> vaguely resembling
>
> 1 2 3 4 5 6 7 8 9 10 11 12 3...
>
> (with possibly a thousand or so entries between the first and second
> copy of '3')

Ok, so that' obviously broken, but it's then _doubly_ broken to turn
that long broken sequence into an _endless_ broken sequence.

And I agree that when user space sees such an endless broken sequence,
it's a real stopping problem for user space. But in the absense of
lseek, it should _never_ be a problem for the kernel itself, afaik.
The kernel should happily return just the broken sequence. No?

So then perhaps the solution is to just remove the resetting of
page_index in the uncached_readdir() function? Make sure that the
page_index is monotonically increasing for any readdir(), and you
protect against turning a bad sequence into an endless sequence.

Of course, lseek() will have to reset page_index to zero, and if
somebody does an lseek() on the directory, then the duplicate '3"
entry in the cookie sequence will inevitably be ambiguous, but that
really is unavoidable. And rare. People who use lseek() on directories
are odd and we know they'll break on other filesystems too under
certain circumstances.

                         Linus

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2010-12-30 17:14 ` Uwe Kleine-König
  2010-12-30 17:57   ` Linus Torvalds
@ 2010-12-30 17:59   ` Trond Myklebust
  2010-12-30 19:18     ` Uwe Kleine-König
  1 sibling, 1 reply; 79+ messages in thread
From: Trond Myklebust @ 2010-12-30 17:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2010-12-30 at 18:14 +0100, Uwe Kleine-K?nig wrote: 
> Hello,
> 
> I wonder if the nfs-stuff is considered to be solved, because I still
> see strange things.
> 
> During boot my machine sometimes (approx one out of two times) hangs with
> the output pasted below on Sysctl-l.  The irq 
> 
> I'm not 100% sure it's related, but at least it seems to hang in
> nfs_readdir.  (When the serial irq happend that triggered the sysrq the
> program counter was at 0xc014601c, which is fs/nfs/dir.c:647 for me.)
> 
> This is on 2.6.37-rc8 plus some patches for machine support on an ARM
> machine.

Ccing linux-nfs at vger.kernel.org

What filesystem are you exporting on the server? What is the NFS
version? Is this nfsroot, autofs or an ordinary nfs mount?

In short, how can we reproduce this?

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2010-12-30 17:59   ` Trond Myklebust
@ 2010-12-30 19:18     ` Uwe Kleine-König
  2011-01-03 21:38       ` Uwe Kleine-König
  0 siblings, 1 reply; 79+ messages in thread
From: Uwe Kleine-König @ 2010-12-30 19:18 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Trond,

On Thu, Dec 30, 2010 at 12:59:52PM -0500, Trond Myklebust wrote:
> On Thu, 2010-12-30 at 18:14 +0100, Uwe Kleine-K?nig wrote: 
> > I wonder if the nfs-stuff is considered to be solved, because I still
> > see strange things.
> > 
> > During boot my machine sometimes (approx one out of two times) hangs with
> > the output pasted below on Sysctl-l.  The irq 
> > 
> > I'm not 100% sure it's related, but at least it seems to hang in
> > nfs_readdir.  (When the serial irq happend that triggered the sysrq the
> > program counter was at 0xc014601c, which is fs/nfs/dir.c:647 for me.)
> > 
> > This is on 2.6.37-rc8 plus some patches for machine support on an ARM
> > machine.
> 
> Ccing linux-nfs at vger.kernel.org
Yeah, good idea.  I had that ~2min after sending my report during
dinner, sorry :-\
 
> What filesystem are you exporting on the server? What is the NFS
> version? Is this nfsroot, autofs or an ordinary nfs mount?
This is an nfsroot of /home/ukl/nfsroot/tx28 which is a symlink to a
directory on a different partition.  I don't know the filesystem of my
homedir as it resides on a server I have no access to, but I asked the
admin, so I can follow up with this info later (I'd suspect ext3, too).
The real root directory is on ext3 (rw,noatime).

The serving nfs-server is Debian's nfs-kernel-server 1:1.2.2-1.
nfs-related kernel parameters are

	ip=dhcp root=/dev/nfs nfsroot=192.168.23.2:/home/ukl/nfsroot/tx28,v3,tcp

I hope this answers your questions.  If not, please ask.

I tried without the symlink and saw some different errors, e.g.

	starting splashutils daemon.../etc/rc.d/S00splashutils: line 50: //sbin/fbsplashd.static: Unknown error 521

(this is the init script that hung before) and

	[    6.160000] NFS: server 192.168.23.2 error: fileid changed
	[    6.160000] fsid 0:c: expected fileid 0x33590a4, got 0x4d11bedc

but no hang as before.  So maybe it's related to the symlink?  I don't
know if testing that further would help or just waste of my time, so
please let me know if I can help you and how.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-K?nig            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2010-12-30 19:18     ` Uwe Kleine-König
@ 2011-01-03 21:38       ` Uwe Kleine-König
  2011-01-04  0:22         ` Trond Myklebust
  0 siblings, 1 reply; 79+ messages in thread
From: Uwe Kleine-König @ 2011-01-03 21:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 30, 2010 at 08:18:46PM +0100, Uwe Kleine-K?nig wrote:
> On Thu, Dec 30, 2010 at 12:59:52PM -0500, Trond Myklebust wrote:
> > What filesystem are you exporting on the server? What is the NFS
> > version? Is this nfsroot, autofs or an ordinary nfs mount?
> This is an nfsroot of /home/ukl/nfsroot/tx28 which is a symlink to a
> directory on a different partition.  I don't know the filesystem of my
> homedir as it resides on a server I have no access to, but I asked the
> admin, so I can follow up with this info later (I'd suspect ext3, too).
Yes, it is ext3.

> The real root directory is on ext3 (rw,noatime).
> 
> The serving nfs-server is Debian's nfs-kernel-server 1:1.2.2-1.
If that matters, kernel is linux-image-2.6.32-5-amd64 (2.6.32-29)
provided by Debian.

>                                                                I don't
> know if testing that further would help or just waste of my time, so
> please let me know if I can help you and how.
This still applies

Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-K?nig            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-03 21:38       ` Uwe Kleine-König
@ 2011-01-04  0:22         ` Trond Myklebust
  2011-01-05  8:40           ` Uwe Kleine-König
  0 siblings, 1 reply; 79+ messages in thread
From: Trond Myklebust @ 2011-01-04  0:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2011-01-03 at 22:38 +0100, Uwe Kleine-K?nig wrote: 
> On Thu, Dec 30, 2010 at 08:18:46PM +0100, Uwe Kleine-K?nig wrote:
> > On Thu, Dec 30, 2010 at 12:59:52PM -0500, Trond Myklebust wrote:
> > > What filesystem are you exporting on the server? What is the NFS
> > > version? Is this nfsroot, autofs or an ordinary nfs mount?
> > This is an nfsroot of /home/ukl/nfsroot/tx28 which is a symlink to a
> > directory on a different partition.  I don't know the filesystem of my
> > homedir as it resides on a server I have no access to, but I asked the
> > admin, so I can follow up with this info later (I'd suspect ext3, too).
> Yes, it is ext3.
> 
> > The real root directory is on ext3 (rw,noatime).
> > 
> > The serving nfs-server is Debian's nfs-kernel-server 1:1.2.2-1.
> If that matters, kernel is linux-image-2.6.32-5-amd64 (2.6.32-29)
> provided by Debian.
> 
> >                                                                I don't
> > know if testing that further would help or just waste of my time, so
> > please let me know if I can help you and how.
> This still applies

I'm having trouble reproducing this with my own nfsroot setup (which is
just a 'fedora 13 live' disk with NetworkManager turned firmly off).

However looking back at your report, you said that when you remove the
symlink, you get an error message of the form:

"starting splashutils daemon.../etc/rc.d/S00splashutils: line
50: //sbin/fbsplashd.static: Unknown error 521"

Error 521 is EBADHANDLE, which basically means your client got a
corrupted filehandle. The 'fileid changed' thing also indicates some
form of corruption.

The question is whether this is something happening on the server or the
client. Does an older client kernel boot without any trouble?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-04  0:22         ` Trond Myklebust
@ 2011-01-05  8:40           ` Uwe Kleine-König
  2011-01-05 11:05             ` Uwe Kleine-König
  0 siblings, 1 reply; 79+ messages in thread
From: Uwe Kleine-König @ 2011-01-05  8:40 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Trond,

On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
> The question is whether this is something happening on the server or the
> client. Does an older client kernel boot without any trouble?
I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
compare with.  If you don't consider .36 to be old enough let me now.
Once the setup is done it should be easy to test .35 (say), too.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-K?nig            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05  8:40           ` Uwe Kleine-König
@ 2011-01-05 11:05             ` Uwe Kleine-König
  2011-01-05 11:27               ` Russell King - ARM Linux
  0 siblings, 1 reply; 79+ messages in thread
From: Uwe Kleine-König @ 2011-01-05 11:05 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Trond,

On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-K?nig wrote:
> On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
> > The question is whether this is something happening on the server or the
> > client. Does an older client kernel boot without any trouble?
> I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
> compare with.  If you don't consider .36 to be old enough let me now.
> Once the setup is done it should be easy to test .35 (say), too.
> 
Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
problems didn't occur.  This was more reliable to trigger and he was so
kind to bisect the problem.

When testing v2.6.36-rc3-51-gafa8ccc init hanged.
(babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
this hang.)  Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
"init hangs" problem the "fileid changed on tab" problem.

I could only reproduce that on armv5 machines (imx27, imx28 and at91)
but not on armv6 (imx35).

Best regards
Uwe

[1] similar means: not during boot, but when hitting tab to get command
completion in the shell.

-- 
Pengutronix e.K.                           | Uwe Kleine-K?nig            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 11:05             ` Uwe Kleine-König
@ 2011-01-05 11:27               ` Russell King - ARM Linux
  2011-01-05 12:14                 ` Marc Kleine-Budde
  2011-01-05 13:40                 ` Uwe Kleine-König
  0 siblings, 2 replies; 79+ messages in thread
From: Russell King - ARM Linux @ 2011-01-05 11:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 05, 2011 at 12:05:17PM +0100, Uwe Kleine-K?nig wrote:
> Hello Trond,
> 
> On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-K?nig wrote:
> > On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
> > > The question is whether this is something happening on the server or the
> > > client. Does an older client kernel boot without any trouble?
> > I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
> > compare with.  If you don't consider .36 to be old enough let me now.
> > Once the setup is done it should be easy to test .35 (say), too.
> > 
> Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
> problems didn't occur.  This was more reliable to trigger and he was so
> kind to bisect the problem.
> 
> When testing v2.6.36-rc3-51-gafa8ccc init hanged.
> (babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
> this hang.)  Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
> "init hangs" problem the "fileid changed on tab" problem.
> 
> I could only reproduce that on armv5 machines (imx27, imx28 and at91)
> but not on armv6 (imx35).

FYI, I've seen the "fileid changed" problem, and it looked like a 32-bit
truncation of the fileid.  It occurred several times on successive
reboots, so I tried to capture a tcpdump trace off the server (Linux
2.6.23-rc8-ga64314e6 - its ancient because I've had issues with buggy
IDE drivers trying to move it forward.)  However, for the last couple
of weeks I've been unable to reproduce it.

The client was based on 2.6.37-rc6.

The "fileid changed" messages popped up after mounting an export with
'nolock,intr,rsize=4096,soft', and then trying to use bash completion
and 'ls' in a few subdirectories - and entries were missing from the
directory lists without 'ls' reporting any errors (which I think is bad
behaviour in itself.)

I don't know why it's stopped producing the errors, although once it
went I never investigated it any further (was far too busy trying to
get AMBA DMA support working.)

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 11:27               ` Russell King - ARM Linux
@ 2011-01-05 12:14                 ` Marc Kleine-Budde
  2011-01-05 13:02                   ` Nori, Sekhar
  2011-01-05 13:40                 ` Uwe Kleine-König
  1 sibling, 1 reply; 79+ messages in thread
From: Marc Kleine-Budde @ 2011-01-05 12:14 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/05/2011 12:27 PM, Russell King - ARM Linux wrote:
> On Wed, Jan 05, 2011 at 12:05:17PM +0100, Uwe Kleine-K?nig wrote:
>> Hello Trond,
>>
>> On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-K?nig wrote:
>>> On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
>>>> The question is whether this is something happening on the server or the
>>>> client. Does an older client kernel boot without any trouble?
>>> I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
>>> compare with.  If you don't consider .36 to be old enough let me now.
>>> Once the setup is done it should be easy to test .35 (say), too.
>>>
>> Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
>> problems didn't occur.  This was more reliable to trigger and he was so
>> kind to bisect the problem.
>>
>> When testing v2.6.36-rc3-51-gafa8ccc init hanged.
>> (babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
>> this hang.)  Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
>> "init hangs" problem the "fileid changed on tab" problem.
>>
>> I could only reproduce that on armv5 machines (imx27, imx28 and at91)
>> but not on armv6 (imx35).
> 
> FYI, I've seen the "fileid changed" problem, and it looked like a 32-bit
> truncation of the fileid.  It occurred several times on successive
> reboots, so I tried to capture a tcpdump trace off the server (Linux
> 2.6.23-rc8-ga64314e6 - its ancient because I've had issues with buggy
> IDE drivers trying to move it forward.)  However, for the last couple
> of weeks I've been unable to reproduce it.

We have the problem with nfs-root. From the kernel command line:

root=/dev/nfs
nfsroot=192.168.23.2:/home/mkl/pengutronix/xxx/bsp/OSELAS.BSP-xxx-Grabowski-trunk/platform-Ronetix-PM9263/root,v3,tcp

/home/mkl/pengutronix is a link which points to a link
/ptx/work/octopus/mkl (which is a ext3-based) which points to
WORK_1/mkl which is also ext3-based.

The server is 2.6.32 and has been rebooted yesterday :), nfs-utils are
1.2.2. I make a tcpdump if needed.

Cheers, Marc
-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 262 bytes
Desc: OpenPGP digital signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110105/eba9dcec/attachment.sig>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 12:14                 ` Marc Kleine-Budde
@ 2011-01-05 13:02                   ` Nori, Sekhar
  2011-01-05 15:34                     ` Russell King - ARM Linux
  0 siblings, 1 reply; 79+ messages in thread
From: Nori, Sekhar @ 2011-01-05 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 05, 2011 at 17:44:16, Marc Kleine-Budde wrote:
> On 01/05/2011 12:27 PM, Russell King - ARM Linux wrote:
> > On Wed, Jan 05, 2011 at 12:05:17PM +0100, Uwe Kleine-K?nig wrote:
> >> Hello Trond,
> >>
> >> On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-K?nig wrote:
> >>> On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
> >>>> The question is whether this is something happening on the server or the
> >>>> client. Does an older client kernel boot without any trouble?
> >>> I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
> >>> compare with.  If you don't consider .36 to be old enough let me now.
> >>> Once the setup is done it should be easy to test .35 (say), too.
> >>>
> >> Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
> >> problems didn't occur.  This was more reliable to trigger and he was so
> >> kind to bisect the problem.
> >>
> >> When testing v2.6.36-rc3-51-gafa8ccc init hanged.
> >> (babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
> >> this hang.)  Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
> >> "init hangs" problem the "fileid changed on tab" problem.
> >>
> >> I could only reproduce that on armv5 machines (imx27, imx28 and at91)
> >> but not on armv6 (imx35).
> > 
> > FYI, I've seen the "fileid changed" problem, and it looked like a 32-bit
> > truncation of the fileid.  It occurred several times on successive
> > reboots, so I tried to capture a tcpdump trace off the server (Linux
> > 2.6.23-rc8-ga64314e6 - its ancient because I've had issues with buggy
> > IDE drivers trying to move it forward.)  However, for the last couple
> > of weeks I've been unable to reproduce it.
> 
> We have the problem with nfs-root. From the kernel command line:
> 
> root=/dev/nfs
> nfsroot=192.168.23.2:/home/mkl/pengutronix/xxx/bsp/OSELAS.BSP-xxx-Grabowski-trunk/platform-Ronetix-PM9263/root,v3,tcp
> 
> /home/mkl/pengutronix is a link which points to a link
> /ptx/work/octopus/mkl (which is a ext3-based) which points to
> WORK_1/mkl which is also ext3-based.
> 
> The server is 2.6.32 and has been rebooted yesterday :), nfs-utils are
> 1.2.2. I make a tcpdump if needed.

I see the issue too with an ARMv5 based DM355 board with the just released
v2.6.37 tag (nfs client). I too see the issue when using bash tab completion.

Here are some logs:

fileid changed fsid 0:c: expected fileid 0x2db61d, got 0x2dad20

fileid changed fsid 0:c: expected fileid 0x100000000000, got 0x7070000000000000

I am using Fedora 8 (2.6.25 kernel) on the server side. I will try the latest
Ubuntu release on the server side and test.

Thanks,
Sekhar

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 13:02                   ` Nori, Sekhar
@ 2011-01-05 15:34                     ` Russell King - ARM Linux
  0 siblings, 0 replies; 79+ messages in thread
From: Russell King - ARM Linux @ 2011-01-05 15:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 05, 2011 at 06:32:58PM +0530, Nori, Sekhar wrote:
> Here are some logs:
> 
> fileid changed fsid 0:c: expected fileid 0x2db61d, got 0x2dad20
> 
> fileid changed fsid 0:c: expected fileid 0x100000000000, got 0x7070000000000000

Just to be clear, what I was seeing was things like:

expected fileid <32-bit number> got <64-bit number with same 32-bit LS bits>

so it looked like something was truncating a 64-bit fileid down
to 32-bits.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 11:27               ` Russell King - ARM Linux
  2011-01-05 12:14                 ` Marc Kleine-Budde
@ 2011-01-05 13:40                 ` Uwe Kleine-König
  2011-01-05 14:29                   ` Jim Rees
  2011-01-05 14:53                   ` Trond Myklebust
  1 sibling, 2 replies; 79+ messages in thread
From: Uwe Kleine-König @ 2011-01-05 13:40 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

On Wed, Jan 05, 2011 at 11:27:01AM +0000, Russell King - ARM Linux wrote:
> On Wed, Jan 05, 2011 at 12:05:17PM +0100, Uwe Kleine-K?nig wrote:
> > Hello Trond,
> > 
> > On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-K?nig wrote:
> > > On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
> > > > The question is whether this is something happening on the server or the
> > > > client. Does an older client kernel boot without any trouble?
> > > I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
> > > compare with.  If you don't consider .36 to be old enough let me now.
> > > Once the setup is done it should be easy to test .35 (say), too.
> > > 
> > Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
> > problems didn't occur.  This was more reliable to trigger and he was so
> > kind to bisect the problem.
> > 
> > When testing v2.6.36-rc3-51-gafa8ccc init hanged.
> > (babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
> > this hang.)  Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
> > "init hangs" problem the "fileid changed on tab" problem.
> > 
> > I could only reproduce that on armv5 machines (imx27, imx28 and at91)
> > but not on armv6 (imx35).
> 
> FYI, I've seen the "fileid changed" problem, and it looked like a 32-bit
> truncation of the fileid.  It occurred several times on successive
> reboots, so I tried to capture a tcpdump trace off the server (Linux
> 2.6.23-rc8-ga64314e6 - its ancient because I've had issues with buggy
> IDE drivers trying to move it forward.)  However, for the last couple
> of weeks I've been unable to reproduce it.
> 
> The client was based on 2.6.37-rc6.
> 
> The "fileid changed" messages popped up after mounting an export with
> 'nolock,intr,rsize=4096,soft', and then trying to use bash completion
> and 'ls' in a few subdirectories - and entries were missing from the
> directory lists without 'ls' reporting any errors (which I think is bad
> behaviour in itself.)
There was a bug in at least -rc5[1] that was considered already fixed in
-rc4[2]. The later announcements didn't mention it anymore. 

> I don't know why it's stopped producing the errors, although once it
> went I never investigated it any further (was far too busy trying to
> get AMBA DMA support working.)
It seems it was fixed for most users though. Trond?

Uwe

[1] http://lwn.net/Articles/418963/
[2] http://lwn.net/Articles/417704/
-- 
Pengutronix e.K.                           | Uwe Kleine-K?nig            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 13:40                 ` Uwe Kleine-König
@ 2011-01-05 14:29                   ` Jim Rees
  2011-01-05 14:42                     ` Marc Kleine-Budde
  2011-01-05 14:53                   ` Trond Myklebust
  1 sibling, 1 reply; 79+ messages in thread
From: Jim Rees @ 2011-01-05 14:29 UTC (permalink / raw)
  To: linux-arm-kernel

Uwe Kleine-K?nig wrote:

  > The "fileid changed" messages popped up after mounting an export with
  > 'nolock,intr,rsize=4096,soft', and then trying to use bash completion
  > and 'ls' in a few subdirectories - and entries were missing from the
  > directory lists without 'ls' reporting any errors (which I think is bad
  > behaviour in itself.)
  There was a bug in at least -rc5[1] that was considered already fixed in
  -rc4[2]. The later announcements didn't mention it anymore. 
  
  > I don't know why it's stopped producing the errors, although once it
  > went I never investigated it any further (was far too busy trying to
  > get AMBA DMA support working.)
  It seems it was fixed for most users though. Trond?

Trond sent a fix to the nfs list on 27 Nov for "fileid changed" but I don't
know if this is the same bug you're seeing.  The patch was to
nfs_same_file() and I can send it if you want.  As far as I know the patch
made it upstream.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 14:29                   ` Jim Rees
@ 2011-01-05 14:42                     ` Marc Kleine-Budde
  2011-01-05 15:38                       ` Jim Rees
  0 siblings, 1 reply; 79+ messages in thread
From: Marc Kleine-Budde @ 2011-01-05 14:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/05/2011 03:29 PM, Jim Rees wrote:
> Uwe Kleine-K?nig wrote:
> 
>   > The "fileid changed" messages popped up after mounting an export with
>   > 'nolock,intr,rsize=4096,soft', and then trying to use bash completion
>   > and 'ls' in a few subdirectories - and entries were missing from the
>   > directory lists without 'ls' reporting any errors (which I think is bad
>   > behaviour in itself.)
>   There was a bug in at least -rc5[1] that was considered already fixed in
>   -rc4[2]. The later announcements didn't mention it anymore. 
>   
>   > I don't know why it's stopped producing the errors, although once it
>   > went I never investigated it any further (was far too busy trying to
>   > get AMBA DMA support working.)
>   It seems it was fixed for most users though. Trond?
> 
> Trond sent a fix to the nfs list on 27 Nov for "fileid changed" but I don't
> know if this is the same bug you're seeing.  The patch was to
> nfs_same_file() and I can send it if you want.  As far as I know the patch
> made it upstream.

Are you sure it's in .37?

The pick-axe just found one commit so far
(although it's still searching):

$ git log -Snfs_same_file
commit d39ab9de3b80da5835049b1c3b49da4e84e01c07
Author: Bryan Schumaker <bjschuma@netapp.com>
Date:   Fri Sep 24 18:50:01 2010 -0400

    NFS: re-add readdir plus

    This patch adds readdir plus support to the cache array.

    Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Would you please be so kind and send the patch to this thread?

cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 262 bytes
Desc: OpenPGP digital signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110105/64f1ba04/attachment.sig>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 14:42                     ` Marc Kleine-Budde
@ 2011-01-05 15:38                       ` Jim Rees
  0 siblings, 0 replies; 79+ messages in thread
From: Jim Rees @ 2011-01-05 15:38 UTC (permalink / raw)
  To: linux-arm-kernel

Marc Kleine-Budde wrote:

  > Trond sent a fix to the nfs list on 27 Nov for "fileid changed" but I don't
  > know if this is the same bug you're seeing.  The patch was to
  > nfs_same_file() and I can send it if you want.  As far as I know the patch
  > made it upstream.
  
  Are you sure it's in .37?

...

  Would you please be so kind and send the patch to this thread?

This is the commit I'm thinking of:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=37a09f07459753e7c98d4e21f1c61e8756923f81

NFS: Fix a readdirplus bug

When comparing filehandles in the helper nfs_same_file(), we should not be
using 'strncmp()': filehandles are not null terminated strings.

Instead, we should just use the existing helper nfs_compare_fh().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 8ea4a41..f0a384e 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -395,13 +395,9 @@ int xdr_decode(nfs_readdir_descriptor_t *desc, struct nfs_entry *entry, struct x
 static
 int nfs_same_file(struct dentry *dentry, struct nfs_entry *entry)
 {
-	struct nfs_inode *node;
 	if (dentry->d_inode == NULL)
 		goto different;
-	node = NFS_I(dentry->d_inode);
-	if (node->fh.size != entry->fh->size)
-		goto different;
-	if (strncmp(node->fh.data, entry->fh->data, node->fh.size) != 0)
+	if (nfs_compare_fh(entry->fh, NFS_FH(dentry->d_inode)) != 0)
 		goto different;
 	return 1;
 different:

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 13:40                 ` Uwe Kleine-König
  2011-01-05 14:29                   ` Jim Rees
@ 2011-01-05 14:53                   ` Trond Myklebust
  2011-01-05 15:01                     ` Marc Kleine-Budde
  2011-01-14  2:25                     ` Andy Isaacson
  1 sibling, 2 replies; 79+ messages in thread
From: Trond Myklebust @ 2011-01-05 14:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 14:40 +0100, Uwe Kleine-K?nig wrote: 
> Hi Russell,
> 
> On Wed, Jan 05, 2011 at 11:27:01AM +0000, Russell King - ARM Linux wrote:
> > On Wed, Jan 05, 2011 at 12:05:17PM +0100, Uwe Kleine-K?nig wrote:
> > > Hello Trond,
> > > 
> > > On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-K?nig wrote:
> > > > On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
> > > > > The question is whether this is something happening on the server or the
> > > > > client. Does an older client kernel boot without any trouble?
> > > > I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
> > > > compare with.  If you don't consider .36 to be old enough let me now.
> > > > Once the setup is done it should be easy to test .35 (say), too.
> > > > 
> > > Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
> > > problems didn't occur.  This was more reliable to trigger and he was so
> > > kind to bisect the problem.
> > > 
> > > When testing v2.6.36-rc3-51-gafa8ccc init hanged.
> > > (babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
> > > this hang.)  Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
> > > "init hangs" problem the "fileid changed on tab" problem.
> > > 
> > > I could only reproduce that on armv5 machines (imx27, imx28 and at91)
> > > but not on armv6 (imx35).
> > 
> > FYI, I've seen the "fileid changed" problem, and it looked like a 32-bit
> > truncation of the fileid.  It occurred several times on successive
> > reboots, so I tried to capture a tcpdump trace off the server (Linux
> > 2.6.23-rc8-ga64314e6 - its ancient because I've had issues with buggy
> > IDE drivers trying to move it forward.)  However, for the last couple
> > of weeks I've been unable to reproduce it.
> > 
> > The client was based on 2.6.37-rc6.
> > 
> > The "fileid changed" messages popped up after mounting an export with
> > 'nolock,intr,rsize=4096,soft', and then trying to use bash completion
> > and 'ls' in a few subdirectories - and entries were missing from the
> > directory lists without 'ls' reporting any errors (which I think is bad
> > behaviour in itself.)
> There was a bug in at least -rc5[1] that was considered already fixed in
> -rc4[2]. The later announcements didn't mention it anymore. 
> 
> > I don't know why it's stopped producing the errors, although once it
> > went I never investigated it any further (was far too busy trying to
> > get AMBA DMA support working.)
> It seems it was fixed for most users though. Trond?

As I said, I can't reproduce it.

I'm seeing a lot of mention of ARM above. Is anyone seeing this bug on
x86, or does it appear to be architecture-specific?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 14:53                   ` Trond Myklebust
@ 2011-01-05 15:01                     ` Marc Kleine-Budde
  2011-01-05 15:14                       ` Trond Myklebust
  2011-01-14  2:25                     ` Andy Isaacson
  1 sibling, 1 reply; 79+ messages in thread
From: Marc Kleine-Budde @ 2011-01-05 15:01 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/05/2011 03:53 PM, Trond Myklebust wrote:
> On Wed, 2011-01-05 at 14:40 +0100, Uwe Kleine-K?nig wrote: 
>> Hi Russell,
>>
>> On Wed, Jan 05, 2011 at 11:27:01AM +0000, Russell King - ARM Linux wrote:
>>> On Wed, Jan 05, 2011 at 12:05:17PM +0100, Uwe Kleine-K?nig wrote:
>>>> Hello Trond,
>>>>
>>>> On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-K?nig wrote:
>>>>> On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
>>>>>> The question is whether this is something happening on the server or the
>>>>>> client. Does an older client kernel boot without any trouble?
>>>>> I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
>>>>> compare with.  If you don't consider .36 to be old enough let me now.
>>>>> Once the setup is done it should be easy to test .35 (say), too.
>>>>>
>>>> Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
>>>> problems didn't occur.  This was more reliable to trigger and he was so
>>>> kind to bisect the problem.
>>>>
>>>> When testing v2.6.36-rc3-51-gafa8ccc init hanged.
>>>> (babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
>>>> this hang.)  Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
>>>> "init hangs" problem the "fileid changed on tab" problem.
>>>>
>>>> I could only reproduce that on armv5 machines (imx27, imx28 and at91)
>>>> but not on armv6 (imx35).
>>>
>>> FYI, I've seen the "fileid changed" problem, and it looked like a 32-bit
>>> truncation of the fileid.  It occurred several times on successive
>>> reboots, so I tried to capture a tcpdump trace off the server (Linux
>>> 2.6.23-rc8-ga64314e6 - its ancient because I've had issues with buggy
>>> IDE drivers trying to move it forward.)  However, for the last couple
>>> of weeks I've been unable to reproduce it.
>>>
>>> The client was based on 2.6.37-rc6.
>>>
>>> The "fileid changed" messages popped up after mounting an export with
>>> 'nolock,intr,rsize=4096,soft', and then trying to use bash completion
>>> and 'ls' in a few subdirectories - and entries were missing from the
>>> directory lists without 'ls' reporting any errors (which I think is bad
>>> behaviour in itself.)
>> There was a bug in at least -rc5[1] that was considered already fixed in
>> -rc4[2]. The later announcements didn't mention it anymore. 
>>
>>> I don't know why it's stopped producing the errors, although once it
>>> went I never investigated it any further (was far too busy trying to
>>> get AMBA DMA support working.)
>> It seems it was fixed for most users though. Trond?
> 
> As I said, I can't reproduce it.
> 
> I'm seeing a lot of mention of ARM above. Is anyone seeing this bug on
> x86, or does it appear to be architecture-specific?

It _seems_ to be ARMv5 specific[1]. Uwe did some tests and figured out
that disabling dcache on ARMv5 "fixes" the problem, but
CONFIG_CPU_DCACHE_WRITETHROUGH isn't enough.

[1] Uwe fails to reproduce it on ARMv6. The ARMv6 has a L2 cache and
uses IIRC different instructions to flush the L1 caches. (please correct
me, if I'm wrong, ARM guys :)

cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 262 bytes
Desc: OpenPGP digital signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110105/51700fc7/attachment-0001.sig>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 15:01                     ` Marc Kleine-Budde
@ 2011-01-05 15:14                       ` Trond Myklebust
  2011-01-05 15:29                         ` Trond Myklebust
  2011-01-05 15:52                         ` Russell King - ARM Linux
  0 siblings, 2 replies; 79+ messages in thread
From: Trond Myklebust @ 2011-01-05 15:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 16:01 +0100, Marc Kleine-Budde wrote: 
> On 01/05/2011 03:53 PM, Trond Myklebust wrote:
> > On Wed, 2011-01-05 at 14:40 +0100, Uwe Kleine-K?nig wrote: 
> >> Hi Russell,
> >>
> >> On Wed, Jan 05, 2011 at 11:27:01AM +0000, Russell King - ARM Linux wrote:
> >>> On Wed, Jan 05, 2011 at 12:05:17PM +0100, Uwe Kleine-K?nig wrote:
> >>>> Hello Trond,
> >>>>
> >>>> On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-K?nig wrote:
> >>>>> On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
> >>>>>> The question is whether this is something happening on the server or the
> >>>>>> client. Does an older client kernel boot without any trouble?
> >>>>> I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
> >>>>> compare with.  If you don't consider .36 to be old enough let me now.
> >>>>> Once the setup is done it should be easy to test .35 (say), too.
> >>>>>
> >>>> Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
> >>>> problems didn't occur.  This was more reliable to trigger and he was so
> >>>> kind to bisect the problem.
> >>>>
> >>>> When testing v2.6.36-rc3-51-gafa8ccc init hanged.
> >>>> (babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
> >>>> this hang.)  Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
> >>>> "init hangs" problem the "fileid changed on tab" problem.
> >>>>
> >>>> I could only reproduce that on armv5 machines (imx27, imx28 and at91)
> >>>> but not on armv6 (imx35).
> >>>
> >>> FYI, I've seen the "fileid changed" problem, and it looked like a 32-bit
> >>> truncation of the fileid.  It occurred several times on successive
> >>> reboots, so I tried to capture a tcpdump trace off the server (Linux
> >>> 2.6.23-rc8-ga64314e6 - its ancient because I've had issues with buggy
> >>> IDE drivers trying to move it forward.)  However, for the last couple
> >>> of weeks I've been unable to reproduce it.
> >>>
> >>> The client was based on 2.6.37-rc6.
> >>>
> >>> The "fileid changed" messages popped up after mounting an export with
> >>> 'nolock,intr,rsize=4096,soft', and then trying to use bash completion
> >>> and 'ls' in a few subdirectories - and entries were missing from the
> >>> directory lists without 'ls' reporting any errors (which I think is bad
> >>> behaviour in itself.)
> >> There was a bug in at least -rc5[1] that was considered already fixed in
> >> -rc4[2]. The later announcements didn't mention it anymore. 
> >>
> >>> I don't know why it's stopped producing the errors, although once it
> >>> went I never investigated it any further (was far too busy trying to
> >>> get AMBA DMA support working.)
> >> It seems it was fixed for most users though. Trond?
> > 
> > As I said, I can't reproduce it.
> > 
> > I'm seeing a lot of mention of ARM above. Is anyone seeing this bug on
> > x86, or does it appear to be architecture-specific?
> 
> It _seems_ to be ARMv5 specific[1]. Uwe did some tests and figured out
> that disabling dcache on ARMv5 "fixes" the problem, but
> CONFIG_CPU_DCACHE_WRITETHROUGH isn't enough.
> 
> [1] Uwe fails to reproduce it on ARMv6. The ARMv6 has a L2 cache and
> uses IIRC different instructions to flush the L1 caches. (please correct
> me, if I'm wrong, ARM guys :)
> 
> cheers, Marc

OK. So,the new behaviour in 2.6.37 is that we're writing to a series of
pages via the usual kmap_atomic()/kunmap_atomic() and kmap()/kunmap()
interfaces, but we can end up reading them via a virtual address range
that gets set up via vm_map_ram() (that range gets set up before the
write occurs).

Do we perhaps need an invalidate_kernel_vmap_range() before we can read
the data on ARM in this kind of scenario?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 15:14                       ` Trond Myklebust
@ 2011-01-05 15:29                         ` Trond Myklebust
  2011-01-05 15:39                           ` Marc Kleine-Budde
  2011-01-05 15:52                         ` Russell King - ARM Linux
  1 sibling, 1 reply; 79+ messages in thread
From: Trond Myklebust @ 2011-01-05 15:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 10:14 -0500, Trond Myklebust wrote: 
> OK. So,the new behaviour in 2.6.37 is that we're writing to a series of
> pages via the usual kmap_atomic()/kunmap_atomic() and kmap()/kunmap()
> interfaces, but we can end up reading them via a virtual address range
> that gets set up via vm_map_ram() (that range gets set up before the
> write occurs).
> 
> Do we perhaps need an invalidate_kernel_vmap_range() before we can read
> the data on ARM in this kind of scenario?

IOW: Does something like the following patch fix the problem?

------------------------------------------------------------------------------- 
From: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Ensure we clean the TLB cache in nfs_readdir_xdr_to_array
    
After calling nfs_readdir_xdr_filler(), we need a call to
invalidate_kernel_vmap_range() before we can proceed to read
the data back through the virtual address range.
    
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 996dd89..4640470 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -587,6 +587,9 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
 		if (status < 0)
 			break;
 		pglen = status;
+
+		invalidate_kernel_vmap_range(pages_ptr, pglen);
+
 		status = nfs_readdir_page_filler(desc, &entry, pages_ptr, page, pglen);
 		if (status < 0) {
 			if (status == -ENOSPC)


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 15:29                         ` Trond Myklebust
@ 2011-01-05 15:39                           ` Marc Kleine-Budde
  0 siblings, 0 replies; 79+ messages in thread
From: Marc Kleine-Budde @ 2011-01-05 15:39 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/05/2011 04:29 PM, Trond Myklebust wrote:
> On Wed, 2011-01-05 at 10:14 -0500, Trond Myklebust wrote: 
>> OK. So,the new behaviour in 2.6.37 is that we're writing to a series of
>> pages via the usual kmap_atomic()/kunmap_atomic() and kmap()/kunmap()
>> interfaces, but we can end up reading them via a virtual address range
>> that gets set up via vm_map_ram() (that range gets set up before the
>> write occurs).
>>
>> Do we perhaps need an invalidate_kernel_vmap_range() before we can read
>> the data on ARM in this kind of scenario?
> 
> IOW: Does something like the following patch fix the problem?
> 
> ------------------------------------------------------------------------------- 
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> NFS: Ensure we clean the TLB cache in nfs_readdir_xdr_to_array
>     
> After calling nfs_readdir_xdr_filler(), we need a call to
> invalidate_kernel_vmap_range() before we can proceed to read
> the data back through the virtual address range.
>     
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 996dd89..4640470 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -587,6 +587,9 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
>  		if (status < 0)
>  			break;
>  		pglen = status;
> +
> +		invalidate_kernel_vmap_range(pages_ptr, pglen);
> +
>  		status = nfs_readdir_page_filler(desc, &entry, pages_ptr, page, pglen);
>  		if (status < 0) {
>  			if (status == -ENOSPC)

\o/ - Works for me (at91, armv5)

Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>

This is a candidate for stable (Cc'd).

Regards,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 262 bytes
Desc: OpenPGP digital signature
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110105/2804eb32/attachment.sig>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 15:14                       ` Trond Myklebust
  2011-01-05 15:29                         ` Trond Myklebust
@ 2011-01-05 15:52                         ` Russell King - ARM Linux
  2011-01-05 17:17                           ` Trond Myklebust
  1 sibling, 1 reply; 79+ messages in thread
From: Russell King - ARM Linux @ 2011-01-05 15:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 05, 2011 at 10:14:17AM -0500, Trond Myklebust wrote:
> OK. So,the new behaviour in 2.6.37 is that we're writing to a series of
> pages via the usual kmap_atomic()/kunmap_atomic() and kmap()/kunmap()
> interfaces, but we can end up reading them via a virtual address range
> that gets set up via vm_map_ram() (that range gets set up before the
> write occurs).

kmap of lowmem pages will always reuses the existing kernel direct
mapping, so there won't be a problem there.

> Do we perhaps need an invalidate_kernel_vmap_range() before we can read
> the data on ARM in this kind of scenario?

Firstly, vm_map_ram() does no cache maintainence of any sort, nor does
it take care of page colouring - so any architecture where cache aliasing
can occur will see this problem.  It should not limited to ARM.

Secondly, no, invalidate_kernel_vmap_range() probably isn't sufficient.
There's two problems here:

	addr = kmap(lowmem_page);
	*addr = stuff;
	kunmap(lowmem_page);

Such lowmem pages are accessed through their kernel direct mapping.

	ptr = vm_map_ram(lowmem_page);
	read = *ptr;

This creates a new mapping which can alias with the kernel direct mapping.
Now, as this is a new mapping, there should be no cache lines associated
with it.  (Looking at vm_unmap_ram(), it calls free_unmap_vmap_area_addr(),
free_unmap_vmap_area(), which then calls flush_cache_vunmap() on the
region.  vb_free() also calls flush_cache_vunmap() too.)

If the write after kmap() hits an already present cache line, the cache
line will be updated, but it won't be written back to memory.  So, on
a subsequent vm_map_ram(), with any kind of aliasing cache, there's
no guarantee that you'll hit that cache line and read the data just
written there.

The kernel direct mapping would need to be flushed.

I'm really getting to the point of hating the poliferation of RAM
remapping interfaces - it's going to (and is) causing nothing but lots
of pain on virtual cache architectures, needing more and more cache
flushing interfaces to be created.

Is there any other solution to this?

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 15:52                         ` Russell King - ARM Linux
@ 2011-01-05 17:17                           ` Trond Myklebust
  2011-01-05 17:26                             ` Russell King - ARM Linux
  0 siblings, 1 reply; 79+ messages in thread
From: Trond Myklebust @ 2011-01-05 17:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 15:52 +0000, Russell King - ARM Linux wrote: 
> On Wed, Jan 05, 2011 at 10:14:17AM -0500, Trond Myklebust wrote:
> > OK. So,the new behaviour in 2.6.37 is that we're writing to a series of
> > pages via the usual kmap_atomic()/kunmap_atomic() and kmap()/kunmap()
> > interfaces, but we can end up reading them via a virtual address range
> > that gets set up via vm_map_ram() (that range gets set up before the
> > write occurs).
> 
> kmap of lowmem pages will always reuses the existing kernel direct
> mapping, so there won't be a problem there.
> 
> > Do we perhaps need an invalidate_kernel_vmap_range() before we can read
> > the data on ARM in this kind of scenario?
> 
> Firstly, vm_map_ram() does no cache maintainence of any sort, nor does
> it take care of page colouring - so any architecture where cache aliasing
> can occur will see this problem.  It should not limited to ARM.
> 
> Secondly, no, invalidate_kernel_vmap_range() probably isn't sufficient.
> There's two problems here:
> 
> 	addr = kmap(lowmem_page);
> 	*addr = stuff;
> 	kunmap(lowmem_page);
> 
> Such lowmem pages are accessed through their kernel direct mapping.
> 
> 	ptr = vm_map_ram(lowmem_page);
> 	read = *ptr;
> 
> This creates a new mapping which can alias with the kernel direct mapping.
> Now, as this is a new mapping, there should be no cache lines associated
> with it.  (Looking at vm_unmap_ram(), it calls free_unmap_vmap_area_addr(),
> free_unmap_vmap_area(), which then calls flush_cache_vunmap() on the
> region.  vb_free() also calls flush_cache_vunmap() too.)
> 
> If the write after kmap() hits an already present cache line, the cache
> line will be updated, but it won't be written back to memory.  So, on
> a subsequent vm_map_ram(), with any kind of aliasing cache, there's
> no guarantee that you'll hit that cache line and read the data just
> written there.
> 
> The kernel direct mapping would need to be flushed.

We should already be flushing the kernel direct mapping after writing by
means of the calls to flush_dcache_page() in xdr_partial_copy_from_skb()
and all the helpers in net/sunrpc/xdr.c.

The only new thing is the read access through the virtual address
mapping. That mapping is created outside the loop in
nfs_readdir_xdr_to_array(), which is why I'm thinking we do need the
invalidate_kernel_vmap_range(): we're essentially doing a series of
writes through the kernel direct mapping (i.e. readdir RPC calls), then
reading the results through the virtual mapping.

i.e. we're doing

ptr = vm_map_ram(lowmem_pages);
while (need_more_data) {

for (i = 0; i < npages; i++) {
addr = kmap_atomic(lowmem_page[i]);
*addr = rpc_stuff;
flush_dcache_page(lowmem_page[i]);
kunmap_atomic(lowmem_page[i]);
}

invalidate_kernel_vmap_range(ptr); // Needed here?

read = *ptr;
}
vm_unmap_ram(lowmem_pages)

> I'm really getting to the point of hating the poliferation of RAM
> remapping interfaces - it's going to (and is) causing nothing but lots
> of pain on virtual cache architectures, needing more and more cache
> flushing interfaces to be created.
> 
> Is there any other solution to this?

Arbitrary sized pages. :-)

The problem here is that we want to read variable sized records (i.e.
readdir() records) from a multi-page buffer. We could do that by copying
those particular records that overlap with page boundaries, but that
would make for a fairly intrusive rewrite too.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 17:17                           ` Trond Myklebust
@ 2011-01-05 17:26                             ` Russell King - ARM Linux
  2011-01-05 18:12                               ` Trond Myklebust
  0 siblings, 1 reply; 79+ messages in thread
From: Russell King - ARM Linux @ 2011-01-05 17:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 05, 2011 at 12:17:27PM -0500, Trond Myklebust wrote:
> We should already be flushing the kernel direct mapping after writing by
> means of the calls to flush_dcache_page() in xdr_partial_copy_from_skb()
> and all the helpers in net/sunrpc/xdr.c.

Hmm, we're getting into the realms of what flush_dcache_page() is supposed
to do and what it's not supposed to do.

Is this page an associated with a mapping (iow, page_mapping(page) is non-
NULL)?  If not, flush_dcache_page() won't do anything, and from my
understanding, its flush_anon_page() which you want to be using there
instead.

> The only new thing is the read access through the virtual address
> mapping. That mapping is created outside the loop in
> nfs_readdir_xdr_to_array(), which is why I'm thinking we do need the
> invalidate_kernel_vmap_range(): we're essentially doing a series of
> writes through the kernel direct mapping (i.e. readdir RPC calls), then
> reading the results through the virtual mapping.
> 
> i.e. we're doing
> 
> ptr = vm_map_ram(lowmem_pages);
> while (need_more_data) {
> 
> for (i = 0; i < npages; i++) {
> addr = kmap_atomic(lowmem_page[i]);
> *addr = rpc_stuff;
> flush_dcache_page(lowmem_page[i]);
> kunmap_atomic(lowmem_page[i]);
> }
> 
> invalidate_kernel_vmap_range(ptr); // Needed here?

Yes, you're going to need some cache maintainence in there to make it work,
because accessing 'ptr' will load that data into the cache, and that won't
be updated by the writes via kmap_atomic().

Provided you don't write to ptr, then using invalidate_kernel_vmap_range()
will be safe.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 17:26                             ` Russell King - ARM Linux
@ 2011-01-05 18:12                               ` Trond Myklebust
  2011-01-05 18:27                                 ` Russell King - ARM Linux
  0 siblings, 1 reply; 79+ messages in thread
From: Trond Myklebust @ 2011-01-05 18:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 17:26 +0000, Russell King - ARM Linux wrote: 
> On Wed, Jan 05, 2011 at 12:17:27PM -0500, Trond Myklebust wrote:
> > We should already be flushing the kernel direct mapping after writing by
> > means of the calls to flush_dcache_page() in xdr_partial_copy_from_skb()
> > and all the helpers in net/sunrpc/xdr.c.
> 
> Hmm, we're getting into the realms of what flush_dcache_page() is supposed
> to do and what it's not supposed to do.
> 
> Is this page an associated with a mapping (iow, page_mapping(page) is non-
> NULL)?  If not, flush_dcache_page() won't do anything, and from my
> understanding, its flush_anon_page() which you want to be using there
> instead.

Actually, none of these pages are ever mapped into userspace, nor are
they mapped into the page cache.

They are allocated directly using alloc_page() by the thread that called
the readdir() syscall, so afaics there should be no incoherent mappings
other than the kernel direct mapping and the one created by
vm_map_ram().

So, yes, you are right that we don't need the flush_dcache_page() here.

> > The only new thing is the read access through the virtual address
> > mapping. That mapping is created outside the loop in
> > nfs_readdir_xdr_to_array(), which is why I'm thinking we do need the
> > invalidate_kernel_vmap_range(): we're essentially doing a series of
> > writes through the kernel direct mapping (i.e. readdir RPC calls), then
> > reading the results through the virtual mapping.
> > 
> > i.e. we're doing
> > 
> > ptr = vm_map_ram(lowmem_pages);
> > while (need_more_data) {
> > 
> > for (i = 0; i < npages; i++) {
> > addr = kmap_atomic(lowmem_page[i]);
> > *addr = rpc_stuff;
> > flush_dcache_page(lowmem_page[i]);
> > kunmap_atomic(lowmem_page[i]);
> > }
> > 
> > invalidate_kernel_vmap_range(ptr); // Needed here?
> 
> Yes, you're going to need some cache maintainence in there to make it work,
> because accessing 'ptr' will load that data into the cache, and that won't
> be updated by the writes via kmap_atomic().
> 
> Provided you don't write to ptr, then using invalidate_kernel_vmap_range()
> will be safe.

Thanks! That is what Marc's testing appears to confirm.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 18:12                               ` Trond Myklebust
@ 2011-01-05 18:27                                 ` Russell King - ARM Linux
  2011-01-05 18:55                                   ` Trond Myklebust
  0 siblings, 1 reply; 79+ messages in thread
From: Russell King - ARM Linux @ 2011-01-05 18:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 05, 2011 at 01:12:25PM -0500, Trond Myklebust wrote:
> On Wed, 2011-01-05 at 17:26 +0000, Russell King - ARM Linux wrote: 
> > On Wed, Jan 05, 2011 at 12:17:27PM -0500, Trond Myklebust wrote:
> > > We should already be flushing the kernel direct mapping after writing by
> > > means of the calls to flush_dcache_page() in xdr_partial_copy_from_skb()
> > > and all the helpers in net/sunrpc/xdr.c.
> > 
> > Hmm, we're getting into the realms of what flush_dcache_page() is supposed
> > to do and what it's not supposed to do.
> > 
> > Is this page an associated with a mapping (iow, page_mapping(page) is non-
> > NULL)?  If not, flush_dcache_page() won't do anything, and from my
> > understanding, its flush_anon_page() which you want to be using there
> > instead.
> 
> Actually, none of these pages are ever mapped into userspace, nor are
> they mapped into the page cache.
> 
> They are allocated directly using alloc_page() by the thread that called
> the readdir() syscall, so afaics there should be no incoherent mappings
> other than the kernel direct mapping and the one created by
> vm_map_ram().
> 
> So, yes, you are right that we don't need the flush_dcache_page() here.

I do still think you need _something_ there, otherwise data can remain
in the direct map alias and not be visible via the vmap alias.  I don't
see that we have anything in place to handle this at present though.

jejb mentioned something about making kunmap_atomic() always flush the
cache, even for lowmem pages, but I think that's going to be exceedingly
painful, to the extent that I believe it will prevent our PIO-only MMC
drivers working - or we need a scatterlist API that will let drivers
iterate over the scatterlist without needing to continually kmap_atomic
and kunmap_atomic each page.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 18:27                                 ` Russell King - ARM Linux
@ 2011-01-05 18:55                                   ` Trond Myklebust
  2011-01-05 19:07                                     ` Russell King - ARM Linux
  0 siblings, 1 reply; 79+ messages in thread
From: Trond Myklebust @ 2011-01-05 18:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-05 at 18:27 +0000, Russell King - ARM Linux wrote: 
> On Wed, Jan 05, 2011 at 01:12:25PM -0500, Trond Myklebust wrote:
> > On Wed, 2011-01-05 at 17:26 +0000, Russell King - ARM Linux wrote: 
> > > On Wed, Jan 05, 2011 at 12:17:27PM -0500, Trond Myklebust wrote:
> > > > We should already be flushing the kernel direct mapping after writing by
> > > > means of the calls to flush_dcache_page() in xdr_partial_copy_from_skb()
> > > > and all the helpers in net/sunrpc/xdr.c.
> > > 
> > > Hmm, we're getting into the realms of what flush_dcache_page() is supposed
> > > to do and what it's not supposed to do.
> > > 
> > > Is this page an associated with a mapping (iow, page_mapping(page) is non-
> > > NULL)?  If not, flush_dcache_page() won't do anything, and from my
> > > understanding, its flush_anon_page() which you want to be using there
> > > instead.
> > 
> > Actually, none of these pages are ever mapped into userspace, nor are
> > they mapped into the page cache.
> > 
> > They are allocated directly using alloc_page() by the thread that called
> > the readdir() syscall, so afaics there should be no incoherent mappings
> > other than the kernel direct mapping and the one created by
> > vm_map_ram().
> > 
> > So, yes, you are right that we don't need the flush_dcache_page() here.
> 
> I do still think you need _something_ there, otherwise data can remain
> in the direct map alias and not be visible via the vmap alias.  I don't
> see that we have anything in place to handle this at present though.

Is that perhaps what flush_kernel_dcache_page() is supposed to do?

> jejb mentioned something about making kunmap_atomic() always flush the
> cache, even for lowmem pages, but I think that's going to be exceedingly
> painful, to the extent that I believe it will prevent our PIO-only MMC
> drivers working - or we need a scatterlist API that will let drivers
> iterate over the scatterlist without needing to continually kmap_atomic
> and kunmap_atomic each page.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 18:55                                   ` Trond Myklebust
@ 2011-01-05 19:07                                     ` Russell King - ARM Linux
  0 siblings, 0 replies; 79+ messages in thread
From: Russell King - ARM Linux @ 2011-01-05 19:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 05, 2011 at 01:55:05PM -0500, Trond Myklebust wrote:
> On Wed, 2011-01-05 at 18:27 +0000, Russell King - ARM Linux wrote: 
> > I do still think you need _something_ there, otherwise data can remain
> > in the direct map alias and not be visible via the vmap alias.  I don't
> > see that we have anything in place to handle this at present though.
> 
> Is that perhaps what flush_kernel_dcache_page() is supposed to do?

Well, given how we have things currently setup on ARM, this ends up
being a no-op - as new page cache pages are marked dirty and their
flushing done at the point when they're mapped into userspace.

I guess we could do the flushing there and mark the page clean, but
it'd need some careful examination of various code paths to confirm
that it's safe - we may be avoiding this because some ARM arch
versions need to manually IPI cache flushes to other cores (which
can only be done with IRQs enabled.)

So, I don't think it'll do at the present time.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-05 14:53                   ` Trond Myklebust
  2011-01-05 15:01                     ` Marc Kleine-Budde
@ 2011-01-14  2:25                     ` Andy Isaacson
  2011-01-14  2:40                       ` Trond Myklebust
  1 sibling, 1 reply; 79+ messages in thread
From: Andy Isaacson @ 2011-01-14  2:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 05, 2011 at 09:53:13AM -0500, Trond Myklebust wrote:
> > There was a bug in at least -rc5[1] that was considered already fixed in
> > -rc4[2]. The later announcements didn't mention it anymore. 
> > 
> > > I don't know why it's stopped producing the errors, although once it
> > > went I never investigated it any further (was far too busy trying to
> > > get AMBA DMA support working.)
> > It seems it was fixed for most users though. Trond?
> 
> As I said, I can't reproduce it.
> 
> I'm seeing a lot of mention of ARM above. Is anyone seeing this bug on
> x86, or does it appear to be architecture-specific?

I'm seeing processes stuck in D with "fileid changed" in dmesg, on
x86_64 (both server and client).  The repro testcase is to run an
executable off of NFS, recompile it on the server, and then try to tab
complete the executable name.  The client prints

    NFS: server <hostname> error: fileid changed
    fsid 0:18: expected fileid 0x107aa4a, got 0x107ad3e

and /bin/zsh hangs in D.

My server is running 2.6.36.1, filesystem is ext3 on sda3 on AHCI,
client is currently running 2.6.37-rc1.  I'm assuming that 37a09f will
fix it.

-andy

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-14  2:25                     ` Andy Isaacson
@ 2011-01-14  2:40                       ` Trond Myklebust
  2011-01-14  4:22                         ` Andy Isaacson
  0 siblings, 1 reply; 79+ messages in thread
From: Trond Myklebust @ 2011-01-14  2:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2011-01-13 at 18:25 -0800, Andy Isaacson wrote: 
> On Wed, Jan 05, 2011 at 09:53:13AM -0500, Trond Myklebust wrote:
> > > There was a bug in at least -rc5[1] that was considered already fixed in
> > > -rc4[2]. The later announcements didn't mention it anymore. 
> > > 
> > > > I don't know why it's stopped producing the errors, although once it
> > > > went I never investigated it any further (was far too busy trying to
> > > > get AMBA DMA support working.)
> > > It seems it was fixed for most users though. Trond?
> > 
> > As I said, I can't reproduce it.
> > 
> > I'm seeing a lot of mention of ARM above. Is anyone seeing this bug on
> > x86, or does it appear to be architecture-specific?
> 
> I'm seeing processes stuck in D with "fileid changed" in dmesg, on
> x86_64 (both server and client).  The repro testcase is to run an
> executable off of NFS, recompile it on the server, and then try to tab
> complete the executable name.  The client prints
> 
>     NFS: server <hostname> error: fileid changed
>     fsid 0:18: expected fileid 0x107aa4a, got 0x107ad3e
> 
> and /bin/zsh hangs in D.
> 
> My server is running 2.6.36.1, filesystem is ext3 on sda3 on AHCI,
> client is currently running 2.6.37-rc1.  I'm assuming that 37a09f will
> fix it.

Why are you sticking to 2.6.37-rc1 when the final 2.6.37 is out? There
have been several readdir bugfixes merged in the months since -rc1 came
out.

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* still nfs problems [Was: Linux 2.6.37-rc8]
  2011-01-14  2:40                       ` Trond Myklebust
@ 2011-01-14  4:22                         ` Andy Isaacson
  0 siblings, 0 replies; 79+ messages in thread
From: Andy Isaacson @ 2011-01-14  4:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 13, 2011 at 09:40:23PM -0500, Trond Myklebust wrote:
> > My server is running 2.6.36.1, filesystem is ext3 on sda3 on AHCI,
> > client is currently running 2.6.37-rc1.  I'm assuming that 37a09f will
> > fix it.
> 
> Why are you sticking to 2.6.37-rc1 when the final 2.6.37 is out? There
> have been several readdir bugfixes merged in the months since -rc1 came
> out.

No good reason, just hadn't run into any reasons to update; switching
kernels is nonzero cost since the machine in question runs some out of
tree modules ergo updating is slightly more involved than "make && make
install".  Above and beyond the costs of rebooting and losing state.

Actual bug is obviously a good reason to update though.

-andy

^ permalink raw reply	[flat|nested] 79+ messages in thread

end of thread, other threads:[~2011-01-14  4:22 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-05 19:05 still nfs problems [Was: Linux 2.6.37-rc8] James Bottomley
2011-01-05 19:18 ` Linus Torvalds
2011-01-05 19:36   ` James Bottomley
2011-01-05 19:49     ` Linus Torvalds
2011-01-05 20:35       ` James Bottomley
2011-01-05 20:00     ` Russell King - ARM Linux
2011-01-05 20:33       ` James Bottomley
2011-01-05 20:48         ` Linus Torvalds
2011-01-05 21:04           ` Russell King - ARM Linux
2011-01-05 21:08             ` Linus Torvalds
2011-01-05 21:16               ` Trond Myklebust
2011-01-05 21:30                 ` Linus Torvalds
2011-01-05 23:06                   ` Trond Myklebust
2011-01-05 23:28                     ` James Bottomley
2011-01-06 17:40                       ` James Bottomley
2011-01-06 17:47                         ` Trond Myklebust
2011-01-06 17:51                           ` James Bottomley
2011-01-06 17:55                           ` Linus Torvalds
2011-01-07 18:53                             ` Trond Myklebust
2011-01-07 19:02                               ` Russell King - ARM Linux
2011-01-07 19:11                                 ` James Bottomley
2011-01-08 16:49                                   ` Trond Myklebust
2011-01-08 23:15                                     ` Trond Myklebust
2011-01-10 10:50                                       ` Uwe Kleine-König
2011-01-10 16:25                                         ` Trond Myklebust
2011-01-10 17:08                                           ` Marc Kleine-Budde
2011-01-10 17:20                                             ` Trond Myklebust
2011-01-10 17:26                                               ` Marc Kleine-Budde
2011-01-10 19:25                                               ` Uwe Kleine-König
2011-01-10 19:29                                                 ` Trond Myklebust
2011-01-10 19:31                                                   ` James Bottomley
2011-01-10 19:34                                                   ` Linus Torvalds
2011-01-10 20:15                                                     ` Trond Myklebust
2011-01-10 12:44                                       ` Marc Kleine-Budde
2011-01-07 19:13                                 ` Trond Myklebust
2011-01-07 19:05                               ` James Bottomley
2011-01-06 18:05                         ` Russell King - ARM Linux
2011-01-06 18:14                           ` James Bottomley
2011-01-06 18:25                             ` James Bottomley
2011-01-06 21:07                               ` James Bottomley
2011-01-06 20:19                         ` John Stoffel
2011-01-05 23:28                     ` Linus Torvalds
2011-01-05 23:59                       ` Russell King - ARM Linux
2011-01-05 21:16           ` James Bottomley
     [not found] <AANLkTi=-dNeeDjcSoznKtwcaNyw1mMXSqepFY89R2i+2@mail.gmail.com>
2010-12-30 17:14 ` Uwe Kleine-König
2010-12-30 17:57   ` Linus Torvalds
2010-12-30 18:24     ` Trond Myklebust
2010-12-30 18:50       ` Linus Torvalds
2010-12-30 19:25         ` Trond Myklebust
2010-12-30 20:02           ` Linus Torvalds
2010-12-30 17:59   ` Trond Myklebust
2010-12-30 19:18     ` Uwe Kleine-König
2011-01-03 21:38       ` Uwe Kleine-König
2011-01-04  0:22         ` Trond Myklebust
2011-01-05  8:40           ` Uwe Kleine-König
2011-01-05 11:05             ` Uwe Kleine-König
2011-01-05 11:27               ` Russell King - ARM Linux
2011-01-05 12:14                 ` Marc Kleine-Budde
2011-01-05 13:02                   ` Nori, Sekhar
2011-01-05 15:34                     ` Russell King - ARM Linux
2011-01-05 13:40                 ` Uwe Kleine-König
2011-01-05 14:29                   ` Jim Rees
2011-01-05 14:42                     ` Marc Kleine-Budde
2011-01-05 15:38                       ` Jim Rees
2011-01-05 14:53                   ` Trond Myklebust
2011-01-05 15:01                     ` Marc Kleine-Budde
2011-01-05 15:14                       ` Trond Myklebust
2011-01-05 15:29                         ` Trond Myklebust
2011-01-05 15:39                           ` Marc Kleine-Budde
2011-01-05 15:52                         ` Russell King - ARM Linux
2011-01-05 17:17                           ` Trond Myklebust
2011-01-05 17:26                             ` Russell King - ARM Linux
2011-01-05 18:12                               ` Trond Myklebust
2011-01-05 18:27                                 ` Russell King - ARM Linux
2011-01-05 18:55                                   ` Trond Myklebust
2011-01-05 19:07                                     ` Russell King - ARM Linux
2011-01-14  2:25                     ` Andy Isaacson
2011-01-14  2:40                       ` Trond Myklebust
2011-01-14  4:22                         ` Andy Isaacson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).