netdev.vger.kernel.org archive mirror
* af_packet.c flush_dcache_page
@ 2007-10-31 14:08 Patrick McHardy
  2007-10-31 22:57 ` David Miller
From: Patrick McHardy @ 2007-10-31 14:08 UTC (permalink / raw)
  To: Linux Netdev List

I'm currently adding mmap support to af_netlink based on the
af_packet implementation and I'm wondering about this code in
tpacket_rcv():

         h->tp_status = status;
         smp_mb();

         {
                 struct page *p_start, *p_end;
                 u8 *h_end = (u8 *)h + macoff + snaplen - 1;

                 p_start = virt_to_page(h);
                 p_end = virt_to_page(h_end);
                 while (p_start <= p_end) {
                         flush_dcache_page(p_start);
                         p_start++;
                 }
         }

Shouldn't the flushing be done in reverse order to make sure
that the page containing tp_status is flushed last and userspace
doesn't start looking at following pages before all dcache entries
are flushed?
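
The reverse-order idea can be sketched in plain userspace C. This is only a model under stated assumptions, not the af_packet code: `SKETCH_PAGE_SIZE`, `flush_frame_reverse()` and the recording callback are hypothetical names, and the callback merely stands in for flush_dcache_page(). It walks the pages spanned by a frame from last to first, so the page holding the status word is handled last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the sketch; the real value is architecture-specific. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_PAGES 16u

/* Hypothetical stand-in for flush_dcache_page(): called once per page index. */
typedef void (*flush_fn)(size_t page_index, void *ctx);

/* Helper that records the order in which pages were "flushed". */
struct flush_log {
    size_t pages[SKETCH_MAX_PAGES];
    size_t count;
};

static void record_flush(size_t page_index, void *ctx)
{
    struct flush_log *log = ctx;

    if (log->count < SKETCH_MAX_PAGES)
        log->pages[log->count++] = page_index;
}

/*
 * Flush every page overlapping [start, end] (byte offsets into the ring,
 * end inclusive), highest page first, so that the page containing the
 * status word at `start` is flushed last. Returns the number of pages.
 */
static size_t flush_frame_reverse(uintptr_t start, uintptr_t end,
                                  flush_fn flush, void *ctx)
{
    size_t first, last, page, n = 0;

    assert(start <= end);
    first = start / SKETCH_PAGE_SIZE;
    last = end / SKETCH_PAGE_SIZE;

    /* Counts down from `last` to `first` without underflowing at zero. */
    for (page = last + 1; page-- > first; ) {
        flush(page, ctx);
        n++;
    }
    return n;
}
```

Calling flush_frame_reverse(4000, 9000, record_flush, &log) on a zeroed log visits the three pages the frame overlaps, ending with the page that contains offset 4000. Whether this ordering is sufficient on aliasing caches is exactly the open question above.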

A related question: Documentation/cachetlb.txt mentions that
flushing also needs to be done for reading of shared+writable
mapped pages, so it seems like we also need to call flush_dcache_page
before the tp_status check earlier in that function and in packet_poll().


* Re: af_packet.c flush_dcache_page
  2007-10-31 14:08 af_packet.c flush_dcache_page Patrick McHardy
@ 2007-10-31 22:57 ` David Miller
  2007-11-01 16:10   ` Patrick McHardy
From: David Miller @ 2007-10-31 22:57 UTC (permalink / raw)
  To: kaber; +Cc: netdev

From: Patrick McHardy <kaber@trash.net>
Date: Wed, 31 Oct 2007 15:08:02 +0100

> I'm currently adding mmap support to af_netlink based on the
> af_packet implementation and I'm wondering about this code in
> tpacket_rcv():
> 
>          h->tp_status = status;
>          smp_mb();
> 
>          {
>                  struct page *p_start, *p_end;
>                  u8 *h_end = (u8 *)h + macoff + snaplen - 1;
> 
>                  p_start = virt_to_page(h);
>                  p_end = virt_to_page(h_end);
>                  while (p_start <= p_end) {
>                          flush_dcache_page(p_start);
>                          p_start++;
>                  }
>          }
> 
> Shouldn't the flushing be done in reverse order to make sure
> that the page containing tp_status is flushed last and userspace
> doesn't start looking at following pages before all dcache entries
> are flushed?
> 
> A related question: Documentation/cachetlb.txt mentions that
> flushing also needs to be done for reading of shared+writable
> mapped pages, so it seems like we also need to call flush_dcache_page
> before the tp_status check earlier in that function and in packet_poll().

Thanks for bringing up this topic.

Instead of answering your questions, I'm going to show you
how to avoid having to do any of this cache flushing crap :-)

You can avoid having to flush anything as long as the kernel-side
virtual addresses are congruent, modulo SHMLBA, to the userland-side
virtual addresses.
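
The congruence condition can be written down as a small check. This is a userspace sketch only: `same_cache_alias()` is a hypothetical helper, and the SHMLBA value is passed in as a parameter because the real constant is architecture-specific. The sketch also assumes it is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when two virtual addresses are congruent modulo `shmlba`,
 * i.e. they differ only in bits at or above the alignment boundary.
 * `shmlba` is a parameter because the real SHMLBA depends on the
 * architecture; the sketch assumes a nonzero power of two.
 */
static bool same_cache_alias(uintptr_t kaddr, uintptr_t uaddr,
                             uintptr_t shmlba)
{
    assert(shmlba != 0 && (shmlba & (shmlba - 1)) == 0);
    return ((kaddr ^ uaddr) & (shmlba - 1)) == 0;
}
```

With an assumed boundary of 0x4000, the addresses 0x10002000 and 0x7f002000 satisfy the condition, while 0x10002000 and 0x7f001000 do not.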

We have some (decidedly awkward) mechanisms to try and achieve
this in the kernel, but they are cumbersome and not air tight.

Instead, I would recommend simply that you access the ring
buffer directly in userspace.  This avoids all of the cache
aliasing issues.

Yes, this means you have to do the ring buffer accesses in
the context of the user, but it simplifies so much that I think
it'd be worth it.

Another option is to use the "copy_to_user_page()" and
"copy_from_user_page()" interfaces which will do all of
the necessary cache flushing for you.

Actually it might be nice to convert AF_PACKET's mmap() code
over to using those things.


* Re: af_packet.c flush_dcache_page
  2007-10-31 22:57 ` David Miller
@ 2007-11-01 16:10   ` Patrick McHardy
  2007-11-01 16:27     ` Evgeniy Polyakov
From: Patrick McHardy @ 2007-11-01 16:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

David Miller wrote:
> Instead of answering your questions, I'm going to show you
> how to avoid having to do any of this cache flushing crap :-)
> 
> You can avoid having to flush anything as long as the kernel-side
> virtual addresses are congruent, modulo SHMLBA, to the userland-side
> virtual addresses.
> 
> We have some (decidedly awkward) mechanisms to try and achieve
> this in the kernel, but they are cumbersome and not air tight.
> 
> Instead, I would recommend simply that you access the ring
> buffer directly in userspace.  This avoids all of the cache
> aliasing issues.
> 
> Yes, this means you have to do the ring buffer accesses in
> the context of the user, but it simplifies so much that I think
> it'd be worth it.


I'm probably misunderstanding your suggestion because of my
limited mm knowledge. Are you suggesting something like
this:

setsockopt(RX_RING, ...):

Allocate ring using get_user_pages, return address to user

tpacket_rcv/netlink_unicast/netlink_broadcast:

for each receiver:
	switch_mm(...)
	copy data to ring

switch_mm(original mm)

Would this work in softirq context?

> Another option is to use the "copy_to_user_page()" and
> "copy_from_user_page()" interfaces which will do all of
> the necessary cache flushing for you.
> 
> Actually it might be nice to convert AF_PACKET's mmap() code
> over to using those things.


That would also require doing the copy in the context of
the user, right?


* Re: af_packet.c flush_dcache_page
  2007-11-01 16:10   ` Patrick McHardy
@ 2007-11-01 16:27     ` Evgeniy Polyakov
From: Evgeniy Polyakov @ 2007-11-01 16:27 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev

On Thu, Nov 01, 2007 at 05:10:32PM +0100, Patrick McHardy (kaber@trash.net) wrote:
> David Miller wrote:
> >Instead of answering your questions, I'm going to show you
> >how to avoid having to do any of this cache flushing crap :-)
> >
> >You can avoid having to flush anything as long as the kernel-side
> >virtual addresses are congruent, modulo SHMLBA, to the userland-side
> >virtual addresses.
> >
> >We have some (decidedly awkward) mechanisms to try and achieve
> >this in the kernel, but they are cumbersome and not air tight.
> >
> >Instead, I would recommend simply that you access the ring
> >buffer directly in userspace.  This avoids all of the cache
> >aliasing issues.
> >
> >Yes, this means you have to do the ring buffer accesses in
> >the context of the user, but it simplifies so much that I think
> >it'd be worth it.
> 
> 
> I'm probably misunderstanding your suggestion because of my
> limited mm knowledge. Are you suggesting something like
> this:
> 
> setsockopt(RX_RING, ...):
> 
> Allocate ring using get_user_pages, return address to user
> 
> tpacket_rcv/netlink_unicast/netlink_broadcast:
> 
> for each receiver:
> 	switch_mm(...)
> 	copy data to ring
> 
> switch_mm(original mm)
> 
> Would this work in softirq context?

IIRC it requires interrupts to be disabled.

David is probably suggesting that you provide a pointer to a buffer
allocated in userspace and use copy_to_user() and friends.

> >Another option is to use the "copy_to_user_page()" and
> >"copy_from_user_page()" interfaces which will do all of
> >the necessary cache flushing for you.
> >
> >Actually it might be nice to convert AF_PACKET's mmap() code
> >over to using those things.
> 
> 
> That would also require doing the copy in the context of
> the user, right?

Most of the time it is possible to call copy_to_user() in atomic
context, but it can fail, in which case some additional mechanism to
make the copy is needed (workqueue, kthread, whatever you like :)
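
The try-then-defer pattern can be modelled in userspace C roughly as follows. Everything here is a hypothetical sketch, not kernel API: the `copy_fn` callback stands in for a copy that may fail in atomic context, and `drain_deferred()` plays the role of the workqueue or kthread that finishes the copy later from a context that can sleep.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_DEFERRED 8u

/* Hypothetical copy routine: returns 0 on success, nonzero if the copy
 * cannot be completed right now (modelling a fault in atomic context). */
typedef int (*copy_fn)(void *dst, const void *src, size_t len);

struct deferred_copy {
    void *dst;
    const void *src;
    size_t len;
};

struct defer_queue {
    struct deferred_copy items[SKETCH_MAX_DEFERRED];
    size_t count;
};

/* Demo stubs for the fast path: one always succeeds, one always "faults". */
static int copy_succeeds(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

static int copy_always_faults(void *dst, const void *src, size_t len)
{
    (void)dst; (void)src; (void)len;
    return 1;
}

/*
 * Try the fast path first; if it fails, remember the request for later.
 * Returns 0 if copied immediately, 1 if deferred, -1 if the queue is full.
 * The source buffer must stay valid until the queue has been drained.
 */
static int copy_or_defer(struct defer_queue *q, copy_fn try_copy,
                         void *dst, const void *src, size_t len)
{
    if (try_copy(dst, src, len) == 0)
        return 0;
    if (q->count == SKETCH_MAX_DEFERRED)
        return -1;
    q->items[q->count++] = (struct deferred_copy){ dst, src, len };
    return 1;
}

/* Run later from a context that may block; returns the number completed. */
static size_t drain_deferred(struct defer_queue *q)
{
    size_t i, done = q->count;

    for (i = 0; i < q->count; i++)
        memcpy(q->items[i].dst, q->items[i].src, q->items[i].len);
    q->count = 0;
    return done;
}
```

The sketch leaves out locking and the lifetime of the source data, both of which a real implementation would have to handle.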

-- 
	Evgeniy Polyakov

