linux-fsdevel.vger.kernel.org archive mirror
* pagevecs of more than PAGEVEC_SIZE pages
From: Steve French @ 2008-05-22 19:13 UTC
  To: linux-fsdevel, lkml

Now that various servers support file writes larger than 64K, it would
be helpful to have pagevecs larger than 14 pages (PAGEVEC_SIZE), so
that an array of more pages could be passed to kernel_sendmsg (thus
allowing writes larger than 56K, depending on wsize - the server limit
is currently 8MB).

pagevec_lookup_tag (and the inline function pagevec_space, which I
don't use) is the main place cifs needs a pagevec instead of simply an
array of pages, and PAGEVEC_SIZE does not seem to matter there (it
works if I kmalloc a structure with more pages and request more than
14 pages from pagevec_lookup_tag).  Is there a better alternative to
defining a larger pagevec (pagevec2 or cifs_pagevec)?  Should I simply
kmalloc something larger than a pagevec and cast it to
(struct pagevec *)?
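
For what it's worth, the cast idea would look roughly like this
(untested sketch; cifs_pagevec and CIFS_PAGEVEC_SIZE are made-up names
here, and it only works as long as the leading fields mirror the real
struct pagevec layout):

#define CIFS_PAGEVEC_SIZE 32	/* anything larger than PAGEVEC_SIZE */

/* Same layout as struct pagevec, just with a bigger page array, so the
 * cast below stays safe for pagevec_lookup_tag(). */
struct cifs_pagevec {
	unsigned long nr;
	unsigned long cold;
	struct page *pages[CIFS_PAGEVEC_SIZE];
};

and then in cifs_writepages() something like:

	struct cifs_pagevec *cpv = kmalloc(sizeof(*cpv), GFP_KERNEL);

	if (cpv)
		nr_pages = pagevec_lookup_tag((struct pagevec *)cpv,
					      mapping, &index,
					      PAGECACHE_TAG_DIRTY,
					      CIFS_PAGEVEC_SIZE);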

 static int cifs_writepages(struct address_space *mapping,
                           struct writeback_control *wbc)
 {
@@ -1209,7 +1216,7 @@ static int cifs_writepages(struct address_space *mapping,
        __u64 offset = 0;
        struct cifsFileInfo *open_file;
        struct page *page;
-       struct pagevec pvec;
+       struct pagevec2 pvec;
        int rc = 0;
        int scanned = 0;
        int xid;
@@ -1229,7 +1236,7 @@ static int cifs_writepages(struct address_space *mapping,
                        if (!experimEnabled)
                                return generic_writepages(mapping, wbc);

-       iov = kmalloc(32 * sizeof(struct kvec), GFP_KERNEL);
+       iov = kmalloc(maximum_pages_to_write_based_on_wsize *
+                     sizeof(struct kvec), GFP_KERNEL);
        if (iov == NULL)
                return generic_writepages(mapping, wbc);

@@ -1261,7 +1268,7 @@ retry:
        while (!done && (index <= end) &&
               (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
                        PAGECACHE_TAG_DIRTY,
-                       min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1))) {
+                       min(end - index,
+                           (pgoff_t)maximum_pages_to_write_based_on_wsize - 1) + 1))) {
                int first;
                unsigned int i;



-- 
Thanks,

Steve


* Re: pagevecs of more than PAGEVEC_SIZE pages
From: Andrew Morton @ 2008-05-23  5:54 UTC
  To: Steve French; +Cc: linux-fsdevel, lkml

On Thu, 22 May 2008 14:13:07 -0500 "Steve French" <smfrench@gmail.com> wrote:

> Now that various servers support file writes larger than 64K, it would
> be helpful to have pagevecs larger than 14 pages (PAGEVEC_SIZE), so
> that an array of more pages could be passed to kernel_sendmsg (thus
> allowing writes larger than 56K, depending on wsize - the server limit
> is currently 8MB).

You can pass multiple pagevecs into kernel_sendmsg?

> pagevec_lookup_tag (and the inline function pagevec_space, which I
> don't use) is the main place cifs needs a pagevec instead of simply an
> array of pages, and PAGEVEC_SIZE does not seem to matter there (it
> works if I kmalloc a structure with more pages and request more than
> 14 pages from pagevec_lookup_tag).  Is there a better alternative to
> defining a larger pagevec (pagevec2 or cifs_pagevec)?  Should I simply
> kmalloc something larger than a pagevec and cast it to
> (struct pagevec *)?

We could make pagevecs variable-sized easily enough - add a `struct
page **' to it.  But it's more overhead and this is the first time I've
seen demand for it?
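
For the record, the sort of change being suggested (a sketch only, not
what is in mainline) would be along these lines:

/* Variable-sized pagevec sketch: the page array can be external, so a
 * caller wanting more than PAGEVEC_SIZE entries supplies its own
 * array; the inline one keeps the common on-stack case cheap. */
struct pagevec {
	unsigned long nr;
	unsigned long cold;
	unsigned long capacity;			/* slots in *pages */
	struct page **pages;			/* usually = inline_pages */
	struct page *inline_pages[PAGEVEC_SIZE];
};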



* Re: pagevecs of more than PAGEVEC_SIZE pages
From: Evgeniy Polyakov @ 2008-05-23  9:29 UTC
  To: Andrew Morton; +Cc: Steve French, linux-fsdevel, lkml

Hi.

On Thu, May 22, 2008 at 10:54:59PM -0700, Andrew Morton (akpm@linux-foundation.org) wrote:
> > Now that various servers support file writes larger than 64K, it would
> > be helpful to have pagevecs larger than 14 pages (PAGEVEC_SIZE), so
> > that an array of more pages could be passed to kernel_sendmsg (thus
> > allowing writes larger than 56K, depending on wsize - the server limit
> > is currently 8MB).
> 
> You can pass multiple pagevecs into kernel_sendmsg?

kernel_sendmsg() requires iovecs, so each page has to be mapped, which
can end up consuming too many slots in the kmap table.  I can reproduce
the deadlock we talked about in the POHMELFS thread with 100 highmem
pages per transaction on 32-bit x86, so increasing the pagevec size
noticeably could deadlock CIFS too.
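
To spell that out, the kernel_sendmsg() path forces something like the
following (hypothetical helper, untested; the point is the kmap() per
highmem page, each of which pins a kmap slot until kunmap()):

static int send_pages_via_sendmsg(struct socket *sock, struct page **pages,
				  unsigned int nr_pages, size_t total_len)
{
	struct msghdr msg = { .msg_flags = MSG_NOSIGNAL };
	struct kvec *iov;
	unsigned int i;
	int rc;

	iov = kmalloc(nr_pages * sizeof(*iov), GFP_NOFS);
	if (!iov)
		return -ENOMEM;

	for (i = 0; i < nr_pages; i++) {
		iov[i].iov_base = kmap(pages[i]);	/* one kmap slot each */
		iov[i].iov_len = PAGE_SIZE;
	}

	rc = kernel_sendmsg(sock, &msg, iov, nr_pages, total_len);

	for (i = 0; i < nr_pages; i++)
		kunmap(pages[i]);
	kfree(iov);
	return rc;
}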

It would be great if CIFS moved to kernel_sendpage() instead - what do
you think, Steve?

-- 
	Evgeniy Polyakov


* Re: pagevecs of more than PAGEVEC_SIZE pages
From: Steve French @ 2008-05-23 17:13 UTC
  To: Evgeniy Polyakov; +Cc: Andrew Morton, linux-fsdevel, lkml

Sounds interesting - it looks like the only user at the moment is NFS's
SunRPC, right (so no other examples to look at)?

How much better is this than kernel_sendmsg()?

For the write path (coming from writepage and writepages) we have a
small header followed by a list of sequential pages.  The servers
support either 4 pages (old Windows), 15 pages (some Windows and
NetApp etc. filers), 31 pages (older Samba and other Windows), or
2048 pages (current Samba supports up to 8MB writes, which may be
very useful now that Samba can call splice/receivefile and avoid the
extra copy).
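
(Spelling out the arithmetic behind those counts - illustration only,
not existing CIFS code, and the header-length parameter is
hypothetical:)

static inline unsigned int pages_per_write(unsigned int wsize,
					   unsigned int smb_hdr_len)
{
	/* e.g. a 64K wsize minus the SMB write header leaves room for
	 * 15 whole 4K data pages */
	return (wsize - smb_hdr_len) / PAGE_SIZE;
}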


-- 
Thanks,

Steve


* Re: pagevecs of more than PAGEVEC_SIZE pages
From: Evgeniy Polyakov @ 2008-05-24 10:38 UTC
  To: Steve French; +Cc: Andrew Morton, linux-fsdevel, lkml

On Fri, May 23, 2008 at 12:13:58PM -0500, Steve French (smfrench@gmail.com) wrote:
> Sounds interesting - it looks like the only user at the moment is NFS's
> SunRPC, right (so no other examples to look at)?
>
> How much better is this than kernel_sendmsg()?

It depends.  In ordinary use, with requests that are not that big and a
64-bit arch (or lowmem pages on 32-bit x86), the difference is marginal,
but with highmem pages and huge requests it gave an additional 10 MB/s
for bulk writing in POHMELFS.

> For the write path (coming from writepage and writepages) we have a
> small header followed by a list of sequential pages.  The servers
> support either 4 pages (old Windows), 15 pages (some Windows and
> NetApp etc. filers), 31 pages (older Samba and other Windows), or
> 2048 pages (current Samba supports up to 8MB writes, which may be
> very useful now that Samba can call splice/receivefile and avoid the
> extra copy).

kernel_sendpage() takes only a single page argument, but that should
not be a problem, since the stack will combine multiple pages into a
single skb when it can (especially for stuff like TSO/GSO).  So just
loop over whatever pages you fetched via find_get_pages_tag() and send
them one by one; the network stack will coalesce them by itself.
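
Something like this, perhaps (untested sketch; cifs_send_pages is a
made-up name and error handling is minimal):

static int cifs_send_pages(struct socket *sock, struct page **pages,
			   unsigned int nr_pages, size_t last_page_len)
{
	unsigned int i;
	int rc = 0;

	for (i = 0; i < nr_pages; i++) {
		size_t len = (i == nr_pages - 1) ? last_page_len : PAGE_SIZE;
		int flags = (i == nr_pages - 1) ? 0 : MSG_MORE;

		/* No kmap() needed: the page reference goes straight
		 * into the skb. */
		rc = kernel_sendpage(sock, pages[i], 0, len, flags);
		if (rc < 0)
			break;
	}
	return rc < 0 ? rc : 0;
}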

But note that the return of kernel_sendpage() does not guarantee that
the page has been sent.  You can release it (drop your refcount; the
page will not actually be freed, since the network stack holds its own
reference), but you cannot write to it until the reply from the server
has come back.  So just lock the page, mark it as under writeback, then
unlock it and call end_page_writeback() after CIFSSMBWrite2(), just
like it is done right now.

Btw, shouldn't end_page_writeback() be called with the page still
locked in CIFS?  IIRC in POHMELFS under heavy load I was able to lose
the writeback clearing, probably because of that, but it's subtle and
may not be an issue in CIFS.

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 8636cec..ba23c65 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1391,8 +1391,8 @@ retry:
 				if (rc)
 					SetPageError(page);
 				kunmap(page);
-				unlock_page(page);
 				end_page_writeback(page);
+				unlock_page(page);
 				page_cache_release(page);
 			}
 			if ((wbc->nr_to_write -= n_iov) <= 0)

-- 
	Evgeniy Polyakov

