public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@suse.de>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: linux-kernel@vger.kernel.org, Linus Torvalds <torvalds@osdl.org>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: splice(SPLICE_F_MOVE) problems
Date: Wed, 3 May 2006 08:56:45 +0200	[thread overview]
Message-ID: <20060503065644.GJ9712@suse.de> (raw)
In-Reply-To: <20060503041455.GA158@oleg>

On Wed, May 03 2006, Oleg Nesterov wrote:
> I am looking at current fs/splice.c from
> 	http://kernel.org/git/?p=linux/kernel/git/torvalds/....
> 
> 
> 	pipe_to_file:
> 
> 		if ((sd->flags & SPLICE_F_MOVE) && this_len == PAGE_CACHE_SIZE) {
> 			/*
> 			 * If steal succeeds, buf->page is now pruned from the vm
> 			 * side (page cache) and we can reuse it. The page will also
> 			 * be locked on successful return.
> 			 */
> 			if (buf->ops->steal(info, buf))
> 				goto find_page;
> 
> 			page = buf->page;
> 			page_cache_get(page);
> 
> Isn't it better to do this after successful successful add_to_page_cache() ?
> This way you don't need to do page_cache_release() if add_to_page_cache fails.

It is, I dunno why I moved the ordering around when fixing that up. I'll
move it down a block and skip the early page_cache_get.

> 			/*
> 			 * page must be on the LRU for adding to the pagecache.
> 
> Is it true? (looking at add_to_page_cache_lru).

My wording is a little confusing probably, I didn't mean to imply any
ordering contraints there. Just simply state that a page cannot be in
the pagecache and not on the LRU.

> 			if (!PageLRU(page)) {
> 				struct zone *zone = page_zone(page);
> 
> 				spin_lock_irq(&zone->lru_lock);
> 				if (!PageLRU(page)) {
> 					SetPageLRU(page);
> 					add_page_to_inactive_list(zone, page);
> 				}
> 				spin_unlock_irq(&zone->lru_lock);
> 
> Why not lru_cache_add() ?

Because for calling lru_cache_add, I have to be absolutely certain that
the page isn't on the LRU already. So I'd have to grab the zone lru_lock
first anyways, and I cannot call lru_cache_add() with that held since
__pagevec_lru_add() also grabs it. Hence the "open coding".

> 			if (add_to_page_cache(page, mapping, index, gfp_mask)) {
> 				page_cache_release(page);
> 				unlock_page(page);
> 				goto find_page;
> 
> This leaves unmapped page on ->inactive_list. page_count(page) == 1.
> What if shrink_inactive_list() finds this page, increments
> page->count, and calls shrink_page_list()->remove_mapping() again?
> Hmm... So page_cache_pipe_buf_steal() has a reason to return a locked
> page (I am not an expert, please correct me).

With the block moved above the LRU business, this disappears as well.

> However, user_page_pipe_buf_steal() returns unlocked page in
> PIPE_BUF_FLAG_GIFT case. So, if add_to_page_cache() fails,
> unlock_page() will trigger BUG().

It does, it calls generic_pipe_buf_steal() which locks it.

> 		ret = mapping->a_ops->prepare_write(file, page, offset, offset+this_len);
> 		if (ret == AOP_TRUNCATED_PAGE) {
> 			page_cache_release(page);
> 			goto find_page;
> 
> We also need to unlock(page) if it was stealed.

Are you sure that's the right test? Don't you mean if ret !=
AOP_TRUNCATED_PAGE && ret?

How about the attached?

diff --git a/fs/splice.c b/fs/splice.c
index 7fb0497..9b10ec4 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -574,12 +574,17 @@ static int pipe_to_file(struct pipe_inod
 			goto find_page;
 
 		page = buf->page;
+		if (add_to_page_cache(page, mapping, index, gfp_mask)) {
+			unlock_page(page);
+			goto find_page;
+		}
+
 		page_cache_get(page);
 
 		/*
-		 * page must be on the LRU for adding to the pagecache.
-		 * Check this without grabbing the zone lock, if it isn't
-		 * the do grab the zone lock, recheck, and add if necessary.
+		 * The page cannot reside in the pagecache and not be on
+		 * the LRU list. It should be safe to check PageLRU()
+		 * outside the zone lru_lock, as long as we recheck it inside.
 		 */
 		if (!PageLRU(page)) {
 			struct zone *zone = page_zone(page);
@@ -591,12 +596,6 @@ static int pipe_to_file(struct pipe_inod
 			}
 			spin_unlock_irq(&zone->lru_lock);
 		}
-
-		if (add_to_page_cache(page, mapping, index, gfp_mask)) {
-			page_cache_release(page);
-			unlock_page(page);
-			goto find_page;
-		}
 	} else {
 find_page:
 		page = find_lock_page(mapping, index);
@@ -647,11 +646,15 @@ find_page:
 	}
 
 	ret = mapping->a_ops->prepare_write(file, page, offset, offset+this_len);
-	if (ret == AOP_TRUNCATED_PAGE) {
+	if (unlikely(ret)) {
+		if (ret != AOP_TRUNCATED_PAGE)
+			unlock_page(page);
 		page_cache_release(page);
-		goto find_page;
-	} else if (ret)
+		if (ret == AOP_TRUNCATED_PAGE)
+			goto find_page;
+
 		goto out;
+	}
 
 	if (buf->page != page) {
 		/*

-- 
Jens Axboe


  reply	other threads:[~2006-05-03  6:56 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-05-01  6:59 splice(SPLICE_F_MOVE) problems Oleg Nesterov
2006-05-01  6:54 ` Jens Axboe
2006-05-01 19:06   ` Oleg Nesterov
2006-05-01 17:41     ` Jens Axboe
2006-05-02  0:11       ` Oleg Nesterov
2006-05-02  5:28         ` Jens Axboe
2006-05-03  4:14           ` Oleg Nesterov
2006-05-03  6:56             ` Jens Axboe [this message]
2006-05-03 14:35               ` Oleg Nesterov
2006-05-03 10:48                 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060503065644.GJ9712@suse.de \
    --to=axboe@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=oleg@tv-sign.ru \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox