Re: [PATCH 1/4] Support generic I/O requests

From: Nitin Gupta <ngupta@vflare.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>, Pekka Enberg <penberg@cs.helsinki.fi>,
	Minchan Kim <minchan.kim@gmail.com>, Ed Tomlinson <edt@aei.ca>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>, Cyp <cyp561@gmail.com>,
	driverdev <devel@driverdev.osuosl.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/4] Support generic I/O requests
Date: Sat, 05 Jun 2010 13:14:26 +0530
Message-ID: <4C0A005A.10607@vflare.org>
In-Reply-To: <20100604121041.1b88b0af.akpm@linux-foundation.org>

On 06/05/2010 12:40 AM, Andrew Morton wrote:
> On Tue,  1 Jun 2010 13:31:23 +0530
> Nitin Gupta <ngupta@vflare.org> wrote:
> 

>> Usage/Examples:
>>  1) Use as /tmp storage
>>  - mkfs.ext4 /dev/zram0
>>  - mount /dev/zram0 /tmp
> 
> hm, how does that work?  The "device" will only handle page-sized and
> page-aligned requests, won't it?  Can you walk us through what happens
> when the fs does a 512-byte I/O?
> 

Yes, it still handles only page-aligned I/O requests whose size is a multiple
of PAGE_SIZE.

Unfortunately, I don't know much about VFS/filesystem internals, so I could not
trace the exact path. But, given that we set both the logical and physical
sector size to PAGE_SIZE, the block layer (and filesystem) should make sure that
we get correctly aligned, correctly sized I/O requests. I discovered this
through experimentation; I didn't know that making it a generic device was
actually this easy.

Given that I lack detailed knowledge in this area, there may be some corner
cases where we get unaligned I/O requests (in which case we simply return an
I/O error), but successful runs of the 'dd' and 'iozone' tests (links in
patch 0/4) increased my confidence in this :)
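
The alignment check described above can be sketched in user-space C roughly
as follows. This is only an illustration, not the driver's code:
SKETCH_PAGE_SIZE, SKETCH_SECTOR_SIZE and request_is_valid() are hypothetical
names, a 4096-byte page and 512-byte sector are assumed, and rejecting
zero-length requests is my own assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the real driver uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_SECTOR_SIZE 512u

/*
 * Hypothetical helper: accept a request only if it starts on a page
 * boundary and covers a whole number of pages. Anything else would be
 * failed with an I/O error.
 */
static bool request_is_valid(uint64_t start_sector, uint32_t size_bytes)
{
    uint64_t start_byte = start_sector * SKETCH_SECTOR_SIZE;

    if (size_bytes == 0)
        return false;               /* assumption: empty requests rejected */
    if (start_byte % SKETCH_PAGE_SIZE != 0)
        return false;               /* start is not page-aligned */
    if (size_bytes % SKETCH_PAGE_SIZE != 0)
        return false;               /* size is not n * page size */
    return true;
}
```

With these assumed sizes, a 512-byte request at sector 1 is rejected, while
an 8192-byte request at sector 8 is accepted.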

The only change needed to make it a generic device was to iterate over all
bio segments (earlier it was hard-coded to handle just the first one).
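
To illustrate the idea of walking every segment instead of only the first,
here is a rough user-space sketch. It is not the patch itself: struct
sketch_segment, struct sketch_bio and sketch_write() are hypothetical
stand-ins for the kernel's bio structures, a plain memcpy() stands in for
compression, and the rule that each segment is exactly one page is an
assumption made for the example.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical stand-in for one bio segment. */
struct sketch_segment {
    unsigned char *buf;   /* data for this segment */
    size_t len;           /* length in bytes */
    size_t offset;        /* offset within the page */
};

/* Hypothetical stand-in for a bio: segments plus the first target page. */
struct sketch_bio {
    struct sketch_segment *segs;
    size_t nr_segs;
    size_t first_page;
};

/*
 * Copy every segment to consecutive pages of `disk`.
 * Returns 0 on success, or -1 (standing in for an I/O error) if a segment
 * is not exactly one aligned page or lies beyond the device. Segments
 * before the failing one may already have been written.
 */
static int sketch_write(unsigned char *disk, size_t disk_pages,
                        const struct sketch_bio *bio)
{
    size_t page = bio->first_page;

    for (size_t i = 0; i < bio->nr_segs; i++, page++) {
        const struct sketch_segment *seg = &bio->segs[i];

        if (seg->offset != 0 || seg->len != SKETCH_PAGE_SIZE ||
            page >= disk_pages)
            return -1;
        memcpy(disk + page * SKETCH_PAGE_SIZE, seg->buf, seg->len);
    }
    return 0;
}
```

The earlier behaviour would correspond to handling only segs[0] and ignoring
the rest of the array.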

>>  - Double caching: We can potentially waste memory by having
>> two copies of a page -- one in page cache (uncompressed) and
>> a second in the device memory (compressed). However, during
>> reclaim, clean page cache pages are quickly freed, so this
>> does not seem to be a big problem.
> 
> Yes, clean pagecache is cheap.  But what happens when the pagecache
> copy of the page gets modified?
> 

Dirty pages are periodically flushed to disk (zram in this case), after which
they become clean again.

> Or is it the case that once a compressed page gets copied out to
> pagecache, the compressed version is never used again?  If so, the
> memory could be synchronously freed, so I guess I don't understand what
> you mean here.

We cannot free the compressed page as soon as it is decompressed and added to
the page cache. When a clean page cache page is reclaimed, it is simply freed,
not written out to disk, so with the compressed copy already gone we would lose
the data.

The only opportunity to free a (compressed) disk page is when the filesystem
issues a block discard request or, when used as a swap disk, when we get a swap
slot free notification (a callback for this was recently added to
struct block_device_operations).
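
The lifetime rule above (a read never frees the stored copy; only an explicit
free notification does) can be sketched in user-space C as below. This is an
illustration under my own assumptions, not the driver's data structures:
struct sketch_dev, sketch_store(), sketch_load() and sketch_slot_free_notify()
are hypothetical names, and the stored bytes are kept uncompressed for
brevity.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NR_SLOTS 8   /* assumed, tiny device for illustration */

/* Hypothetical per-device table: one stored buffer per page slot. */
struct sketch_dev {
    unsigned char *slot[SKETCH_NR_SLOTS];
    size_t len[SKETCH_NR_SLOTS];
};

/* Write path: store a private copy, replacing any previous one. */
static int sketch_store(struct sketch_dev *dev, size_t idx,
                        const unsigned char *data, size_t len)
{
    if (idx >= SKETCH_NR_SLOTS)
        return -1;

    unsigned char *copy = malloc(len);
    if (!copy)
        return -1;
    memcpy(copy, data, len);

    free(dev->slot[idx]);           /* drop the old copy, if any */
    dev->slot[idx] = copy;
    dev->len[idx] = len;
    return 0;
}

/* Read path: return the stored copy but do NOT free it. */
static const unsigned char *sketch_load(const struct sketch_dev *dev,
                                        size_t idx, size_t *len)
{
    if (idx >= SKETCH_NR_SLOTS || !dev->slot[idx])
        return NULL;
    *len = dev->len[idx];
    return dev->slot[idx];
}

/* Discard / swap slot free notification: the only place memory is freed. */
static void sketch_slot_free_notify(struct sketch_dev *dev, size_t idx)
{
    if (idx >= SKETCH_NR_SLOTS)
        return;
    free(dev->slot[idx]);
    dev->slot[idx] = NULL;
    dev->len[idx] = 0;
}
```

Because sketch_load() leaves the slot intact, the data can be read again
after the page cache copy has been reclaimed; the memory goes away only once
sketch_slot_free_notify() is called for that slot.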

> 
>>  - Stale data: Not all filesystems support issuing 'discard'
>> requests to underlying block devices. So, if such filesystems
>> are used over zram devices, we can accumulate a lot of stale
>> data in memory. Even for filesystems that do support discard
>> (for example, ext4), we need to see how effective it is.
> 
> Can you walk us through how zram uses discard requests?
> 

I could not get discard working (yet), so support for this was
removed from these patches. I hope to include it soon.

Thanks for your feedback.
Nitin
