* Re: [PATCH 2/2] zram: do not count duplicated pages as compressed
  From: Minchan Kim @ 2017-05-17 8:32 UTC
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-kernel, Joonsoo Kim, Sergey Senozhatsky, kernel-team,
      linux-fsdevel, linux-block, hch, viro, axboe, jlayton, tytso

Hi Sergey,

On Tue, May 16, 2017 at 04:36:17PM +0900, Sergey Senozhatsky wrote:
> On (05/16/17 16:16), Minchan Kim wrote:
> > > but would this be correct? the data is not valid - we failed to store
> > > the valid one. but instead we assure the application that read()/swapin/etc.,
> > > depending on the usage scenario, is successful (even though the data is
> > > not what the application really expects to see), the application tries to
> > > use the data from that page and probably crashes (dunno, for example the
> > > page contained hash tables with pointers that are not valid anymore,
> > > etc. etc.).
> > >
> > > I'm not optimistic about stale data reads; it basically will look like
> > > data corruption to the application.
> >
> > Hmm, I don't understand what you are saying.
> > My point is that zram_free_page should be done only if the whole write
> > operation is successful.
> > With your change, the following situation can happen:
> >
> > write block 4, 'all A' -> success
> > read block 4, 'all A' verified -> Good
> > write block 4, 'all B' -> but failed with ENOMEM
> > read block 4, expected 'all A' but got 'all 0' -> Oops
>
> yes. 'all A' in #4 can be incorrect. zram can be used as a block device
> with a file system, and the pid that does the write op does not
> necessarily do the read op later. it can be a completely different
> application. e.g. compilation, or anything else.
>
> suppose PID A does
>
> wr block 1 all a
> wr block 2 all a + 1
> wr block 3 all a + 2
> wr block 4 all a + 3
>
> now PID A does
>
> wr block 1 all m
> wr block 2 all m + 1
> wr block 3 all m + 2
> wr block 4 failed. block still has 'all a + 3'.
> exit
>
> another application, PID C, reads in the file and tries to do
> something sane with it
>
> rd block 1 all m
> rd block 2 all m + 1
> rd block 3 all m + 2
> rd block 4 all a + 3  << this is dangerous. we should return an
>                          error from read() here; not stale data.
>
> what we can return now is `partially updated' data, with some new
> and some stale pages. this is quite unlikely to end up anywhere good.
> am I wrong?
>
> why does `rd block 4' in your case cause an Oops? as a worst case scenario?
> the application does not expect the page to be 'all A' at this point. pages
> are likely to belong to some mappings/files/etc., and there is likely a data
> dependency between them, dunno, C++ objects that span across pages or
> JPEG images, etc. so returning "new data, new data, stale data" is a bit
> fishy.

I thought more about it and started to get confused. :/

So let's Cc linux-block and the fs people.

The question is: is it okay for a block device (especially zram, which is a
compressed RAM block device) to return garbage when an ongoing overwrite IO
fails?

O_DIRECT write block 4 "aaa.." -> success
read block 4           "aaa.." -> success
O_DIRECT write block 4 "bbb.." -> fail
read block 4           "000.." -> is this okay?

Hope to get an answer from the experts. :)
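For reference, a minimal userspace sketch of the O_DIRECT sequence above. It
assumes a hypothetical /dev/zram0 with 4096-byte blocks and only reproduces
the ordering of operations; it does not claim anything about what a given
kernel actually returns on the final read:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLKSZ 4096

int main(void)
{
	void *buf;
	int fd = open("/dev/zram0", O_RDWR | O_DIRECT);	/* hypothetical device */

	if (fd < 0 || posix_memalign(&buf, BLKSZ, BLKSZ))
		return 1;

	memset(buf, 'a', BLKSZ);
	if (pwrite(fd, buf, BLKSZ, 4 * BLKSZ) != BLKSZ)	/* write block 4: "aaa.." */
		perror("first write");

	memset(buf, 'b', BLKSZ);
	if (pwrite(fd, buf, BLKSZ, 4 * BLKSZ) != BLKSZ)	/* overwrite: may fail, e.g. ENOMEM */
		perror("second write");

	if (pread(fd, buf, BLKSZ, 4 * BLKSZ) != BLKSZ)	/* what does a reader see now? */
		perror("read");
	else
		printf("block 4 now starts with '%c'\n", *(char *)buf);

	free(buf);
	close(fd);
	return 0;
}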
* Re: [PATCH 2/2] zram: do not count duplicated pages as compressed
  From: Sergey Senozhatsky @ 2017-05-17 9:14 UTC
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Joonsoo Kim,
      Sergey Senozhatsky, kernel-team, linux-fsdevel, linux-block, hch, viro,
      axboe, jlayton, tytso

Hello Minchan,

On (05/17/17 17:32), Minchan Kim wrote:
[..]
> > what we can return now is `partially updated' data, with some new
> > and some stale pages. this is quite unlikely to end up anywhere good.
> > am I wrong?
> >
> > why does `rd block 4' in your case cause an Oops? as a worst case scenario?
> > the application does not expect the page to be 'all A' at this point. pages
> > are likely to belong to some mappings/files/etc., and there is likely a data
> > dependency between them, dunno, C++ objects that span across pages or
> > JPEG images, etc. so returning "new data, new data, stale data" is a bit
> > fishy.
>
> I thought more about it and started to get confused. :/

sorry, I'm not sure I see what the source of your confusion is :)

my point is - we should not let READ succeed if we know that WRITE
failed. take the JPEG image example:

over-write block 1 aaa->xxx OK
over-write block 2 bbb->yyy OK
over-write block 3 ccc->zzz error

reading that JPEG file

read block 1 xxx OK
read block 2 yyy OK
read block 3 ccc OK  << we should not return OK here, because
                        "xxxyyyccc" is not the correct JPEG file
                        anyway.

do you agree that telling the application that read() succeeded while at the
same time returning corrupted "xxxyyyccc" instead of "xxxyyyzzz" is not
correct?

so how about this:

- if we fail to compress a page (S/W or H/W compressor error, depending on
  the particular setup), let's store it uncompressed (as a page_size-d zspool
  object)?

this should do the trick. at least we will have correct data:

xxx - compressed
yyy - compressed
zzz - uncompressed, because the compressing back-end returned an error.

thoughts?

	-ss
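A self-contained toy model of the fallback Sergey proposes: if the compressor
fails, store the raw page so a later read never sees stale data. None of the
names below (toy_compress, store_page, struct slot) are real zram internals;
this only sketches the intended control flow:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

struct slot {
	char *data;
	size_t len;
	bool compressed;
};

/* Toy compressor: a page made of one repeated byte "compresses" to 1 byte;
 * anything else is treated as a compressor failure. */
static int toy_compress(const char *in, char *out, size_t *out_len)
{
	for (size_t i = 1; i < PAGE_SIZE; i++)
		if (in[i] != in[0])
			return -1;
	out[0] = in[0];
	*out_len = 1;
	return 0;
}

static int store_page(struct slot *s, const char *page)
{
	char tmp[PAGE_SIZE];
	size_t len;
	bool compressed = true;

	if (toy_compress(page, tmp, &len)) {
		/* Fallback: keep the raw page instead of failing the write
		 * and leaving the old (stale) object in place. */
		memcpy(tmp, page, PAGE_SIZE);
		len = PAGE_SIZE;
		compressed = false;
	}

	free(s->data);
	s->data = malloc(len);
	if (!s->data)
		return -1;
	memcpy(s->data, tmp, len);
	s->len = len;
	s->compressed = compressed;
	return 0;
}

int main(void)
{
	struct slot s = { 0 };
	char page[PAGE_SIZE];

	memset(page, 'z', PAGE_SIZE);
	page[10] = 'q';			/* make the page "incompressible" */
	store_page(&s, page);
	printf("stored %zu bytes, compressed=%d\n", s.len, s.compressed);
	free(s.data);
	return 0;
}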
* Re: [PATCH 2/2] zram: do not count duplicated pages as compressed
  From: Minchan Kim @ 2017-05-18 4:53 UTC
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-kernel, Joonsoo Kim, Sergey Senozhatsky, kernel-team,
      linux-fsdevel, linux-block, hch, viro, axboe, jlayton, tytso

Hi Sergey,

On Wed, May 17, 2017 at 06:14:23PM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
>
> On (05/17/17 17:32), Minchan Kim wrote:
> [..]
> > > what we can return now is `partially updated' data, with some new
> > > and some stale pages. this is quite unlikely to end up anywhere good.
> > > am I wrong?
> > >
> > > why does `rd block 4' in your case cause an Oops? as a worst case scenario?
> > > the application does not expect the page to be 'all A' at this point. pages
> > > are likely to belong to some mappings/files/etc., and there is likely a data
> > > dependency between them, dunno, C++ objects that span across pages or
> > > JPEG images, etc. so returning "new data, new data, stale data" is a bit
> > > fishy.
> >
> > I thought more about it and started to get confused. :/
>
> sorry, I'm not sure I see what the source of your confusion is :)
>
> my point is - we should not let READ succeed if we know that WRITE
> failed. take the JPEG image example:

I don't think we should do that. I will write down my thoughts below. :)

> over-write block 1 aaa->xxx OK
> over-write block 2 bbb->yyy OK
> over-write block 3 ccc->zzz error
>
> reading that JPEG file
>
> read block 1 xxx OK
> read block 2 yyy OK
> read block 3 ccc OK  << we should not return OK here, because
>                         "xxxyyyccc" is not the correct JPEG file
>                         anyway.
>
> do you agree that telling the application that read() succeeded while at the
> same time returning corrupted "xxxyyyccc" instead of "xxxyyyzzz" is not
> correct?

I don't agree. I *think* a block device is just a dumb device, so zram doesn't
need to know about any objects from the upper layer. What zram should consider
is basically the success or failure of an IO unit (a BIO, say). So if we
assume each step in the example above is a bio, I see no problem in returning
"xxxyyyccc".

What I meant by "started to get confused" was about atomicity, not the above.
I think it's okay to return ccc instead of zzz, but is it okay for zram to
return "000" - neither ccc nor zzz?

My conclusion, after a discussion with one of my FS friends, is that it is
okay. Let's think it through. The FS requests a write of "aaa" to block 4 and
it fails for some reason (a H/W failure, or a S/W failure like ENOMEM). The
interface for catching the failure is the completion function invoked via
bio_endio, where the error is normally recorded through AS_EIO by
mapping_set_error() as well as the PG_error flag of the page. In this case the
FS assumes block 4 can contain stale data - neither 'zzz' nor 'ccc' - because
the device may have failed in the middle of writing the block; if the block
device doesn't support atomic writes (which I guess is the more common case),
it is safer to consider the block to contain garbage rather than either the
old value or the new value. (I hope I explained my thought well. :/)

Having said that, I think everyone prefers a block device that supports
atomicity (i.e. old data or new data), so I am reluctant to change the
behavior just for a simple refactoring.

Thanks.
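A simplified sketch of the error-propagation path Minchan refers to - not
taken from any particular filesystem, with a single-page bio assumed for
brevity. The bio error field is bi_error in kernels of that era (bi_status on
newer ones), so treat the details as illustrative:

#include <linux/bio.h>
#include <linux/mm.h>
#include <linux/pagemap.h>

/* Write-side bio completion: on error, flag the page (PG_error) and its
 * mapping (AS_EIO/AS_ENOSPC) so that a later fsync() reports -EIO/-ENOSPC. */
static void sketch_write_end_io(struct bio *bio)
{
	struct page *page = bio->bi_io_vec[0].bv_page;	/* single-page bio assumed */

	if (bio->bi_error) {
		SetPageError(page);
		mapping_set_error(page->mapping, bio->bi_error);
	}
	end_page_writeback(page);
	bio_put(bio);
}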
* Re: [PATCH 2/2] zram: do not count duplicated pages as compressed
  From: Christoph Hellwig @ 2017-05-21 7:04 UTC
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Joonsoo Kim,
      Sergey Senozhatsky, kernel-team, linux-fsdevel, linux-block, hch, viro,
      axboe, jlayton, tytso

On Wed, May 17, 2017 at 05:32:12PM +0900, Minchan Kim wrote:
> Is it okay for a block device (especially zram, which is a compressed RAM
> block device) to return garbage when an ongoing overwrite IO fails?
>
> O_DIRECT write block 4 "aaa.." -> success
> read block 4           "aaa.." -> success
> O_DIRECT write block 4 "bbb.." -> fail
> read block 4           "000.." -> is this okay?
>
> Hope to get an answer from the experts. :)

It's "okay" in that it's what existing real block devices do (at least on a
sector boundary). It's not "nice" though, so if you can avoid it, please do.
* Re: [PATCH 2/2] zram: do not count duplicated pages as compressed
  From: Minchan Kim @ 2017-05-21 7:15 UTC
  To: Christoph Hellwig
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, Joonsoo Kim,
      Sergey Senozhatsky, kernel-team, linux-fsdevel, linux-block, viro, axboe,
      jlayton, tytso

On Sun, May 21, 2017 at 12:04:27AM -0700, Christoph Hellwig wrote:
> On Wed, May 17, 2017 at 05:32:12PM +0900, Minchan Kim wrote:
> > Is it okay for a block device (especially zram, which is a compressed RAM
> > block device) to return garbage when an ongoing overwrite IO fails?
> >
> > O_DIRECT write block 4 "aaa.." -> success
> > read block 4           "aaa.." -> success
> > O_DIRECT write block 4 "bbb.." -> fail
> > read block 4           "000.." -> is this okay?
> >
> > Hope to get an answer from the experts. :)
>
> It's "okay" in that it's what existing real block devices do (at least on a
> sector boundary). It's not "nice" though, so if you can avoid it, please do.

That was my understanding, so I wanted to avoid it for just a simple code
refactoring. Your comment helps confirm that thought.

Thanks, Christoph!