Subject: Re: [PATCH 0/11] Update version of write stream ID patchset
From: Jens Axboe
To: Jeff Moyer
CC: Martin K. Petersen
Date: Fri, 4 Mar 2016 15:13:38 -0700
Message-ID: <56DA0892.4050007@fb.com>

On 03/04/2016 03:03 PM, Jeff Moyer wrote:
> Jens Axboe writes:
>
>> On 03/04/2016 02:01 PM, Jeff Moyer wrote:
>>> OK. I'm still of the opinion that we should try to make this
>>> transparent. I could be swayed by workload descriptions and numbers
>>> comparing approaches, though.
>>
>> You can't just wave that flag and not have a solution. Any solution
>> in that space would imply having policy in the kernel. A "just use a
>> stream per file" approach is never going to work.
>
> Jens, I'm obviously missing a lot of the background information here.
> I want to stress that I'm not against your patches. I'm just trying to
> understand if there's a sensible way to use the write stream support in
> the kernel so that applications don't /have/ to be converted. It sounds
> like that's hard, and without any specs or hardware, I'm not going to be
> able to even try to come up with solutions to that problem.

It's not hard to update an application to do this. As an example, one
thing I tried was converting RocksDB to use streams. I used a naive
approach, simply mapping each compaction level to a specific stream,
and got about a 30% reduction in write amplification (WA) from that
alone.
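For illustration only, a naive level-to-stream mapping could look like the Python sketch below. It is not the actual RocksDB change: the function names level_to_stream() and stream_plan(), the use of 0 as "no stream hint", and clamping deep levels onto the last available stream are all assumptions made for this example. The step that hands the chosen ID to the kernel via fadvise() on the freshly opened fd is deliberately left out, since the patchset's advice value is not given here.

```python
# Hypothetical sketch: map LSM compaction levels to write stream IDs.
# Assumptions (not taken from the patchset): stream ID 0 means "no hint",
# usable IDs are 1..max_streams, and levels beyond the number of available
# streams share the last stream. The real change would pass the chosen ID
# to the kernel with fadvise() on the fd right after opening the file;
# that call is omitted here because its advice value is not specified.


def level_to_stream(level: int, max_streams: int) -> int:
    """Return the write stream ID for files written at a compaction level."""
    if level < 0:
        raise ValueError("compaction level must be non-negative")
    if max_streams <= 0:
        return 0  # device exposes no streams: leave the file unhinted
    # L0 -> stream 1, L1 -> stream 2, ...; deepest levels share the last one
    return min(level + 1, max_streams)


def stream_plan(num_levels: int, max_streams: int) -> dict[int, int]:
    """Level -> stream table an application could consult when opening files."""
    return {lvl: level_to_stream(lvl, max_streams) for lvl in range(num_levels)}


if __name__ == "__main__":
    # Example: 7 levels on a device assumed to offer 4 streams.
    for lvl, stream in stream_plan(7, 4).items():
        print(f"L{lvl} -> stream {stream}")
```

With 7 levels and 4 streams, L0 through L2 get streams 1 to 3, and L3 and deeper share stream 4. Keeping the mapping in one small function means the policy stays in the application, which is the point being made here about the kernel lacking lifetime knowledge.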
The guys from Samsung have done the same with RocksDB, using a
somewhat more involved approach, and got better results. The
application change was really no more involved than calling fadvise()
on the fd after opening it. That's it. I don't know why you think that
is hard.

As for doing this automagically, you'd need knowledge that you don't
have. The kernel or file system has no idea whether data written to
file X and data written to file Y have similar lifetimes. You could
start tracking that, of course, but that would make you very unhappy.
If I'm an application storing files, I have a much better idea of
which data is related time-wise.

And you don't really need a spec to understand how this works; the
spec just tells you the mechanics of how we pass this information to
the device, how we find out what the device supports, etc. The basic
gist of it is that we can write data with similar lifetimes together
on the media. For a flash disk, that means the same erase block (EB).

> I think it
> would make for interesting research, though. I recall a paper from one
> of the USENIX conferences that dealt with automatically identifying
> write streams on a network storage server, but alas, I can't find the
> reference right now.

Samsung released a paper on RocksDB and streams, IIRC.

--
Jens Axboe