linux-fsdevel.vger.kernel.org archive mirror
* Re: [Lsf-pc] [LSF/MM ATTEND] Over-the-wire data compression
       [not found] <rnx34bfst5gyomkwooq2pvkxsjw5mrx5vxszhz7m4hy54yuma5@huwvwzgvrrru>
@ 2024-03-15 12:22 ` Jan Kara
  2024-03-18 10:59   ` David Disseldorp
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2024-03-15 12:22 UTC
  To: Enzo Matsumiya; +Cc: lsf-pc, linux-fsdevel, linux-cifs

Hello Enzo,

it is good to also CC the appropriate public mailing lists so that other
attendees can discuss your proposal. I have added some that I found relevant.

								Honza

On Thu 14-03-24 15:14:49, Enzo Matsumiya wrote:
> Hello,
> 
> Having implemented data compression for SMB2 messages in cifs.ko, I'd
> like to attend LSF/MM to discuss:
> 
> - implementation decisions, both at the protocol level and in the
>   compression algorithms; e.g. performance improvements, what could,
>   if possible/wanted, turn into a lib/ module, etc
> 
> - compression algorithms in general; talk about algorithms to determine
>   if/how compressible a blob of data is
>     * several such algorithms already exist and are used by on-disk
>       compression tools, but for over-the-wire compression maybe the
>       fastest one with good (not great nor best) predictability
>       could work?
> 
> - overlapping modules/areas that have the need/desire to compress
>   transmitting data and their status quo in the topic; difficulties
>   where I could help and/or achievements that I could learn from
> 
> 
> Cheers,
> 
> Enzo
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [Lsf-pc] [LSF/MM ATTEND] Over-the-wire data compression
  2024-03-15 12:22 ` [Lsf-pc] [LSF/MM ATTEND] Over-the-wire data compression Jan Kara
@ 2024-03-18 10:59   ` David Disseldorp
  2024-03-22 21:23     ` Enzo Matsumiya
  0 siblings, 1 reply; 4+ messages in thread
From: David Disseldorp @ 2024-03-18 10:59 UTC
  To: Enzo Matsumiya; +Cc: Jan Kara, lsf-pc, linux-fsdevel, linux-cifs

Hi Enzo,

...
> On Thu 14-03-24 15:14:49, Enzo Matsumiya wrote:
> > Hello,
> > 
> > Having implemented data compression for SMB2 messages in cifs.ko, I'd
> > like to attend LSF/MM to discuss:
> > 
> > - implementation decisions, both at the protocol level and in the
> >   compression algorithms; e.g. performance improvements, what could,
> >   if possible/wanted, turn into a lib/ module, etc
> > 
> > - compression algorithms in general; talk about algorithms to determine
> >   if/how compressible a blob of data is
> >     * several such algorithms already exist and are used by on-disk
> >       compression tools, but for over-the-wire compression maybe the
> >       fastest one with good (not great nor best) predictability
> >       could work?

Ideally there could be some overlap between on-disk and over-the-wire
compression algorithm support. That could allow optimally aligned /
sized IOs to avoid unnecessary compression / decompression cycles on an
SMB server / client if the underlying filesystem supports encoded I/O
via e.g. BTRFS_IOC_ENCODED_READ/WRITE.

IIUC, we currently have:
SMB: LZ77, LZ77+Huffman (DEFLATE?), LZNT1, LZ4
Btrfs: zlib/DEFLATE, LZO, Zstd
Bcachefs: zlib/DEFLATE, LZ4, Zstd. Currently no encoded I/O support.

Cheers, David


* Re: [Lsf-pc] [LSF/MM ATTEND] Over-the-wire data compression
  2024-03-18 10:59   ` David Disseldorp
@ 2024-03-22 21:23     ` Enzo Matsumiya
  2024-03-25 10:40       ` David Disseldorp
  0 siblings, 1 reply; 4+ messages in thread
From: Enzo Matsumiya @ 2024-03-22 21:23 UTC
  To: David Disseldorp; +Cc: Jan Kara, lsf-pc, linux-fsdevel, linux-cifs

Hi Dave,

On 03/18, David Disseldorp wrote:
>Hi Enzo,
>
>...
>> On Thu 14-03-24 15:14:49, Enzo Matsumiya wrote:
>> > Hello,
>> >
>> > Having implemented data compression for SMB2 messages in cifs.ko, I'd
>> > like to attend LSF/MM to discuss:
>> >
>> > - implementation decisions, both at the protocol level and in the
>> >   compression algorithms; e.g. performance improvements, what could,
>> >   if possible/wanted, turn into a lib/ module, etc
>> >
>> > - compression algorithms in general; talk about algorithms to determine
>> >   if/how compressible a blob of data is
>> >     * several such algorithms already exist and are used by on-disk
>> >       compression tools, but for over-the-wire compression maybe the
>> >       fastest one with good (not great nor best) predictability
>> >       could work?
>
>Ideally there could be some overlap between on-disk and over-the-wire
>compression algorithm support. That could allow optimally aligned /
>sized IOs to avoid unnecessary compression / decompression cycles on an
>SMB server / client if the underlying filesystem supports encoded I/O
>via e.g. BTRFS_IOC_ENCODED_READ/WRITE.

That's exactly the kind of discussion I had in mind when I mentioned
'modules/subsystems with such overlapping requirements/desire'. I'm
interested not only in the feature/integration perspective; performance
is something I really want to get right from the beginning.

Which brings me to the 'how to detect incompressible data' subject.
A practical test at hand: when writing a 289MiB ISO file to an SMB
share with compression enabled, only 7 out of 69 WRITE requests
(~10%) are compressed.

(This in itself is not a problem, since SMB2 compression is supposed
to be done on a best-effort basis.)

So, best effort... for the other ~90% of the WRITE requests for this
particular ISO file, cifs.ko "compressed" the request, ended up with an
output size >= the input size, discarded it all, and sent the original
uncompressed request instead => lots of CPU cycles wasted.  It would be
nice to skip compression for such data right off the bat, or at least to
detect it with minimal parsing.
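
To illustrate the kind of cheap pre-check meant here, below is a minimal
sketch, assuming a Shannon-entropy estimate over a strided byte sample.
It is not taken from cifs.ko; the function names, sample size, stride and
threshold are all hypothetical and would need tuning against real data.

```python
import math
import os
from collections import Counter


def sampled_entropy(buf: bytes, sample_size: int = 4096, stride: int = 16) -> float:
    """Estimate Shannon entropy (bits per byte) from a strided sample of buf.

    Only every `stride`-th byte is inspected, and at most `sample_size`
    bytes in total, so the cost stays small compared to a compression pass.
    """
    sample = buf[::stride][:sample_size]
    if not sample:
        return 0.0
    total = len(sample)
    counts = Counter(sample)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def worth_compressing(buf: bytes, threshold: float = 7.0) -> bool:
    """Hypothetical policy: attempt compression only below `threshold` bits/byte."""
    return sampled_entropy(buf) < threshold


if __name__ == "__main__":
    # Highly repetitive payload vs. random (already-compressed-looking) payload.
    print(worth_compressing(b"A" * 65536))
    print(worth_compressing(os.urandom(65536)))
```

A byte histogram only sees symbol frequencies, so it can misjudge data
whose redundancy comes from repeated long patterns; it is meant as a fast
first filter, not as a replacement for actually trying to compress.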

>IIUC, we currently have:
>SMB: LZ77, LZ77+Huffman (DEFLATE?), LZNT1, LZ4
>Btrfs: zlib/DEFLATE, LZO, Zstd
>Bcachefs: zlib/DEFLATE, LZ4, Zstd. Currently no encoded I/O support.

The algorithms required by SMB2 look generic at first glance, but due
to some minor, yet very important, implementation details, I couldn't
get a Windows Server to decompress a DEFLATE'd buffer, for example.
So I'm not really sure how such integration with other subsystems
would play out.

LZ4 might change this, but I haven't implemented it yet (btw thanks for
pointing me to its support in the newest MS-SMB2 :)).


Cheers,

Enzo


* Re: [Lsf-pc] [LSF/MM ATTEND] Over-the-wire data compression
  2024-03-22 21:23     ` Enzo Matsumiya
@ 2024-03-25 10:40       ` David Disseldorp
  0 siblings, 0 replies; 4+ messages in thread
From: David Disseldorp @ 2024-03-25 10:40 UTC
  To: Enzo Matsumiya; +Cc: Jan Kara, lsf-pc, linux-fsdevel, linux-cifs

Hi Enzo,

On Fri, 22 Mar 2024 18:23:54 -0300, Enzo Matsumiya wrote:

> Which brings me to the 'how to detect incompressible data' subject.
> A practical test at hand: when writing a 289MiB ISO file to an SMB
> share with compression enabled, only 7 out of 69 WRITE requests
> (~10%) are compressed.
> 
> (This in itself is not a problem, since SMB2 compression is supposed
> to be done on a best-effort basis.)
> 
> So, best effort... for the other ~90% of the WRITE requests for this
> particular ISO file, cifs.ko "compressed" the request, ended up with an
> output size >= the input size, discarded it all, and sent the original
> uncompressed request instead => lots of CPU cycles wasted.  It would be
> nice to skip compression for such data right off the bat, or at least to
> detect it with minimal parsing.

Sounds like storing some compressible vs non-compressible write metrics
alongside a compression-capable SMB2 FILEID would allow for a simple
attempt-compression-on-next-write prediction mechanism. However, you'd
be forced to either re-learn compressibility after each reconnect or
persist it somewhere.
FILE_ATTRIBUTE_COMPRESSED might also be available as a (user-provided)
hint.
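
The attempt-compression-on-next-write idea could be sketched roughly as
follows, assuming in-memory counters keyed by an opaque file ID. The
class name, the thresholds and the periodic re-probe are illustrative
assumptions, not something defined by cifs.ko or MS-SMB2.

```python
from dataclasses import dataclass
from typing import Dict, Hashable


@dataclass
class CompressStats:
    attempts: int = 0  # writes for which compression was tried
    wins: int = 0      # attempts whose output was smaller than the input
    skipped: int = 0   # consecutive writes sent without trying


class CompressionPredictor:
    """Per-file predictor: keep compressing while it pays off, else back off."""

    def __init__(self, min_attempts: int = 4, min_win_ratio: float = 0.25,
                 retry_every: int = 16) -> None:
        self.min_attempts = min_attempts
        self.min_win_ratio = min_win_ratio
        self.retry_every = retry_every
        self._stats: Dict[Hashable, CompressStats] = {}

    def should_try(self, file_id: Hashable) -> bool:
        s = self._stats.setdefault(file_id, CompressStats())
        if s.attempts < self.min_attempts:
            return True  # still learning about this file
        if s.wins / s.attempts >= self.min_win_ratio:
            return True  # compression has been paying off
        # Mostly incompressible so far: skip, but re-probe now and then
        # in case a later region of the file compresses better.
        s.skipped += 1
        if s.skipped >= self.retry_every:
            s.skipped = 0
            return True
        return False

    def record(self, file_id: Hashable, in_len: int, out_len: int) -> None:
        """Record the outcome of one compression attempt."""
        s = self._stats.setdefault(file_id, CompressStats())
        s.attempts += 1
        if out_len < in_len:
            s.wins += 1

    def forget(self, file_id: Hashable) -> None:
        """Drop state, e.g. on close or reconnect when the ID is no longer valid."""
        self._stats.pop(file_id, None)
```

In this sketch the state lives only in memory, so it is lost on
reconnect exactly as described above; persisting it, or seeding it from
a hint such as FILE_ATTRIBUTE_COMPRESSED, would be a separate design
decision.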

