[fuse-devel] Using read_buf correctly with more than one fuse_bufs
Alberto Miranda
2017-04-04 15:50:26 UTC
Hi,

I've been taking a look at the read_buf interface while developing a
RAM-based caching filesystem and I'm not sure what the correct procedure
is to return multiple buffers through the 'struct fuse_bufvec** bufp'
passed to read_buf.

In our case, the data for a file may be stored in RAM in potentially
separate regions, which might need to be returned by a single read_buf()
invocation. From what I understood from the code and the documentation,
it seems that, first of all, a 'struct fuse_bufvec' should be
malloc()-ed with enough space to contain several 'fuse_bufs', which
should be appropriately initialized to describe these regions.
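For reference, a minimal C sketch of that allocation step. Because struct fuse_bufvec ends in a one-element buf[1] array, room for n buffers means appending n - 1 extra struct fuse_buf entries. The declarations at the top are a simplified local copy of the libfuse 3 ones so the snippet compiles on its own (a real filesystem would include <fuse.h> instead), and alloc_bufvec() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Simplified local copy of the libfuse 3 declarations so this sketch
 * compiles on its own; in a real filesystem include <fuse.h> instead. */
enum fuse_buf_flags {
    FUSE_BUF_IS_FD    = (1 << 1),
    FUSE_BUF_FD_SEEK  = (1 << 2),
    FUSE_BUF_FD_RETRY = (1 << 3),
};

struct fuse_buf {
    size_t size;
    enum fuse_buf_flags flags;
    void *mem;
    int fd;
    off_t pos;
};

struct fuse_bufvec {
    size_t count;
    size_t idx;
    size_t off;
    struct fuse_buf buf[1];   /* actually 'count' entries */
};

/* Hypothetical helper: allocate a zeroed fuse_bufvec with room for n
 * buffers. buf[0] is already part of the struct, so only n - 1 extra
 * entries are appended after it. */
static struct fuse_bufvec *alloc_bufvec(size_t n)
{
    assert(n > 0);
    struct fuse_bufvec *vec =
        calloc(1, sizeof(*vec) + (n - 1) * sizeof(struct fuse_buf));
    if (vec == NULL)
        return NULL;
    vec->count = n;            /* idx and off stay 0: start at buf[0] */
    for (size_t i = 0; i < n; i++)
        vec->buf[i].fd = -1;   /* flags == 0 marks a plain memory buffer */
    return vec;
}
```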

For instance, if we had two memory regions 'data0' and 'data1', would it
be ok to initialize the fuse_bufs in the following manner? (src in this
case is a struct fuse_bufvec *):

src->buf[0].flags = (fuse_buf_flags) (~FUSE_BUF_IS_FD);
src->buf[0].mem = data0; // pointer to internal in-RAM cache entry
src->buf[0].size = bytes0;

src->buf[1].flags = (fuse_buf_flags) (~FUSE_BUF_IS_FD);
src->buf[1].mem = data1; // pointer to internal in-RAM cache entry
src->buf[1].size = bytes1;

My confusion comes from the following description of read_buf (fuse.h):

* The buffer must be allocated dynamically and stored at the
* location pointed to by bufp. If the buffer contains memory
* regions, they too must be allocated using malloc(). The
* allocated memory will be freed by the caller.

This seems to imply that the contents of 'data0' and 'data1' should be
copied with memcpy() into two dynamically allocated memory regions, and
that the addresses of these new regions should be used to initialize
buf[0].mem and buf[1].mem, rather than pointing directly at the data that
is already in RAM. Is this the case? If so, why is this extra copy needed?

Thanks in advance,

alberto
--
Alberto Miranda, PhD
Researcher on HPC I/O
Barcelona Supercomputing Center
www : https://www.bsc.es/research-development/research-areas/big-data/high-performance-io
email : alberto.miranda(at)bsc.es
phone : (+34) 93 405 42 81


http://bsc.es/disclaimer
Antonio SJ Musumeci
2017-04-04 16:20:39 UTC
My guess is that it made the API simpler to design. Rather than providing
a way for the user to manage the memory, the API expects to be given
ownership of it (and expects it to come from malloc).

There are a few places in libfuse like that. readdir is another one where
it would be nice to control the underlying buffer.

One solution could be to add a free function pointer that the filesystem
provides and the library calls.
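Antonio's suggestion might look something like the sketch below. None of it exists in libfuse: struct owned_bufvec, its release callback, finish_read() and the reference-counted cache entry are hypothetical names that only illustrate the idea of the filesystem supplying the release function and the library calling it once the reply has been sent, instead of calling free().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extension, NOT part of libfuse: a reply buffer that
 * carries its own release function. */
struct owned_buf {
    const void *mem;
    size_t size;
};

struct owned_bufvec {
    size_t count;
    struct owned_buf *bufs;
    /* Supplied by the filesystem, invoked by the library after the reply
     * has been written; may be NULL when nothing needs releasing. */
    void (*release)(struct owned_bufvec *vec, void *ctx);
    void *ctx;
};

/* What the library side might do after sending the data: hand the
 * vector back to its owner instead of calling free() on each region. */
static size_t finish_read(struct owned_bufvec *vec)
{
    assert(vec != NULL);
    size_t total = 0;
    for (size_t i = 0; i < vec->count; i++)
        total += vec->bufs[i].size;   /* stand-in for "send the bytes" */
    if (vec->release != NULL)
        vec->release(vec, vec->ctx);
    return total;
}

/* Example owner: a RAM cache entry that only drops a reference, so the
 * regions are neither copied nor freed. */
struct cache_entry {
    int refcount;
};

static void cache_release(struct owned_bufvec *vec, void *ctx)
{
    (void)vec;
    struct cache_entry *entry = ctx;
    entry->refcount--;
}
```

With this shape a RAM cache could return pointers into its own regions and merely drop a reference when the library is done with them, so the extra copy goes away.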
Nikolaus Rath
2017-04-05 22:09:36 UTC
Post by Alberto Miranda
> For instance, if we had two memory regions 'data0' and 'data1', would it
> be ok to initialize the fuse_bufs in the following manner? (src in this
> case is a struct fuse_bufvec *):
>
> src->buf[0].flags = (fuse_buf_flags) (~FUSE_BUF_IS_FD);
> src->buf[0].mem = data0; // pointer to internal in-RAM cache entry
> src->buf[0].size = bytes0;
>
> src->buf[1].flags = (fuse_buf_flags) (~FUSE_BUF_IS_FD);
> src->buf[1].mem = data1; // pointer to internal in-RAM cache entry
> src->buf[1].size = bytes1;
Yes, this is correct so far. But as you note below...
Post by Alberto Miranda
> * The buffer must be allocated dynamically and stored at the
> * location pointed to by bufp. If the buffer contains memory
> * regions, they too must be allocated using malloc(). The
> * allocated memory will be freed by the caller.
... this means that libfuse will call free(data0) and free(data1) after
your read_buf() function returns.
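Roughly, the cleanup libfuse performs after replying looks like the following simplified sketch (not the library's own code; release_read_buf() is an illustrative name, and the struct declarations are a local stand-in for libfuse's). Every buffer not marked FUSE_BUF_IS_FD is passed to free(), and then the vector itself is freed, so any mem pointer that did not come from malloc() leads to undefined behaviour.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Simplified local copy of the libfuse 3 declarations (include <fuse.h>
 * in real code instead). */
enum fuse_buf_flags {
    FUSE_BUF_IS_FD    = (1 << 1),
    FUSE_BUF_FD_SEEK  = (1 << 2),
    FUSE_BUF_FD_RETRY = (1 << 3),
};

struct fuse_buf {
    size_t size;
    enum fuse_buf_flags flags;
    void *mem;
    int fd;
    off_t pos;
};

struct fuse_bufvec {
    size_t count;
    size_t idx;
    size_t off;
    struct fuse_buf buf[1];
};

/* Simplified illustration of the cleanup done after a read_buf() reply
 * has been sent. Returns how many memory regions were freed. */
static size_t release_read_buf(struct fuse_bufvec *vec)
{
    size_t freed = 0;
    if (vec == NULL)
        return 0;
    for (size_t i = 0; i < vec->count; i++) {
        if (!(vec->buf[i].flags & FUSE_BUF_IS_FD)) {
            /* Undefined behaviour if mem points into a live cache. */
            free(vec->buf[i].mem);
            freed++;
        }
    }
    free(vec);   /* the vector itself must also come from malloc() */
    return freed;
}
```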
Post by Alberto Miranda
> This seems to imply that the contents of 'data0' and 'data1' should be
> memcopied to two dynamically allocated memory regions, and that the
> addresses of these two new regions should be used to initialize
> buf[0].mem and buf[1].mem, rather than directly using the data that is
> already in RAM. Is this the case?
Yes.
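Concretely, a read_buf() that follows the documented contract copies each cached region into its own malloc()-ed buffer, as in this sketch. The regions/sizes arrays and copy_regions() are hypothetical, and the struct declarations are a simplified local copy of libfuse's so the snippet compiles without it. In an actual read_buf() implementation the result would be stored through bufp (*bufp = vec).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Simplified local copy of the libfuse 3 declarations. */
enum fuse_buf_flags {
    FUSE_BUF_IS_FD    = (1 << 1),
    FUSE_BUF_FD_SEEK  = (1 << 2),
    FUSE_BUF_FD_RETRY = (1 << 3),
};

struct fuse_buf {
    size_t size;
    enum fuse_buf_flags flags;
    void *mem;
    int fd;
    off_t pos;
};

struct fuse_bufvec {
    size_t count;
    size_t idx;
    size_t off;
    struct fuse_buf buf[1];
};

/* Hypothetical helper: copy n cached regions into freshly malloc()-ed
 * buffers so that libfuse may free() each of them after replying. */
static struct fuse_bufvec *copy_regions(const void *const *regions,
                                        const size_t *sizes, size_t n)
{
    assert(n > 0);
    struct fuse_bufvec *vec =
        calloc(1, sizeof(*vec) + (n - 1) * sizeof(struct fuse_buf));
    if (vec == NULL)
        return NULL;
    vec->count = n;
    for (size_t i = 0; i < n; i++) {
        /* malloc(0) may return NULL, so allocate at least one byte. */
        void *copy = malloc(sizes[i] > 0 ? sizes[i] : 1);
        if (copy == NULL) {
            for (size_t j = 0; j < i; j++)   /* undo earlier copies */
                free(vec->buf[j].mem);
            free(vec);
            return NULL;
        }
        memcpy(copy, regions[i], sizes[i]);  /* the extra copy */
        vec->buf[i].mem = copy;              /* libfuse frees this later */
        vec->buf[i].size = sizes[i];
        vec->buf[i].fd = -1;                 /* flags stay 0: memory buffer */
    }
    return vec;
}
```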
Post by Alberto Miranda
> If so, why is this extra copy needed?
In your case it is not needed. But libfuse has to take into account that
there may also be users for whom the buffer *has* to be dynamically
allocated and must be freed afterwards. Since the buffer is still used by
libfuse after your read_buf() returns, libfuse must also be the one
freeing it.

In principle, it would be possible to add a flag that tells the caller
whether to free the buffer or not. But no one has done the work.
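If such a flag existed, the cleanup step would simply skip buffers carrying it. BUF_NO_FREE below is purely hypothetical (libfuse has no such flag), the struct declarations are a local stand-in for libfuse's, and release_read_buf() is an illustrative name, not library code.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Simplified local copy of the libfuse 3 declarations. */
enum fuse_buf_flags {
    FUSE_BUF_IS_FD    = (1 << 1),
    FUSE_BUF_FD_SEEK  = (1 << 2),
    FUSE_BUF_FD_RETRY = (1 << 3),
};

struct fuse_buf {
    size_t size;
    enum fuse_buf_flags flags;
    void *mem;
    int fd;
    off_t pos;
};

struct fuse_bufvec {
    size_t count;
    size_t idx;
    size_t off;
    struct fuse_buf buf[1];
};

/* HYPOTHETICAL flag, not part of libfuse: "the filesystem still owns
 * this memory, do not free() it". */
#define BUF_NO_FREE (1 << 4)

/* Cleanup as it might look if such a flag existed. Returns how many
 * regions were actually freed. */
static size_t release_read_buf(struct fuse_bufvec *vec)
{
    size_t freed = 0;
    if (vec == NULL)
        return 0;
    for (size_t i = 0; i < vec->count; i++) {
        unsigned flags = (unsigned)vec->buf[i].flags;
        if (flags & (FUSE_BUF_IS_FD | BUF_NO_FREE))
            continue;   /* fd buffer, or memory still owned by the cache */
        free(vec->buf[i].mem);
        freed++;
    }
    free(vec);
    return freed;
}
```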


A different solution is to use the low-level API. In that case, you
explicitly call fuse_reply_buf(), which sends the buffer without copying
(and without freeing). Afterwards, control returns to your read()
handler and you can free (or not free) the buffer as you desire.
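With the low-level API the reply can point straight into the cache. fuse_reply_buf() takes a single contiguous buffer, so for several regions fuse_reply_iov(), which takes an array of struct iovec and a count, is the closer fit. The sketch below only builds that iovec array (it does not call libfuse, so it compiles without it); struct region and regions_to_iov() are hypothetical, and the regions are assumed to already start at the requested read offset.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical description of one cached region of a file. */
struct region {
    void *mem;
    size_t size;
};

/* Fill 'iov' (capacity 'max_iov') with pointers straight into the cache,
 * covering at most 'want' bytes, and return the number of entries used.
 * Nothing is copied and nothing needs freeing: in a low-level read()
 * handler the array could then be sent with fuse_reply_iov(req, iov, n),
 * and the cache stays untouched afterwards. */
static int regions_to_iov(const struct region *regions, size_t nregions,
                          size_t want, struct iovec *iov, int max_iov)
{
    int n = 0;
    for (size_t i = 0; i < nregions && want > 0 && n < max_iov; i++) {
        size_t len = regions[i].size < want ? regions[i].size : want;
        if (len == 0)
            continue;
        iov[n].iov_base = regions[i].mem;   /* no memcpy() */
        iov[n].iov_len = len;
        want -= len;
        n++;
    }
    return n;
}
```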


Hope that helps!

-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«