[fuse-devel] FUSE Low Level API

Post by Mathias PanzenbÃ¶ck
Is there a good documentation or examples for the FUSE Low Level API somewhere? In particular: Does
it implement multi threading or do you need to implement this yourself when using the lowlevel API?
What about the fork() for running in background and installing signal handlers for SIGNINT etc.?

The fuse code has examples. Check out hello_ll.c. If you already know
the high level API that should tell you how to change to low level.

Everything else I think is eaxctly like the high level api, that afaik
only changes the callbacks and reuses everything else from the low level.

Post by Mathias PanzenbÃ¶ck
The only reason why I want to use the low level API is because I want to use fuse_reply_data (with a
file descriptor as "buffer"). Also my filesystem is read-only which means I could use a completely
lock-free multi threading implementation.

Maybe you could write a patch for the high level API to support splicing
too? Has anyone checked how feasable that is?

MfG
Goswin

Michael McTernan

2011-06-03 22:49:29 UTC

Post by Mathias PanzenbÃ¶ck
Also: A thing that I noticed is that my filesystem actually performs worse when using -m and doing a
read-only iozone benchmark that uses 5 threads.

Using my own benchmarks, I found -m slowed things down for my filesystem
when making linear access to a single file. This was due to the threads
dispatching the reads out of order and causing extra seek setups which
were expensive in my fs. There was also some small overhead from extra
thread scheduling even when the seek overhead was nulled.

Threaded access was however much faster than -s when making parallel
access to more than 1 file on systems with >1 CPU with the speed up
being *NUM_CPU as expected.

For my tests I used things along the lines of "find . -name "*.aiff" |
xargs -P x cat > /dev/null" where x controls how many files are read at
once. Since my fs is transcoding, I also used a tmpfs as the source to
avoid profiling an underlying filesystem by accident.

Regards,

Mike

Mathias Panzenböck

2011-06-04 01:23:17 UTC

Post by Mathias PanzenbÃ¶ck
Also: A thing that I noticed is that my filesystem actually performs worse when using -m and doing a
read-only iozone benchmark that uses 5 threads.

Using my own benchmarks, I found -m slowed things down for my filesystem
when making linear access to a single file. This was due to the threads
dispatching the reads out of order and causing extra seek setups which
were expensive in my fs. There was also some small overhead from extra
thread scheduling even when the seek overhead was nulled.
Threaded access was however much faster than -s when making parallel
access to more than 1 file on systems with>1 CPU with the speed up
being *NUM_CPU as expected.
For my tests I used things along the lines of "find . -name "*.aiff" |
xargs -P x cat> /dev/null" where x controls how many files are read at
once. Since my fs is transcoding, I also used a tmpfs as the source to
avoid profiling an underlying filesystem by accident.

Hm, maybe my tests did profile the underlying filesystem. The archive file I mount has the files as
consecutive raw data stored in files (one index file I read into memory and several archives that
basically are created by cat $some_files > foo_001.vpk). I guess because all archives are located in
the same filesystem and on the same HD this pretty much profiles concurrent access to this HD.

-panzi

John Haechten

2011-06-04 15:23:53 UTC

Are there any FUSE options or patches available that would allow multi-threaded FUSE operations, However, making access to a single file (files using the same fileInfo.fh) single threaded to prevent the out-of-order issue that can be caused by multi-threading? Ie. Put a check in so that when FUSE starts other threads of execution, it would only do so for a Non in-use fileinfo.fh. Seek setups are very expensive in my fs also and I think this might help mitigate the Non-Sequential Read issue.

Thoughts?

John

-----Original Message-----
From: Michael McTernan [mailto:***@cs.bris.ac.uk]
Sent: Friday, June 03, 2011 5:49 PM
To: ***@gmx.net
Cc: fuse-***@lists.sourceforge.net
Subject: Re: [fuse-devel] FUSE Low Level API

Post by Mathias PanzenbÃ¶ck
Also: A thing that I noticed is that my filesystem actually performs worse when using -m and doing a
read-only iozone benchmark that uses 5 threads.

Michael McTernan

2011-06-04 22:08:50 UTC

Post by John Haechten
Are there any FUSE options or patches available that would allow
multi-threaded FUSE operations, However, making access to a single
file (files using the same fileInfo.fh) single threaded to prevent
the out-of-order issue that can be caused by multi-threading?

I posted a patch to this list on 14th May 2011 that does exactly that:

http://sourceforge.net/mailarchive/forum.php?thread_name=87sjsdepoj.fsf%40tucsk.pomaz.szeredi.hu&forum_name=fuse-devel

Miklos kindly said he'd have a look at it again, so it would be
interesting to know how it changes your filesystem's performance.

Regards,

Mike

Miklos Szeredi

2011-06-08 15:41:05 UTC

http://sourceforge.net/mailarchive/forum.php?thread_name=87sjsdepoj.fsf%40tucsk.pomaz.szeredi.hu&forum_name=fuse-devel
Miklos kindly said he'd have a look at it again, so it would be
interesting to know how it changes your filesystem's performance.

Yes, it would definitely be interesting to do some experiments with
this.

There's one fundamental problem with Michael's patch: it uses fi->fh as
a key to select a thread. This works most of the time but is
conceptually wrong. The file handle is supplied by the filesystem
implementation and libfuse should not use it or assume anything about
it.

The alternative is to implement this on top of the low level API,
forwarding calls to another instance of this API.

Thoughts?

Thanks,
Miklos

Michael McTernan

2011-06-08 20:08:46 UTC

This post might be inappropriate. Click to display it.

Goswin von Brederlow

2011-06-10 10:42:51 UTC

Post by Miklos Szeredi

Post by Michael McTernan
Miklos kindly said he'd have a look at it again, so it would be
interesting to know how it changes your filesystem's performance.

Yes, it would definitely be interesting to do some experiments with
this.
There's one fundamental problem with Michael's patch: it uses fi->fh as
a key to select a thread. This works most of the time but is
conceptually wrong. The file handle is supplied by the filesystem
implementation and libfuse should not use it or assume anything about
it.
The alternative is to implement this on top of the low level API,
forwarding calls to another instance of this API.
Thoughts?

Yep - the fh hash is a total hack, but it keeps the changes very local.
There's also a bit of overhead on cracking the message to get at fi->fh
and then computing the hash - small, but not strictly necessary.
Given full freedom to change anything, I would prefer to keep most of
the patch, but add another identifier to the kernel<->userspace
interface. This could be something like a fi->wh e.g. a worker-handle
internal to libfuse. New requests from the kernel would pass 0 meaning
that userspace can dispatch to any free worker thread. Operations that
use a handle (i.e. have fi->fh) can then pass the kernel back a non-zero
fi->wh to identify an elected worker thread. Further operations on the
fi->fh would then be sent to the same thread using the associated fi->wh.

Why not use the inode and generation number?

MfG
Goswin

Stef Bon

2011-06-10 11:32:27 UTC

HI,

you can do anything with FUSE, but isn't the problem here the decoder
using a lock, cause of reasons you've descibed. It cannot hanlde fifo
queueing, what causes this.

So I suggest the problem lies with the decoder, it should be able to
handle on a fifo basis, and what Goswin is suggesting ( generation
number, is currently not used) is a step towards this, to make the
decoder aware a request is part of a serie of request.

If this key isn't there, it's dealing with random access, if it
unlocks the mutex it should look for the first on basis of the
generation number...... how I do not know, but I guess this is a good
sollution.

Same problem with access to cdrom.

Stef

Post by Miklos Szeredi

Post by Michael McTernan
Miklos kindly said he'd have a look at it again, so it would be
interesting to know how it changes your filesystem's performance.

Yes, it would definitely be interesting to do some experiments with
this.
There's one fundamental problem with Michael's patch: it uses fi->fh as
a key to select a thread. This works most of the time but is
conceptually wrong. The file handle is supplied by the filesystem
implementation and libfuse should not use it or assume anything about
it.
The alternative is to implement this on top of the low level API,
forwarding calls to another instance of this API.
Thoughts?

Why not use the inode and generation number?
MfG
Goswin
------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________
fuse-devel mailing list
https://lists.sourceforge.net/lists/listinfo/fuse-devel

Goswin von Brederlow

2011-06-11 07:27:05 UTC

Post by Stef Bon
HI,
you can do anything with FUSE, but isn't the problem here the decoder
using a lock, cause of reasons you've descibed. It cannot hanlde fifo
queueing, what causes this.
So I suggest the problem lies with the decoder, it should be able to
handle on a fifo basis, and what Goswin is suggesting ( generation
number, is currently not used) is a step towards this, to make the
decoder aware a request is part of a serie of request.

Small note: the generation referes to the inodes generation for
filesystems that reuse an inode:

struct fuse_entry_param {
/** Unique inode number
*
* In lookup, zero means negative entry (from version 2.5)
* Returning ENOENT also means negative entry, but by setting zero
* ino the kernel may cache negative entries for entry_timeout
* seconds.
*/
fuse_ino_t ino;

/** Generation number for this entry.
*
* The ino/generation pair should be unique for the filesystem's
* lifetime. It must be non-zero, otherwise FUSE will treat it as an
* error.
*/
unsigned long generation;
...

Post by Stef Bon
If this key isn't there, it's dealing with random access, if it
unlocks the mutex it should look for the first on basis of the
generation number...... how I do not know, but I guess this is a good
sollution.
Same problem with access to cdrom.
Stef

MfG
Goswin

Michael McTernan

2011-06-11 08:10:20 UTC

Post by Stef Bon
you can do anything with FUSE, but isn't the problem here the decoder
using a lock, cause of reasons you've descibed. It cannot hanlde fifo
queueing, what causes this.

If the filesystem was lockless, then yes there would be no problem.
However, it's a really really common case that filesystems need to
serialise access to each specific file, and hence they will use a lock
(or opt of single threaded).

Post by Stef Bon
So I suggest the problem lies with the decoder,

No. The problem stems from the current FUSE thread model.

Read/write requests can and do arrive out of order due to the
non-deterministic scheduling of threads. Unless a file system wishes to
implement a delay and a re-order buffer (e.g. using Nagle's algorithm),
there is currently nothing that can be done at the high level API to
prevent this behaviour and random access has to be implemented. Clearly
adding a delay for re-ordering each non-sequential access will harm
random-access performance unnecessarily, but not having a buffer may
lead to additional seeking which also has a cost.

The previously proposed patch provides a way for the high-level API to
maintain the ordering of read/write requests per file.

Regards,

Mike

Stef Bon

2011-06-11 09:19:00 UTC

Post by Stef Bon
you can do anything with FUSE, but isn't the problem here the decoder
using a lock, cause of reasons you've descibed. It cannot hanlde fifo
queueing, what causes this.

Please explain here a litlle bit about lockless and the difference
with your filesystem. We're talking about a lock being set to protect
the decoders context right.

So please give me some more info. Your filesystem is like:

static void yourfs_read(..... some parameters like inode number and request)

some translation from ino to path

do some more here maybe like creation of buffer

pthread_lock_mutex(some block var)

nreturn=call_to_your_backend(pathinformation, buffer)

pthread_unlock_mutex(some block var)

fuse_reply_buf(request, buffer......)

or fuse_reply_err when error

}

Post by Stef Bon
So I suggest the problem lies with the decoder,

No. The problem stems from the current FUSE thread model.
Read/write requests can and do arrive out of order due to the
non-deterministic scheduling of threads. Unless a file system wishes to
implement a delay and a re-order buffer (e.g. using Nagle's algorithm),
there is currently nothing that can be done at the high level API to
prevent this behaviour and random access has to be implemented. Clearly
adding a delay for re-ordering each non-sequential access will harm
random-access performance unnecessarily, but not having a buffer may
lead to additional seeking which also has a cost.

I understand, please give some more info. In my filesystem
(fuse-workspace) I'm using a threads and a structure of mutex vars and
cond vars to have create some priority scheduling combined with random
access.

My filesystem offers a layer around the normal system:

/Computer
/Home
/Internet Services
/Mounts
/Network
/Shared
/System

The normal entries like bin, dev and etc and usr are present but
hidden ( and after doing a bind mount to the original fs and a chroot
you get this)

Now in the computer map there is hardware like removable devices like
USB sticks. It's removable, so at any time tge user can pull it out
(bluntly) or give a "safe remove" command. I'm currently working on
this issue to make this work.

When the user gives a safe remove command, this will result in a
"detach" of the USB device. I'm thinking of blocking any access to
the device during some period, like 15 seconds. When period is over,
device has become available again like normal, but during the 15
seconds the user can safely pull it out.

To achieve I added some extta fields to a structure. This structure
contains information about the resource, in this case a USB device,
but at can also be a SMB share, a cdrom or just a directory or file on
the localhost:

struct primary_resource_struct {
md5key_resource_string md5key;
int fd;
struct fd_list_struct *fd_list;
pathstring recordpath;
struct primary_resource_struct *next;
struct primary_resource_struct *prev;
struct mount_data_struct *mount_data;
unsigned char security;
unsigned char directio;
unsigned char userpolicy;
unsigned char status;
unsigned char actions;
pthread_mutex_t actions_mutex;
pthread_cond_t actions_condition;
unsigned char globallock;
pthread_mutex_t globallock_mutex;
pthread_cond_t globallock_condition;
unsigned group;
union {
struct local_cdrom_struct *cdrom;
struct local_disk_struct *disk;
struct net_smb_struct *smb;
struct local_map_struct *map;
struct local_file_struct *file;
} data;
};

There are a lot of fields, but important here are the actions var, and
the related actions_mutex and actions_condition, and the globallock
and the related globallock_mutex and globallock_condition.

A structure is created for the USB stick at:

/Computer/161-Transcend JetFlash Flash Drive

(the entry for this directory points indirectly to the primary_resource struct).

Now when a normal operation like read uses this device - like read -
it increases the actions paramater with one whe it starts the
operation, and decreases it again by one when ready.
So the parameter actions gives always the current number of normal operations.

It cannot increase just like that (see it like access), when the
status variable is not normal, something like "detached" or "removed"
it cannot increase, so access is denied.

The admin actions (like setting it in a stat of detached or back
again) (which instead of the normal operations which work on a single
entry) work on the whole device and use therefore the globallock var:
try to get a globallock, and set status to the desired value.

So when setting this to a value of DETACHED or REMOVED access is denied.
The remove thing is a process to finally remove the directory. In the
meantime - when this process is taking place - this status is very
functional.

With your decode you have to something simluar I guess.
But before I can say anything about that give some more info.

Stef

Michael McTernan

2011-06-11 15:26:49 UTC

Post by Stef Bon
you can do anything with FUSE, but isn't the problem here the decoder
using a lock, cause of reasons you've descibed. It cannot hanlde fifo
queueing, what causes this.

Please explain here a litlle bit about lockless and the difference
with your filesystem.

By lockless I mean a filesystem that's implemented without needing to
serialise read() or write() on a file. Fusexmp.c (on of the FUSE
examples) is an example of such a filesystem.

Post by Stef Bon
We're talking about a lock being set to protect the decoders
context right.

Yes, the lock is for the decoder context. There's nothing special about
my filesystem and it is roughly as you describe. If you are interested,
take a look at the sources; the serialisation lock is in aiff.c: AiffRead():

http://www.mcternan.me.uk/aifffffs/software/aifffffs-0.4.tar.gz

It's maybe worth noting that with the proposed changes to fuse_loop_mt
(see previous patch), the lock to protect the decoder could be removed
entirely from aifffffs since there is a decoder per open file.

Regards,

Mike

Stef Bon

2011-06-12 12:36:42 UTC

Hi,

I've read some of the code.

Please can you explain why decoding FLAC requires protection of the context?

Are the FLAC functions not thread safe?

Look here:

http://www.ziva-vatra.com/index.php?aid=23&id=U29mdHdhcmU=

for a multithread safe decoder. As I understand it supports multithreading.
Multithreading should be possible, cause it's a pure file operation,
no other hardware is used, like with a CD.

Stef

Michael McTernan

2011-06-12 18:44:17 UTC

Post by Stef Bon
Please can you explain why decoding FLAC requires protection of the context?
Are the FLAC functions not thread safe?

Correct.

Post by Stef Bon
http://www.ziva-vatra.com/index.php?aid=23&id=U29mdHdhcmU=
for a multithread safe decoder. As I understand it supports multithreading.

I don't think so. It looks like a Python script using 'flac', which is
the binary built around the same libFLAC I am using. Viva-vatra maybe
confused between a thread and a process.

Post by Stef Bon
Multithreading should be possible, cause it's a pure file operation,
no other hardware is used, like with a CD.

Yes, supporting multiple concurrent decode threads is possible if you
create a FLAC decoder for each thread. Aifffffs creates a decoder per
opened file so can take advantage multi-threading when there's >1 CPU.

Aside, I'm not sure if this is still relevant to fuse-devel. Mail me
off list if there's further discussion about FLAC.

Regards,

Mike

Michael McTernan

2011-06-11 14:57:46 UTC

Post by Michael McTernan
Yep - the fh hash is a total hack, but it keeps the changes very local.
There's also a bit of overhead on cracking the message to get at fi->fh
and then computing the hash - small, but not strictly necessary.

<snip>

Post by Goswin von Brederlow
Why not use the inode and generation number?

That's a very good suggestion. Infact, would the inode alone be
sufficient since we need only a constant identifier for the duration of
the open file?

I tried it out and it certainly looks good and tidies the code in the
patch.

Regards,

Mike

Goswin von Brederlow

2011-06-13 10:07:48 UTC

<snip>

Post by Goswin von Brederlow
Why not use the inode and generation number?

That's a very good suggestion. Infact, would the inode alone be
sufficient since we need only a constant identifier for the duration of
the open file?
I tried it out and it certainly looks good and tidies the code in the
patch.
Regards,
Mike

If you use only the inode then you might serialize access to different
files if an inode is reused while a deleted files is still open. That is
probably a case that won't happen and worst case it just slows you down
a bit. But if you have access to the generation number then use it.

MfG
Goswin

Stef Bon

2011-06-15 09:30:39 UTC

This post might be inappropriate. Click to display it.

Lucas C. Villa Real

2011-06-15 17:40:31 UTC

Post by Stef Bon
The FLAC decoder is a piece of software, why isn't it possiblke to

make that thread safe?

There's probably a lot of work to do in the code in order to come up with
anything better than a single lock with a very coarse granularity. I bet the
FLAC developers will welcome patches from the community with that regard..
:-)

--
Lucas
"If you're looking for a reason I've a reason to give: pleasure, little
treasure"

Goswin von Brederlow

2011-06-15 17:51:22 UTC

That's a very good suggestion. Infact, would the inode alone be
sufficient since we need only a constant identifier for the duration of
the open file?
I tried it out and it certainly looks good and tidies the code in the
patch.
Regards,
Mike

I've been thinking about the use of this generation number, and I
cannot see how to use this, without a total hack.
The generation number is for deleted inodes, and do we have them here?

The generation number is something your FS has to support if it resues
inode numbers. I imagine the kernel cache must keep track of the number
somewhere. I don't know though if the libfuse can access the number.
So as I said. If you can get the generation number then use
it. Otherwise no big deal.

Post by Stef Bon
I still see no easy sollution. It;s the same with reading a cdrom, the
library to access it, cdparanoia, is also not thread safe, it has a
context of it's own.

That isn't actually an issue with thread safety or being
reentrant. Although both can play a role.

But the issue here is that you have a context that only allows linear
access. Random access needs to create a new context every time it seeks.
And creating such a context is expensive so you only want to do that
when absolutely neccessary. Not just because the multithreading reorders
requests to become non-linear.

Post by Stef Bon
But this is because there is hardware which is causing this, the cd
cannot swap easiliy, and the library cd paranoia is made that way. I'm
not totally certain here, but that's waht I can think of.
The FLAC decoder is a piece of software, why isn't it possiblke to
make that thread safe?

It is thread safe in that you can have as many threads as you like as
long as each uses its own context. It just can't seek fast.

Post by Stef Bon
Stef

MfG
Goswin

Stef Bon

2011-06-15 20:10:03 UTC

Post by Goswin von Brederlow
But the issue here is that you have a context that only allows linear
access. Random access needs to create a new context every time it seeks.
And creating such a context is expensive so you only want to do that
when absolutely neccessary. Not just because the multithreading reorders
requests to become non-linear.

Well, here I draw a line. I think that the disadvantages from FLAC
play a role here, the "lossless" of it has it's price, the difficulty
of seeking.

Now, as someone who has studied mathematics, I'm thinking of
unsolvable problems, and a quote from Mazur:

In an expository paper, Number Theory as Gadfly, Mazur describes
number theory as a field which
produces, without effort, innumerable problems which have a sweet,
innocent air about them, tempting flowers; and yet... number theory
swarms with bugs, waiting to bite the tempted flower-lovers who, once
bitten, are inspired to excesses of effort!

(source: http://en.wikipedia.org/wiki/Barry_Mazur)

FUSE cannot solve the problems introduced by the FLAC manner of
compression. Period. The benefits of the highcompression rate has it;s
price. it's simple as that.

Stef Bon

Kevin Fox

2011-06-15 20:38:37 UTC

Well, here I draw a line. I think that the disadvantages from FLAC
play a role here, the "lossless" of it has it's price, the difficulty
of seeking.

What does lossless have to do with seeking? You can seek in FLAC.

Post by Stef Bon
Now, as someone who has studied mathematics, I'm thinking of
In an expository paper, Number Theory as Gadfly, Mazur describes
number theory as a field which
produces, without effort, innumerable problems which have a sweet,
innocent air about them, tempting flowers; and yet... number theory
swarms with bugs, waiting to bite the tempted flower-lovers who, once
bitten, are inspired to excesses of effort!
(source: http://en.wikipedia.org/wiki/Barry_Mazur)
FUSE cannot solve the problems introduced by the FLAC manner of
compression. Period. The benefits of the highcompression rate has it;s
price. it's simple as that.

I haven't followed this thread much, but if you are talking about a file
system that stores audio or video in one format and allows it to be read
it out in another, its something that I've been thinking about doing for
a while.

Most of those formats have a way to do a rough seek in the input file.
What you probably want to do is have a tool that generates you a seek
index from that output format's point of view that goes from the
destination format's offsets to a rough seek and offset off of that. The
tool would basically transcode everything linearly, saving the index and
dropping the actual data. Once you have that index you can seek in the
output anywhere you want and set the decoder/encoder back up fairly
quickly.

Kevin

Post by Stef Bon
Stef Bon
------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________
fuse-devel mailing list
https://lists.sourceforge.net/lists/listinfo/fuse-devel

Sven Utcke

2011-06-16 11:20:00 UTC

Hello Stef,

Post by Goswin von Brederlow
But the issue here is that you have a context that only allows
linear access. Random access needs to create a new context every
time it seeks. And creating such a context is expensive so you
only want to do that when absolutely neccessary. Not just because
the multithreading reorders requests to become non-linear.

Well, here I draw a line. I think that the disadvantages from FLAC
play a role here, the "lossless" of it has it's price, the
difficulty of seeking.

But the problem really isn't unique to FLAC --- numerous other
applications would have the same problem. Take reading from a serial
medium (tape). Sure, you can solve the problem by caching (which is
what I do), but this is painful (for my particular application I would
probably do caching anyway, but that's a different story). So having
something which works around this problem by serialising access to
individual files would be sweet.

Sven

--
_ ___ ___ ___ The dCache File System
__| |/ __|| __|/ __| An archive file-system for PB of data
/ _` | (__ | _| \__ \ http://www.desy.de/~utcke/Data/
\__,_|\___||_| |___/ http://www.dr-utcke.de/

Michael McTernan

2011-06-18 12:57:15 UTC

Post by Stef Bon
Well, here I draw a line. I think that the disadvantages from FLAC
play a role here, the "lossless" of it has it's price, the difficulty
of seeking.
...
FUSE cannot solve the problems introduced by the FLAC manner of
compression. Period. The benefits of the highcompression rate has it;s
price. it's simple as that.

Stef - you've not understood my issue. I will try to explain just one
more time.

But first let me clarify that there's absolutely nothing wrong with
FLAC, the library implementation or that it is lossless. Infact, the
FLAC library is excellently implemented, very well documented and easy
to use, and as others have already pointed out, this issue is not a
property of FLAC.

The issue is simply that FUSE can generate out-of-order read and write
requests on a filesystem mounted with the multi-threaded option. This
means that seeking is required even if the process generating the
requests makes only sequential linear access (e.g. cp or cat).

For almost all file systems, storage mediums and devices, sequential
forward linear access is the fastest and most heavily optimised access
pattern.

A threaded FUSE mount can therefore produce a performance drop when
linear accesses end up being presented to the filesystem out-of-order,
thus adding potential overhead from seeking. The cost of a seek on a
filesystem determines how noticeable this will be.

For my filesystem, aifffffs, seeking overhead is noticeable hence I was
bitten and inspired to excesses of effort to find a fix.

Regards,

Mike

Stef Bon

2011-06-18 19:44:14 UTC

Stef - you've not understood my issue. I will try to explain just one
more time.

I think I do understand it very well. It looks as if you don't like my
answer, and then think I do not understand you. But the issue is here
that I think this problem is very hard to solve, and one solutions
like caching, and make threads wait on each other, possibly block is a
sollution. But these are mentioned already here.

But first let me clarify that there's absolutely nothing wrong with
FLAC, the library implementation or that it is lossless. Infact, the
FLAC library is excellently implemented, very well documented and easy
to use, and as others have already pointed out, this issue is not a
property of FLAC.

Yes, possibly, but lots of documentation and ease to use does not
solve the problem it has, and I think it has a problem by design.

The issue is simply that FUSE can generate out-of-order read and write
requests on a filesystem mounted with the multi-threaded option. This
means that seeking is required even if the process generating the
requests makes only sequential linear access (e.g. cp or cat).

Reads in your fs are very important, they are continuous necessary to
translate a flac into a aiff format (but only when read is necessary).
Now as I understand when the user/app wants to read a file this with
result in a chaotic behaviour of reading the original flac file.

Normally a read on my fs will result in linear behaviour. That's the
client which takes care of that. It reads the first 13128 (example)
bytes, do something with that, and then when ready with that gives the
next read command, resulting in an offset equal to previous_offset +
bytes read. This must be the case with your fs also. It's the client
which determines the pace. Am I correct?

Stef

Sven Utcke

2011-06-19 12:12:42 UTC

Hello Stef,

Post by Stef Bon
compression. Period. The benefits of the highcompression rate has
it;s price. it's simple as that.

Stef - you've not understood my issue. I will try to explain just
one more time.

I think I do understand it very well.

Actually, if you do I can not quite follow your argument either.

Post by Stef Bon
Normally a read on my fs will result in linear behaviour. That's the
client which takes care of that. It reads the first 13128 (example)
bytes, do something with that, and then when ready with that gives the
next read command, resulting in an offset equal to previous_offset +
bytes read. This must be the case with your fs also. It's the client
which determines the pace. Am I correct?

Yes (I'm not the OP though). But FUSE will (sooner or later)
randomise this access pattern, resulting in random access to the file
even though the client is reading the data linearly. And for some FSs
(not only FLAC), a seek, while possible, will have considerable
overhead, which it would be nice to avoid. So it would be nice to
have a way to tell FUSE to try and keep access (to a single file)
ordered.

Sven

--
_ ___ ___ ___ The dCache File System
__| |/ __|| __|/ __| An archive file-system for PB of data
/ _` | (__ | _| \__ \ http://www.desy.de/~utcke/Data/
\__,_|\___||_| |___/ http://www.dr-utcke.de/

Michael McTernan

2011-06-19 17:47:31 UTC

Post by Sven Utcke
Yes (I'm not the OP though). But FUSE will (sooner or later)
randomise this access pattern, resulting in random access to the file
even though the client is reading the data linearly. And for some FSs
(not only FLAC), a seek, while possible, will have considerable
overhead, which it would be nice to avoid.

Correct.

Post by Sven Utcke
So it would be nice to have a way to tell FUSE to try and keep access
(to a single file) ordered.

And this is what my patch provides.

It there's interest, I could post the updated/simplified version of the
patch which uses the inode number instead of fi->fh.

Kind Regards,

Mike

Stef Bon

2011-06-19 18:13:17 UTC

Post by Stef Bon
bytes read. This must be the case with your fs also. It's the client
which determines the pace. Am I correct?

OK, when looking at this futher - already mentioned in this thread
somewhere as I remember - that the lock used in the read call, is
causing this.

Now, the next thread which gets the lock (and the others are still
waiting/blocked) is not always the same as "next" as you look at it in
linear aspect, causing too much seeking.

So this is the cause, a slow reading process in combination with not
threadsafe code.

The adjustment in the fs should be just simple. Maintain a list of
read requests waiting per fd,
so maintain a list for every fd with (fd, threadid, offset,
next_per_fd, prev_per_fd).
When a single thread is finished ( and the file pointer is now
offset_previous + nbytesread) , just only give the lock to the read
thread with the offset the most close to this value.

I guess this is easy to do with the pthread functions.
When a new read thread is started, it will insert the read request
(the values fd, threadid, offset) in this linked list. When the list
is empty, then the read can start right away.
When not empty, insert it in the "wait" queue at the right position like:

queelement_previous->offset < new offset < queueelement_next->offset

When a read finished, it removes it values from this queue, and sends
a condition broadcast.
Only the thread which is first in the queue takes the lock, others still wait.

This way it's not necessary to patch FUSE, only thread managment in your fs.

And of course cache where possible is always good.

Stef

Sven
--
_ ___ ___ ___ The dCache File System
__| |/ __|| __|/ __| An archive file-system for PB of data
/ _` | (__ | _| \__ \ http://www.desy.de/~utcke/Data/
\__,_|\___||_| |___/ http://www.dr-utcke.de/
------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________
fuse-devel mailing list
https://lists.sourceforge.net/lists/listinfo/fuse-devel

Michael McTernan

2011-06-19 18:45:25 UTC

Post by Stef Bon
OK, when looking at this futher - already mentioned in this thread
somewhere as I remember - that the lock used in the read call, is
causing this.

Not true. Context switching at any time can cause request re-ordering,
regardless of code in the read call. Re-ordering can have occurred
before any user code in the filesystem has been called.

Post by Stef Bon
The adjustment in the fs should be just simple. Maintain a list of
read requests waiting per fd,
<snip>

This has already been discussed within this thread and would require a
Nagle style delay to collate out-of-order requests which decreases
performance in sequential linear cases:

http://sourceforge.net/mailarchive/message.php?msg_id=27633573

Post by Stef Bon
This way it's not necessary to patch FUSE, only thread managment in your fs.

No. It's actually very hard (impossible?) to fix efficiently outside of
FUSE for reasons already stated.

It's also beneficial to all filesystem writers for FUSE to maintain
request order to avoid adding seeks. I think patching FUSE is
reasonable, although it's not up to me and my patch may not be the best
way to fix the problem (Goswin already provided one improvement!) :)

I'd be interested to hear from other people on the list if my argument
is accurate or coherent, and whether the proposed changes are of
interest. Miklos previously said he was interest for wider feedback on
this patch too.

Regards,

Mike

Stef Bon

2011-06-19 20:33:47 UTC

Post by Stef Bon
OK, when looking at this futher - already mentioned in this thread
somewhere as I remember - that the lock used in the read call, is
causing this.

Not true. Context switching at any time can cause request re-ordering,
regardless of code in the read call. Re-ordering can have occurred
before any user code in the filesystem has been called.

Not true? Context switching? What's that?

The lock and the way the next thread gets the thread is random:

ANSWER: Unless thread priority scheduling (not covered) is used, the
assignment will be left to the native system scheduler and may appear
to be more or less random.

(source: https://computing.llnl.gov/tutorials/pthreads/#WhyPthreads)

This is causing the random behaviour.

Post by Stef Bon
The adjustment in the fs should be just simple. Maintain a list of
read requests waiting per fd,
<snip>

This has already been discussed within this thread and would require a
Nagle style delay to collate out-of-order requests which decreases
http://sourceforge.net/mailarchive/message.php?msg_id=27633573

why is a Nagle style the only sollution?

Post by Stef Bon
This way it's not necessary to patch FUSE, only thread managment in your fs.

No. It's actually very hard (impossible?) to fix efficiently outside of
FUSE for reasons already stated.

I do not agree. Why do you think this is impossible?? I have a fuse fs
where the threads are granted access based upon some conditions. Thats
completly in my fs, not in FUSE.

It's also beneficial to all filesystem writers for FUSE to maintain
request order to avoid adding seeks. I think patching FUSE is
reasonable, although it's not up to me and my patch may not be the best
way to fix the problem (Goswin already provided one improvement!) :)

I do not think so. The fs's only dealing with a backend like this
(cdrom, tape, your fs ) deal with this issue. And because I think that
it can be solved in your fs, the need to do this in FUSE is not so
urgent.

I also would like to hear others about this.

Stef Bon

Goswin von Brederlow

2011-06-20 17:45:39 UTC

Post by Stef Bon
OK, when looking at this futher - already mentioned in this thread
somewhere as I remember - that the lock used in the read call, is
causing this.

Not true. Context switching at any time can cause request re-ordering,
regardless of code in the read call. Re-ordering can have occurred
before any user code in the filesystem has been called.

Not true? Context switching? What's that?

It is called premptive multitasking.

Post by Stef Bon
ANSWER: Unless thread priority scheduling (not covered) is used, the
assignment will be left to the native system scheduler and may appear
to be more or less random.
(source: https://computing.llnl.gov/tutorials/pthreads/#WhyPthreads)
This is causing the random behaviour.

The locking basically ensures it happens. But even in a filesystem
without a lock on read/write it can happen at any time. It just isn't
likely to occur.

Post by Stef Bon
The adjustment in the fs should be just simple. Maintain a list of
read requests waiting per fd,
<snip>

why is a Nagle style the only sollution?

It isn't. At least not for thread safe code, which we have here. What is
needed is, as you said, to maintain a list of read requests waiting per
fd. Same for write requests.

Using the multithreaded loop this is rather ugly, requireing locking the
threads and sorting the read requests and so on.

But using a single threaded loop with worker threads this becomes
easier. In a single threaded loop the read/write requests allways come
in linear and you just have to take care of queueing them up if there is
already a request running for an inode. Otherwise you create a queue for
the inode and wake up one of the worker threads.

Post by Stef Bon
This way it's not necessary to patch FUSE, only thread managment in your fs.

No. It's actually very hard (impossible?) to fix efficiently outside of
FUSE for reasons already stated.

I do not agree. Why do you think this is impossible?? I have a fuse fs
where the threads are granted access based upon some conditions. Thats
completly in my fs, not in FUSE.

It isn't impossible but it would be stupid. The main problem is the
randomization and that is caused by the multithreading loop. Queuing
requests to the same inode before splitting into threads is easier than
trying to put Humpty-Dumpty back together.

I do not think so. The fs's only dealing with a backend like this
(cdrom, tape, your fs ) deal with this issue. And because I think that
it can be solved in your fs, the need to do this in FUSE is not so
urgent.
I also would like to hear others about this.
Stef Bon

I would like to see a fuse mainloop that would ensure read/write
requests to a single file remain in the order given. This can be
beneficial in many situations. cdroms, tapes that really should not seek
are one example. Optimization with read-ahead on linear access is
another. You don't want to waste time on read-ahead when access is
random.

One other major factor is fragmentation. I have a simple FS that writes
data to the next free block without any caching. If the write requests
come in in-order the written data remains linear on disk. Otherwise the
blocks get jumbled up. To avoid that you would need a much more complex
caching or pre-allocation algorithm.

MfG
Goswin

Stef Bon

2011-06-20 20:23:15 UTC

Post by Michael McTernan
No. It's actually very hard (impossible?) to fix efficiently outside of
FUSE for reasons already stated.

I do not agree. Why do you think this is impossible?? I have a fuse fs
where the threads are granted access based upon some conditions. Thats
completly in my fs, not in FUSE.

Now were talking!

I agree that it would be much nicer to queue request to the same inode
before starting a thread to do the job. That would be much nicer, but
that is entirly in FUSE, and you have to build a general sollution,
while I think that making this work can only be done in the fs,
because that "knows" how to queue and to prioritze, and FUSE does not
know about it at all! The fs has all the knowledge about the backend
and how to deal with it.

Again I agree that the queueing in the fs (which is not difficult as
you say, just try to find the conditions which are important to sort,
and choose the right mutex/condition variables).

So, not stupid, no! Besides the extra code in FUSE will not be used
for a lot filesystems.
But if you think you can build a general solution for every fs dealing
with this kind of problems, well I would like to know. I think they
look a lot at each other but are not the same.

And earlier I've mentioned the fd, I also agree it's better per inode.

I will look I've got the time to build this with a lowlevel version of
cddfs, a fs to access audio cdroms...
Must be not so hard.

Post by Goswin von Brederlow
Optimization with read-ahead on linear access is
another. You don't want to waste time on read-ahead when access is
random.

When does read-ahead happen?

Stef

Kevin Fox

2011-06-20 20:47:18 UTC

Post by Michael McTernan
No. It's actually very hard (impossible?) to fix efficiently outside of
FUSE for reasons already stated.

I do not agree. Why do you think this is impossible?? I have a fuse fs
where the threads are granted access based upon some conditions. Thats
completly in my fs, not in FUSE.

It may be easier, but will it be faster? If I have a bunch of users
accessing files, should one user take a performance hit from another
because one user is issuing a lot of out of ordered reads?

Trying to merge the out of order seeks back linearly may be harder, but
it may be better. It may be more fair.

Post by Stef Bon
Now were talking!
I agree that it would be much nicer to queue request to the same inode
before starting a thread to do the job. That would be much nicer, but
that is entirly in FUSE, and you have to build a general sollution,
while I think that making this work can only be done in the fs,
because that "knows" how to queue and to prioritze, and FUSE does not
know about it at all! The fs has all the knowledge about the backend
and how to deal with it.
Again I agree that the queueing in the fs (which is not difficult as
you say, just try to find the conditions which are important to sort,
and choose the right mutex/condition variables).
So, not stupid, no! Besides the extra code in FUSE will not be used
for a lot filesystems.

Thats silly logic. Should glibc not have math functions because a lot of
applications don't use them? A lot of fuse file systems don't use the
low level api. Should we remove that? No...

Post by Stef Bon
But if you think you can build a general solution for every fs dealing
with this kind of problems, well I would like to know. I think they
look a lot at each other but are not the same.

I think the vast majority would get along just fine with specifying two
things and letting fuse work out the details:
1. that it in fact prefers sequential reads/writes.
2. a rough time to wait to reorder before giving up and issuing the
request out of order.

Post by Stef Bon
And earlier I've mentioned the fd, I also agree it's better per inode.

Yeah.

Post by Stef Bon
I will look I've got the time to build this with a lowlevel version of
cddfs, a fs to access audio cdroms...
Must be not so hard.

Post by Goswin von Brederlow
Optimization with read-ahead on linear access is
another. You don't want to waste time on read-ahead when access is
random.

When does read-ahead happen?

When your file system wants to. I read ahead when I make a call to hpss.
Because its a network call, 4mb reads are as expensive as a 1k one.
Reading ahead 4mb at a time sped up performance for me over 1000x.

Kevin

Post by Stef Bon
Stef
------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________
fuse-devel mailing list
https://lists.sourceforge.net/lists/listinfo/fuse-devel

Stef Bon

2011-06-20 21:14:37 UTC

Post by Michael McTernan
No. It's actually very hard (impossible?) to fix efficiently outside of
FUSE for reasons already stated.

I do not agree. Why do you think this is impossible?? I have a fuse fs
where the threads are granted access based upon some conditions. Thats
completly in my fs, not in FUSE.

It may be easier, but will it be faster? If I have a bunch of users
accessing files, should one user take a performance hit from another
because one user is issuing a lot of out of ordered reads?
Trying to merge the out of order seeks back linearly may be harder, but
it may be better. It may be more fair.

Ha! Great Kevin, we agree here! I started to think I'm the only one here.

So to summarize for general puposes, there are two moments the
scheduling can take place:

a. in the mainloop, in FUSE

b. in the fs, when calling the backend functions.

Thats silly logic. Should glibc not have math functions because a lot of
applications don't use them? A lot of fuse file systems don't use the
low level api. Should we remove that? No...

Yes you're right here.

But I stick to the point this code in FUSE will be very general for
all purposes, while there are important differences I can see, and
correct me if I'm wrong:

a. there are backends like cdrom and tape, where there is only one
pointer (read/write head) for the whole fs.

b. backends like audio files decoded, like with aifffffs, which have
a pointer per file (=inode).

So the way to deal with is will differ, which causes difficulties when
creating a general sollution (=in FUSE).

Solving this in the fs, dealing with different threads is not so hard.
That's where mutexes and condition vars are for!

Stef

Michael McTernan

2011-06-20 21:44:17 UTC

Post by Stef Bon
But I stick to the point this code in FUSE will be very general for
all purposes, while there are important differences I can see, and
a. there are backends like cdrom and tape, where there is only one
pointer (read/write head) for the whole fs.
b. backends like audio files decoded, like with aifffffs, which have
a pointer per file (=inode).
So the way to deal with is will differ, which causes difficulties when
creating a general sollution (=in FUSE).

I'm not sure I follow. If there's a problem, please could you be so
kind to review and comment on specific difficulties in the patch I
proposed. The original posting of the patch is here:

http://sourceforge.net/mailarchive/message.php?msg_id=27511478

Post by Stef Bon
Solving this in the fs, dealing with different threads is not so hard.
That's where mutexes and condition vars are for!

Hmm. Maybe you could supply me with a patch to aifffffs or just some
pseudo-code to demonstrate how this would work in the fs?

I'll wager that performance cannot be improved in this case (remember
aiffffffs is CPU bound), and that any fs change will also be more
complex than the suggestion I've put out for FUSE in the patch.

Regards,

Mike

Kevin Fox

2011-06-20 21:45:40 UTC

Post by Michael McTernan
No. It's actually very hard (impossible?) to fix efficiently outside of
FUSE for reasons already stated.

I do not agree. Why do you think this is impossible?? I have a fuse fs
where the threads are granted access based upon some conditions. Thats
completly in my fs, not in FUSE.

It may be easier, but will it be faster? If I have a bunch of users
accessing files, should one user take a performance hit from another
because one user is issuing a lot of out of ordered reads?
Trying to merge the out of order seeks back linearly may be harder, but
it may be better. It may be more fair.

Ha! Great Kevin, we agree here! I started to think I'm the only one here.
So to summarize for general puposes, there are two moments the
a. in the mainloop, in FUSE
b. in the fs, when calling the backend functions.

Yeah.

But, who knows how to do it better? The libfuse developers with some
hints from the FS or the FS itself?

Thats silly logic. Should glibc not have math functions because a lot of
applications don't use them? A lot of fuse file systems don't use the
low level api. Should we remove that? No...

Yes you're right here.
But I stick to the point this code in FUSE will be very general for
all purposes, while there are important differences I can see, and
a. there are backends like cdrom and tape, where there is only one
pointer (read/write head) for the whole fs.

In this case, isn't using the single threaded mode of fuse sufficient?

If not, both cases could be handled by having a callback function that
buckets requests to be serialized. One buckets on everything, the other
on inodes, and the user just picks one of the two standard ones or
provides their own. In fact, the regular, no serializing can be handled
by just having NULL for the function.

Post by Stef Bon
b. backends like audio files decoded, like with aifffffs, which have
a pointer per file (=inode).

I think this is probably the norm for filesystems wanting sequential
reads/writes.

Post by Stef Bon
So the way to deal with is will differ, which causes difficulties when
creating a general sollution (=in FUSE).
Solving this in the fs, dealing with different threads is not so hard.
That's where mutexes and condition vars are for!

Anytime anyone says mutex and/or condition vars that usually implies
hard. :) Yes I understand its easy for you and me, but that doesn't mean
its easy to everyone.

Doing it once and let everyone use it ensures that fewer mistakes are
made. File systems are one of the worst places to make mistakes since
the results of programming mistakes tend to remain, potentially longer
then the file system itself.

Kevin

Post by Stef Bon
Stef

Michael McTernan

2011-06-20 22:25:51 UTC

Post by Michael McTernan
No. It's actually very hard (impossible?) to fix efficiently outside of
FUSE for reasons already stated.

I do not agree. Why do you think this is impossible?? I have a fuse fs
where the threads are granted access based upon some conditions. Thats
completly in my fs, not in FUSE.

It may be easier, but will it be faster? If I have a bunch of users
accessing files, should one user take a performance hit from another
because one user is issuing a lot of out of ordered reads?
Trying to merge the out of order seeks back linearly may be harder, but
it may be better. It may be more fair.

Yeah.
But, who knows how to do it better? The libfuse developers with some
hints from the FS or the FS itself?

But right now, when in multi-threaded mode, it is FUSE that is creating
random access patterns from purely serial reads (e.g. cat or cp).

Having a scheduler in the fs to optimise random access according to the
capabilities of the underlying device is of course good and sensible for
a filesystem and an interesting problem if there are gains to be had.

But having to re-schedule requests to try and recover performance due to
FUSE randomising requests isn't a good cause to add such complexity
which will also cost extra CPU cycles and potentially further degrade
performance if the fs is already CPU bound.

Fixing the randomising behaviour of FUSE is the only sensible thing to
do here, and it's not a difficult change to make, as demonstrated by the
ideas of my patch.

Post by Stef Bon
Solving this in the fs, dealing with different threads is not so hard.
That's where mutexes and condition vars are for!

Anytime anyone says mutex and/or condition vars that usually implies
hard. :) Yes I understand its easy for you and me, but that doesn't mean
its easy to everyone.

And performance-wise, locks and condition vars are not without overhead,
especially in userspace where contention usually means a trip through
the kernel to reschedule.

Post by Kevin Fox
Doing it once and let everyone use it ensures that fewer mistakes are
made. File systems are one of the worst places to make mistakes since
the results of programming mistakes tend to remain, potentially longer
then the file system itself.

I agree entirely with this point. Also note that under the threading
model proposed by my patch, aifffffs would need no locks within itself.
I also looked at mp3fs, and think that could switch to this threading
model without alteration (it currently runs single threaded only).

If anything, offering a simplified threading model alongside the current
fully threaded and single threaded FUSE models provides filesystem
implementers a middle-ground where concurrency can be achieved in a
straight forward way and without the need for additional locks or
unexpected access patterns.

The current FUSE choice of 'everything everywhere in any thread' is a
big leap from single threaded FUSE option. That FUSE will also
randomise the request order... you might call that a bug.

Regards,

Mike

Stef Bon

2011-06-22 10:47:44 UTC

Well,

I will try to build a lowlevel version of your fs.
I have a framework for building a lowlevel fs, using it's own
administration of entries and inodes, so I do not have to reinvent the
wheel. (I have much info from deltafs, created by Miklos).
The inode here is available.

I do not expect to be ready soon. It takes a lot of time though and
extra efforts from my brain, which is already overheated lately. So
relax is the issue here.

Futher maybe we'll discover here the soluttion is AND and not OR!
Maybe some queueing in FUSE AND in the fs...

Stef

Lucas C. Villa Real

2011-06-19 22:17:40 UTC

On Sun, Jun 19, 2011 at 3:45 PM, Michael McTernan <

Post by Stef Bon
OK, when looking at this futher - already mentioned in this thread
somewhere as I remember - that the lock used in the read call, is
causing this.

Not true. Context switching at any time can cause request re-ordering,
regardless of code in the read call. Re-ordering can have occurred
before any user code in the filesystem has been called.

Post by Stef Bon
The adjustment in the fs should be just simple. Maintain a list of
read requests waiting per fd,
<snip>

Post by Stef Bon
This way it's not necessary to patch FUSE, only thread managment in your

fs.
No. It's actually very hard (impossible?) to fix efficiently outside of
FUSE for reasons already stated.
It's also beneficial to all filesystem writers for FUSE to maintain
request order to avoid adding seeks. I think patching FUSE is
reasonable, although it's not up to me and my patch may not be the best
way to fix the problem (Goswin already provided one improvement!) :)
I'd be interested to hear from other people on the list if my argument
is accurate or coherent, and whether the proposed changes are of
interest. Miklos previously said he was interest for wider feedback on
this patch too.

I don't remember having had seek problems like the one you're mentioning
during the implementation of IBM's LTFS (Linear Tape File System), possibly
because since the very beginning we had an I/O scheduler that performed
smart caching, as we needed that for things like backward reads and
legitimate small seeks, for example.

We evaluated multiple approaches, such as reading at least 1MB (aligned to
the nearest block address) on each FUSE read request (which was much smaller
than 1MB). The bytes read were stored in a cache which was used by any
following reads whose start offset was covered by that cache. That cache
object would be thrown away on a write() to that same file or if an offset
not covered by that cache was provided to a read() operation. A correct and
efficient implementation of that kind in the file system layer can get
complex very easily.

I also understand that in your case you're surely not wanting to decode data
unless that's mandatory. Every CPU cycle matters, after all :-)

Your patch, at least in theory of operation, seems to be doing the right
thing. You'll want to check for allocation failures in its final version,
though, as that's critical to prevent data losses to the file systems built
on top of it. At a first glance, I see that 'b' in bbuf_init() is not
checked before its members are set and that 'w->bbuf' in fuse_start_thread()
isn't verified either. By the way, what does 'b' in 'bbuf' stands for? I'd
suggest you to rename that to something more obvious for first time readers.

You mentioned the problem of hash collisions, but that doesn't seem to be an
issue, besides adding too much workload to a worker's plate (and eventually
slowing down some request's response). That still guarantees coherency.

Best regards,

--
Lucas
"If you're looking for a reason I've a reason to give: pleasure, little
treasure"

Michael McTernan

2011-06-20 19:48:29 UTC

Post by Lucas C. Villa Real
I don't remember having had seek problems like the one you're mentioning
during the implementation of IBM's LTFS (Linear Tape File System), possibly
because since the very beginning we had an I/O scheduler that performed
smart caching, as we needed that for things like backward reads and
legitimate small seeks, for example.

A scheduler also makes a lot of sense if there is a real delay to access
the data such that you can spend some time analysing the best read
pattern. In my case I'm just waiting to use the CPU :)

Post by Lucas C. Villa Real
I also understand that in your case you're surely not wanting to decode data
unless that's mandatory. Every CPU cycle matters, after all :-)

Yep. The whole file could also be quite large (a whole audio CD FLAC
file would decode to 700MB).

Post by Lucas C. Villa Real
Your patch, at least in theory of operation, seems to be doing the right
thing. You'll want to check for allocation failures in its final version,
though, as that's critical to prevent data losses to the file systems built
on top of it.

Agreed. I didn't want to put too much effort in if the underlying
principal was floored.

Post by Lucas C. Villa Real
At a first glance, I see that 'b' in bbuf_init() is not
checked before its members are set and that 'w->bbuf' in fuse_start_thread()
isn't verified either.

Thanks. I will fix them.

Post by Lucas C. Villa Real
By the way, what does 'b' in 'bbuf' stands for? I'd suggest you to
rename that to something more obvious for first time readers.

Ah yes. It's a bounded buffer implementation. I would normally put a
bit of explanation in the header block of the files to say what they
contain, but I was following the FUSE style ;)

Post by Lucas C. Villa Real
You mentioned the problem of hash collisions, but that doesn't seem to be an
issue, besides adding too much workload to a worker's plate (and eventually
slowing down some request's response). That still guarantees coherency.

Yep. I was more concerned that a filesystem implementation may be using
fi->fh in an certain way such that lots of files would have the same
hash value and unexpectedly get single-threaded performance. Others
pointed out that really FUSE has no business interpreting fi->fh, so
locally I'm using the inode number from the request as suggested by
Goswin. This also gives simpler code as the inode number is in the
header structure of the kernel request data, so the casts in the switch
statement are gone. There's still a risk that a worker gets overloaded,
but as you say, it is still coherent and works well in my benchmarks.

I'll lurk a bit longer, then may publish a cleaned up version of the
patch for those that have expressed continued interest.

Many Thanks,

Mike

Kevin Fox

2011-06-20 15:43:20 UTC

Post by Stef Bon
OK, when looking at this futher - already mentioned in this thread
somewhere as I remember - that the lock used in the read call, is
causing this.

Not true. Context switching at any time can cause request re-ordering,
regardless of code in the read call. Re-ordering can have occurred
before any user code in the filesystem has been called.

Post by Stef Bon
The adjustment in the fs should be just simple. Maintain a list of
read requests waiting per fd,
<snip>

Post by Stef Bon
This way it's not necessary to patch FUSE, only thread managment in your fs.

No. It's actually very hard (impossible?) to fix efficiently outside of
FUSE for reasons already stated.
It's also beneficial to all filesystem writers for FUSE to maintain
request order to avoid adding seeks. I think patching FUSE is
reasonable, although it's not up to me and my patch may not be the best
way to fix the problem (Goswin already provided one improvement!) :)
I'd be interested to hear from other people on the list if my argument
is accurate or coherent, and whether the proposed changes are of
interest. Miklos previously said he was interest for wider feedback on
this patch too.

I have a file system that does caches very basically. It caches reads
but only has one read buffer. Too much seeing needlessly causes the
buffer to be thrown out/reread. It would benefit it to have the above
mentioned code available. Rather then having to write it for every file
system that needs it, it probably should be in the shared libraries and
you just set a configuration bit if you want it. My 2 cents.

Thanks,
Kevin

Post by Michael McTernan
Regards,
Mike
------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________
fuse-devel mailing list
https://lists.sourceforge.net/lists/listinfo/fuse-devel

Nikolaus Rath

2011-06-25 19:40:20 UTC

Post by Michael McTernan
It's also beneficial to all filesystem writers for FUSE to maintain
request order to avoid adding seeks. I think patching FUSE is
reasonable, although it's not up to me and my patch may not be the best
way to fix the problem (Goswin already provided one improvement!) :)
I'd be interested to hear from other people on the list if my argument
is accurate or coherent, and whether the proposed changes are of
interest. Miklos previously said he was interest for wider feedback on
this patch too.

Your arguments are accurate & coherent. I don't have any use for the
patch myself, but I believe it would be a good idea to integrate it into
FUSE.

Best,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

John Haechten

2011-06-27 13:11:07 UTC

I do have a use for the patch and do believe that it would be beneficial to integrate this into FUSE permanently. However, to satisfy others and leave FUSE as flexible as possible, making this an option would be Ok too.

- John

-----Original Message-----
From: Nikolaus Rath [mailto:***@rath.org]
Sent: Saturday, June 25, 2011 2:40 PM
To: fuse-***@lists.sourceforge.net
Subject: Re: [fuse-devel] FUSE Low Level API

Your arguments are accurate & coherent. I don't have any use for the
patch myself, but I believe it would be a good idea to integrate it into
FUSE.

Best,

-Nikolaus

Post by Michael McTernan
Time flies like an arrow, fruit flies like a Banana.<

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

Kevin Fox

2011-06-20 15:34:02 UTC

Stef - you've not understood my issue. I will try to explain just one
more time.
But first let me clarify that there's absolutely nothing wrong with
FLAC, the library implementation or that it is lossless. Infact, the
FLAC library is excellently implemented, very well documented and easy
to use, and as others have already pointed out, this issue is not a
property of FLAC.
The issue is simply that FUSE can generate out-of-order read and write
requests on a filesystem mounted with the multi-threaded option. This
means that seeking is required even if the process generating the
requests makes only sequential linear access (e.g. cp or cat).
For almost all file systems, storage mediums and devices, sequential
forward linear access is the fastest and most heavily optimised access
pattern.
A threaded FUSE mount can therefore produce a performance drop when
linear accesses end up being presented to the filesystem out-of-order,
thus adding potential overhead from seeking. The cost of a seek on a
filesystem determines how noticeable this will be.
For my filesystem, aifffffs, seeking overhead is noticeable hence I was
bitten and inspired to excesses of effort to find a fix.

One other thing you could do is put a bit of infrastructure together
that keeps track of the current position of the file, if your not there,
wait on a mutex in that fd for about as long as it takes to reset your
flac engine and wait to see if the out of order reads reassemble
themselves. It shouldn't be much code.

Kevin

Michael McTernan

2011-06-20 19:28:12 UTC

Post by Michael McTernan
For my filesystem, aifffffs, seeking overhead is noticeable hence I was
bitten and inspired to excesses of effort to find a fix.

Yep - that is what I've been calling a 'Nagle's algorithm' style delay
that I'd like to avoid.

Principally the FLAC library isn't *that* slow to reinitialise and any
benefit from request reordering can be quickly lost in the extra
overhead of a system call to put the thread to sleep in case another
request turns up, which then needs locking and sorting to resolve the
next action. Coupled with a dependency on the minimum timer resolution
of Linux, I think this could end up really hurting genuine random access
performance.

What you propose could of course be very beneficial in other cases e.g.
if I were accessing genuine hardware like a hard disk drive, I'd
naturally have a little time waiting for the disk to rotate and service
a request (e.g. up to 100us per revolution for a 10k rpm drive) during
which other requests can naturally be queued up and sorted as per Native
Command Queuing.

But in my case, the aifffffs filesystem is only waiting to burn some CPU
cycles, so any delay or CPU overhead is generally bad for performance.

Regards,

Mike

Stef Bon

2011-06-10 08:47:56 UTC

Have you also tried other mt main event loops?

See for example:

http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/10030

I'm curious the problem you describe also occurs when using another loop.

The one I've written uses epoll and a fixed number of threads.

Stef Bon

Post by Mathias PanzenbÃ¶ck
Also: A thing that I noticed is that my filesystem actually performs worse when using -m and doing a
read-only iozone benchmark that uses 5 threads.

Using my own benchmarks, I found -m slowed things down for my filesystem
when making linear access to a single file. This was due to the threads
dispatching the reads out of order and causing extra seek setups which
were expensive in my fs. There was also some small overhead from extra
thread scheduling even when the seek overhead was nulled.
Threaded access was however much faster than -s when making parallel
access to more than 1 file on systems with >1 CPU with the speed up
being *NUM_CPU as expected.
For my tests I used things along the lines of "find . -name "*.aiff" |
xargs -P x cat > /dev/null" where x controls how many files are read at
once. Since my fs is transcoding, I also used a tmpfs as the source to
avoid profiling an underlying filesystem by accident.
Regards,
Mike

Michael McTernan

2011-06-11 08:33:49 UTC

Post by Stef Bon
Have you also tried other mt main event loops?

Yes. I proposed something of a patch on the FUSE mt loop to change it's
behaviour:

http://sourceforge.net/mailarchive/message.php?msg_id=27511478

Post by Stef Bon
http://thread.gmane.org/gmane.comp.file-systems.fuse.devel/10030
I'm curious the problem you describe also occurs when using another loop.

I checked the code in your loop. Yes, I believe you can also find
random access emerging from purely sequential reads or writes with your
code structure.

Regards,

Mike

Stef Bon

2011-06-11 09:20:52 UTC

Post by Stef Bon
Have you also tried other mt main event loops?

I checked the code in your loop. Yes, I believe you can also find
random access emerging from purely sequential reads or writes with your
code structure.
Regards,
Mike

That's logical. The main loop cannot and should not do anything with
scheduling. Thats another level.

Stef

Mike Vitalo

2011-06-20 15:29:09 UTC