[fuse-devel] rapid inode number reuse, EBUSY errors from mkdir()

Discussion:

Boris Protopopov

2012-01-18 15:24:05 UTC

Hi, everybody,

I have a filesystem based on FUSE low-level APIs. I do re-use inode numbers when they become available, and my inode allocation algorithm tends to reuse inode numbers that have just been freed. This generally works fine; however, when I run tests that issue lots of metadata operations (create, link, unlink, mkdir, rmdir, symlink), I get sporadic EBUSY errors from mkdir().

The problems only occur when I run multiple processes (tests performing the same steps concurrently) against a single filesystem; a single process always works fine. My guess is that multiple processes cause sufficiently fast inode number reuse (one process frees an inode number, and another allocates it quickly). I've been trying to figure out what the issue might be: I looked through the kernel module source code (this is where EBUSY comes from) and found that EBUSY seems to be returned when FUSE sees more than one link to a directory inode. This seems reasonable, but I cannot see how this could happen with the tests I am running.

I have a couple of questions that I hope someone can answer:

1) what is the proper use of the fuse_lowlevel_notify_inval_inode()/fuse_lowlevel_notify_inval_entry()?

It is my understanding that unless the inodes/directories are modified in some way other than through local VFS interfaces/fuse.ko, I do not need to use these. I believe fuse.ko code does all the invalidation that is needed for all the operations that come through VFS, unless I missed something. Is this correct ?

2) What is the meaning of 'nodeid' as opposed to inode number ?

It appears that nodeids are the numbers returned to fuse.ko in the struct stat by the fuse_reply_entry() and similar calls. My assumption is that the inode numbers (in struct inode, part of fuse_inode) are supposed to be equal to nodeids; however, I am confused by the following comment in fs/fuse/dir.c::fuse_link():
...
    /* Contrary to "normal" filesystems it can happen that link
       makes two "logical" inodes point to the same "physical"
       inode.  We invalidate the attributes of the old one, so it
       will reflect changes in the backing inode (link count,
       etc.)
    */
    if (!err || err == -EINTR)
        fuse_invalidate_attr(inode);
    return err;
...

Does anyone understand what this means ?
What are the "logical" vs "physical" inodes, and do they have anything to do with nodeids vs inode numbers ?
Going a step further, if this situation with two logical inodes occurs, could it lead to the EBUSY problem discussed above ?

Best regards, Boris Protopopov.

Boris Protopopov

2012-01-18 22:22:32 UTC

Thanks for the quick reply, John. I will definitely look into the FORGET messages. Boris.

> Subject: Re: [fuse-devel] rapid inode number reuse, EBUSY errors from mkdir()
> From: ***@jmuir.com
> Date: Wed, 18 Jan 2012 23:12:47 +0100
> CC: fuse-***@lists.sourceforge.net
> To: ***@hotmail.com
>
> On 2012.01.18, at 16:24 , Boris Protopopov wrote:
>
> > I have a filesystem based on FUSE low-level APIs. I do re-use inode numbers when they become available
>
> I suspect that you aren't waiting until the kernel is finished with the directory's inode.
>
> When you get the FORGET message for an inode from the kernel, and the lookup count then goes to zero, THEN you can re-use the inode number.
>
> See the latest fuse_lowlevel.h on how the lookup count is manipulated.
>
> Cheers,
>
> John.
>

John Muir

2012-01-18 22:12:47 UTC

On 2012.01.18, at 16:24 , Boris Protopopov wrote:

> I have a filesystem based on FUSE low-level APIs. I do re-use inode numbers when they become available

I suspect that you aren't waiting until the kernel is finished with the directory's inode.

When you get the FORGET message for an inode from the kernel, and the lookup count then goes to zero, THEN you can re-use the inode number.

See the latest fuse_lowlevel.h on how the lookup count is manipulated.
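
The rule above can be sketched without any libfuse dependency. This is only an illustration under stated assumptions: the slot table, MAX_INODES and the ino_* helpers are hypothetical names, not part of the FUSE API, and a real filesystem would also need locking. An inode number goes back to the allocator only once the object has been unlinked and the kernel's lookup count for it has dropped to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping; none of these names come from libfuse. */
#define MAX_INODES 64

struct ino_slot {
    uint64_t nlookup;   /* references the kernel still holds */
    bool     unlinked;  /* no directory entry refers to it any more */
    bool     in_use;    /* number must not be handed out again */
};

static struct ino_slot slots[MAX_INODES];

/* Return the lowest free number, or 0 if none is left.
 * Number 1 is skipped because FUSE reserves it for the root. */
static uint64_t ino_alloc(void)
{
    for (uint64_t i = 2; i < MAX_INODES; i++) {
        if (!slots[i].in_use) {
            slots[i] = (struct ino_slot){ .in_use = true };
            return i;
        }
    }
    return 0;
}

/* Free the number only when BOTH conditions hold. */
static void ino_release_if_idle(uint64_t ino)
{
    if (slots[ino].unlinked && slots[ino].nlookup == 0)
        slots[ino].in_use = false;
}

/* Call when a reply hands the inode to the kernel. */
static void ino_ref(uint64_t ino)
{
    slots[ino].nlookup++;
}

/* Call from unlink/rmdir: the name is gone, the kernel may still care. */
static void ino_unlink(uint64_t ino)
{
    slots[ino].unlinked = true;
    ino_release_if_idle(ino);
}

/* Call from the forget handler with its nlookup argument. */
static void ino_forget(uint64_t ino, uint64_t nlookup)
{
    assert(slots[ino].nlookup >= nlookup);
    slots[ino].nlookup -= nlookup;
    ino_release_if_idle(ino);
}
```

With this arrangement, an allocation that happens between the rmdir and the FORGET gets a different number, which is the situation the concurrent tests appear to trigger.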

Cheers,

John.

Boris Protopopov

2012-01-19 17:50:33 UTC

Hi, guys,

So, I looked at the use of the forget handler, and got things to work :)
Thanks for the pointer, John.

I found a few things about the forget notifications that were not quite expected:

- common request sequences included the forget requests but did not contain any lookups
- sometimes forget requests came before the corresponding release requests

To be clearer about what I am referring to:
the comments in the header file suggest that the proper way of using the forget support is
to maintain a reference counter per inode number that increments (by one) on every lookup()
and decrements on every forget() by the value of the nlookup argument.

This does not seem to work as the following sequence

# mkdir dir1
# rmdir dir1

does not issue any lookups (on dir1) yet does issue the inode forget after dir1 is deleted.
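
The mismatch can be reproduced on paper by replaying the request sequence against a counter. The sketch below is hypothetical throughout: the event type, the trace, and the assumption that the FORGET carries nlookup == 1 are illustrative and not taken from a real request log. It shows that counting only in the lookup handler makes the forget underflow, while also counting the mkdir reply balances the books.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative request types; not libfuse identifiers. */
enum req { REQ_LOOKUP, REQ_MKDIR, REQ_RMDIR, REQ_FORGET };

struct event {
    enum req type;
    uint64_t nlookup;   /* only meaningful for REQ_FORGET */
};

/* Replay a trace for one inode. Returns the final lookup count,
 * or -1 if a forget asks to drop more references than were counted. */
static int64_t replay(const struct event *ev, int n, int count_mkdir)
{
    int64_t count = 0;

    for (int i = 0; i < n; i++) {
        switch (ev[i].type) {
        case REQ_LOOKUP:
            count++;
            break;
        case REQ_MKDIR:
            if (count_mkdir)
                count++;            /* the entry reply also takes a reference */
            break;
        case REQ_RMDIR:
            break;                  /* removing the name does not touch the count */
        case REQ_FORGET:
            if ((int64_t)ev[i].nlookup > count)
                return -1;          /* bookkeeping is out of sync */
            count -= (int64_t)ev[i].nlookup;
            break;
        }
    }
    return count;
}

/* "mkdir dir1; rmdir dir1", assuming the forget arrives with nlookup == 1. */
static const struct event mkdir_rmdir_trace[] = {
    { REQ_MKDIR, 0 }, { REQ_RMDIR, 0 }, { REQ_FORGET, 1 },
};
```

Calling replay(mkdir_rmdir_trace, 3, 0) reports the underflow, and replay(mkdir_rmdir_trace, 3, 1) ends at zero.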

I wonder whether the term 'lookup' in the comments refers to operations other than the one carried out
by the lookup handler. Does anyone know what those are ? Without knowing the complete set of
handlers that need to increment the count, it is difficult to maintain the necessary invariants.

Best regards and thanks,
Boris.

Boris Protopopov

2012-01-19 18:22:40 UTC

I've taken a quick look at the fuse kernel module, and I believe that every fuse kernel method
that does iget() should result in an inode number reference count increment in the low-level fuse
filesystem. Those operations include ones that call create_new_entry() (e.g. mknod(), mkdir(),
link(), symlink(), etc.), fuse_create_open(), and lookup().

Does this make sense ?
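
One way to pin that list down is a single predicate that every handler consults. The enum and function names below are hypothetical; the set of "true" cases is the one named above (lookup, mknod, mkdir, symlink, link, create), and the remaining operations are included only as contrasting examples.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative operation tags; not libfuse identifiers. */
enum fs_op {
    OP_LOOKUP, OP_MKNOD, OP_MKDIR, OP_SYMLINK, OP_LINK, OP_CREATE,
    OP_GETATTR, OP_OPEN, OP_UNLINK, OP_RMDIR
};

/* True if a successful reply to this operation hands a new inode
 * reference to the kernel, so the lookup count must go up by one. */
static bool op_takes_lookup_ref(enum fs_op op)
{
    switch (op) {
    case OP_LOOKUP:
    case OP_MKNOD:
    case OP_MKDIR:
    case OP_SYMLINK:
    case OP_LINK:
    case OP_CREATE:
        return true;
    default:
        return false;   /* e.g. getattr, open, unlink, rmdir */
    }
}
```

Keeping the decision in one place makes it harder for a newly added handler to skip the increment silently.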

Best regards, Boris Protopopov.

John Muir

2012-01-20 13:45:33 UTC

On 2012.01.19, at 19:22 , Boris Protopopov wrote:
> I've taken a quick look at the fuse kernel module, and I believe that every fuse kernel method
> that does iget() should result in an inode number reference count increment in the low-level fuse
> filesystem. Those operations include ones that call create_new_entry() (e.g. mknod(), mkdir(),
> link(), symlink(), etc.), fuse_create_open(), and lookup().

Yes, the latest fuse_lowlevel.h implies this in the documentation for the forget callback.

Regards,

John.

--
John Muir - ***@jmuir.com
+32 491 64 22 76

Nikolaus Rath

2012-01-20 22:30:48 UTC

Boris Protopopov <bprotopopov-***@public.gmane.org> writes:
> This does not seem to work as the following sequence
>
> # mkdir dir1
> # rmdir dir1
>
> does not issue any lookups (on dir1) yet does issue the inode forget after dir1 is deleted.

There are a few fuse_reply_* functions that imply a lookup, i.e. you should
increase the lookup count when you call them. This should be documented
in fuse_lowlevel.h (at least, it was contained in the patch I sent a
little while ago).
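
A small wrapper can tie the increment to the reply so the two cannot drift apart. The sketch below does not call libfuse: reply_fn is a hypothetical stand-in for a reply call such as fuse_reply_entry() or fuse_reply_create(), and reply_ok/reply_fail are stubs that exist only to exercise it. The count is raised before the reply is sent and rolled back if the reply fails, on the assumption that in a multithreaded filesystem a FORGET may be processed as soon as the kernel has seen the reply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a reply call; returns 0 on success and a
 * negative error code on failure. */
typedef int (*reply_fn)(uint64_t ino);

/* Stubs used only for illustration. */
static int reply_ok(uint64_t ino)   { (void)ino; return 0; }
static int reply_fail(uint64_t ino) { (void)ino; return -1; }

/* Send an entry-style reply and keep the lookup count consistent:
 * the reference exists only if the reply was delivered. */
static int reply_entry_counted(reply_fn reply, uint64_t ino, uint64_t *nlookup)
{
    int err;

    (*nlookup)++;           /* take the reference first */
    err = reply(ino);
    if (err != 0)
        (*nlookup)--;       /* reply never reached the kernel: undo */
    return err;
}
```

A real implementation would protect the counter with whatever lock already guards the inode table.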

Best,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C