HowTo/EfficientlyRemoveLargeDirectoryTree

Messages 1-10 from thread

Message 1 in thread
From: GertK (gert.koopmanREMO@VETHISplanet.nl)
Subject: How to efficiently remove large directory tree
Newsgroups: comp.sys.hp.hpux, comp.unix.shell, comp.unix.solaris
Date: 2004-01-05 16:19:10 PST

Hi,

An Oracle process tried to dump a core but instead recursively created a very deeply nested directory structure like this: core_8286/core_8286/core_8286/core_8286/....etc etc., until the inodes ran out. The number of subdirectories is estimated at almost 100,000, judging by the increase of used inodes on the filesystem (VxFS on HP-UX 11.0). The process creating these dirs has stopped now, and after enlarging the fs there is free space and enough free inodes available.

I'm trying to get rid of all the subdirs. I tried a # rm -r from the first core_8286 level, but after several hours it was still running; a trace on system calls showed it was very busy with chdir calls and seemed to be looping, so I killed it. A find core_8286 -depth | xargs rm -r also ran for hours without result. The problem is that most Unix utilities fail on these huge path lengths: du gives "path too long", and bdf fails as well.

A script I made that just did a loop with a "cd core_8286" 100 times showed that the deeper I get, the longer it takes to execute a block of 100 cd commands. This is probably due to system calls taking longer while traversing longer path names (exponential growth?)

Does anyone have an efficient method of cleaning this up while keeping the filesystem online? (50 databases are running on it, so I want to avoid downtime because of restoring the filesystem.)
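For anyone who wants to experiment with this failure mode without touching a production filesystem, a miniature version of the runaway tree can be built in a scratch directory. This is only a sketch; the depth, scratch path, and directory name are illustrative (the real tree was estimated at ~100,000 levels):

```shell
#!/bin/sh
# Build a small chain of nested core_8286 directories in a scratch area.
# DEPTH is illustrative; the real tree was estimated at ~100,000 levels.
DEPTH=500
BASE=$(mktemp -d)
(
  cd "$BASE" || exit 1
  i=0
  while [ "$i" -lt "$DEPTH" ]; do
    mkdir core_8286 && cd core_8286 || exit 1
    i=$((i + 1))
  done
)
echo "built a chain $DEPTH levels deep under $BASE"
```

The subshell keeps the deep chdir from affecting the caller's working directory; only relative one-level operations are used, so no command ever sees the full (multi-kilobyte) pathname.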

Txs beforehand Gert. Post a follow-up to this message

Message 2 in thread
From: Darren Dunham (ddunham@redwood.taos.com)
Subject: Re: How to efficiently remove large directory tree
Newsgroups: comp.sys.hp.hpux, comp.unix.shell, comp.unix.solaris
Date: 2004-01-05 17:28:04 PST

In comp.unix.solaris GertK wrote:
> Hi,
> A oracle process has tried to dump a core but created recursively a very
> deeply nested directory structure like this instead:
> core_8286/core_8286/core_8286/core_8286/....etc etc. until the inodes
> ran out.

[snip]

> Does anyone have a efficient method of cleaning this up while keeping
> the filesystem online (50 databases running on it, so I want to avoid
> downtime because of restoring the filesystem)?

If you have the filesystem online, then I can't think of anything you can do but use the standard OS interfaces to handle it.

At the moment it sounds like things are stable, so even a long-running process should be acceptable. You have a program that descended a few hundred levels. Can you write one that attempts to go to the bottom, and see if it terminates? Do you have an estimate of how many files there are in it? (Total inodes in use minus inodes elsewhere on the filesystem.)

Try running something like....

% perl -e '$level=0; while (chdir "core_8286") { $level++; print "level $level\n" unless ($level % 100); } print "Terminated at level $level\n";'

If that works, then you could have it remove things also... Use the termination level you get in place of <N> below:

% perl -e 'foreach $i (1 .. <N>) { chdir "core_8286"; }; while (rmdir "core_8286") { chdir ".."; }'

I initially tried to use system("rm -r xxx") there rather than perl's rmdir, because I didn't want a single "off-by-one" error (where you don't quite hit the bottom) to kill the whole thing. But even a single rm -r at the 100K+ level took many dozens of seconds. The above took only about a minute or two to delete a little over 100K descending directories.

This doesn't ever try to keep the entire pathname anywhere except in the process's current directory, so there's nothing to blow up in the code itself. I don't think it's any faster than anything else using the filesystem, but it doesn't waste time worrying about where it is.
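The same two-phase idea (descend with relative chdirs, then unwind with rmdir one level at a time) can be restated as a self-contained sh sketch. This is a paraphrase of the approach, not Darren's exact code; the scratch tree it builds, and the depth, are only for demonstration:

```shell
#!/bin/sh
# Demonstration: build a scratch chain, then remove it bottom-up.
DIR=core_8286        # the repeated directory name
DEPTH=300            # illustrative; the real case was ~100K
BASE=$(mktemp -d)
cd "$BASE" || exit 1
i=0
while [ "$i" -lt "$DEPTH" ]; do
  mkdir "$DIR" && cd "$DIR" || exit 1
  i=$((i + 1))
done
cd "$BASE" || exit 1

# Phase 1: descend with relative chdirs, counting the levels.
depth=0
while cd "$DIR" 2>/dev/null; do
  depth=$((depth + 1))
done
bottom=$depth
echo "bottom reached at depth $bottom"

# Phase 2: unwind; after each step up, the child below is empty.
while [ "$depth" -gt 0 ]; do
  cd .. && rmdir "$DIR" || exit 1
  depth=$((depth - 1))
done
echo "chain removed"
```

As in the perl version, the full pathname never appears anywhere except implicitly in the kernel's notion of the current directory.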

On an Ultra 5 with a slow disk and VxFS I was able to create 100K directories in about a minute or two. That first script would descend and find the bottom in about 15 seconds.

Good luck..

--
Darren Dunham                                          ddunham@taos.com
Unix System Administrator                   Taos - The SysAdmin Company
Got some Dr Pepper?                           San Francisco, CA bay area
< This line left intentionally blank to confuse you. >

Message 3 in thread
From: Icarus Sparry (usenet@icarus.freeuk.com)
Subject: Re: How to efficiently remove large directory tree
Newsgroups: comp.sys.hp.hpux, comp.unix.shell, comp.unix.solaris
Date: 2004-01-05 18:15:21 PST

On Tue, 06 Jan 2004 01:27:03 +0000, Darren Dunham wrote:

> In comp.unix.solaris GertK wrote:
>> Hi,
>> A oracle process has tried to dump a core but created recursively a very
>> deeply nested directory structure like this instead:
>> core_8286/core_8286/core_8286/core_8286/....etc etc. until the inodes
>> ran out.
>
> Good luck..

You could also try a shell solution. (The echo -n '.' might need to be changed to echo '.\c', depending on your flavour of shell.)

I presume that you are certain that it is an almost infinite tree, and not a corrupted tree. Use 'ls -ia' to make sure that '.', '..' and 'core_8286' all have different inode numbers (the number before the filename).
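The inode sanity check Icarus describes can be scripted. If '.', '..' and the child ever share an inode number, the "tree" is really a loop caused by filesystem corruption, and no amount of bottom-up removal will ever finish. A minimal sketch, run here against a scratch directory for demonstration (the directory name is taken from the thread):

```shell
#!/bin/sh
# Sanity check: compare inode numbers of '.', '..' and the child dir.
# A repeated number would mean a looping, corrupted tree, not a deep one.
DIR=core_8286
demo=$(mktemp -d)      # scratch dir standing in for one level of the tree
mkdir "$demo/$DIR"
cd "$demo" || exit 1
i_dot=$(ls -dia .        | awk '{print $1}')
i_dotdot=$(ls -dia ..    | awk '{print $1}')
i_child=$(ls -dia "$DIR" | awk '{print $1}')
if [ "$i_dot" = "$i_child" ] || [ "$i_dotdot" = "$i_child" ]; then
    echo "WARNING: duplicate inode numbers - possible corruption"
else
    echo "inodes distinct - looks like a genuinely deep tree"
fi
```

In the real case this check would be run at a few levels inside the core_8286 chain before committing hours to a removal loop.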

#!/bin/sh
while cd core_8286
do
	echo -n '.'
done
echo
echo 'Starting to remove'
cd ..
while rmdir core_8286
do
	echo -n 'X'
	cd ..
done

Message 4 in thread
From: Mark Hittinger (bugs@pu.net)
Subject: Re: How to efficiently remove large directory tree
Newsgroups: comp.sys.hp.hpux, comp.unix.shell, comp.unix.solaris
Date: 2004-01-05 18:14:20 PST

GertK  writes:

> A oracle process has tried to dump a core but created recursively a very
> deeply nested directory structure like this instead:
> core_8286/core_8286/core_8286/core_8286/....etc etc. until the inodes
> ran out.
> ...
> Does anyone have a efficient method of cleaning this up while keeping
> the filesystem online (50 databases running on it, so I want to avoid
> downtime because of restoring the filesystem)?

Don't know about efficient, since you've got to stay online. How about using a slightly backwards approach? Instead of attempting to go to the bottom and remove things from there, why not try removing things from the top end?

For example:

#!/bin/sh
renice 5 $$
while [ 1 ] ; do
  mv core_8286 core_8286.rm
  if [ $? -ne 0 ] ; then exit ; fi
  mv core_8286.rm/core_8286 .
  if [ $? -ne 0 ] ; then exit ; fi
  rmdir core_8286.rm
  if [ $? -ne 0 ] ; then exit ; fi
done

After one pass you have the same directory tree in core_8286 but with the top level snipped off. It could just grind away in the background until it hits the bottom.
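The effect of one pass of this loop can be seen on a toy three-level chain (scratch path illustrative):

```shell
#!/bin/sh
# One "snip" pass from the loop above, shown on a 3-level toy chain.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p core_8286/core_8286/core_8286

mv core_8286 core_8286.rm        # 1. rename the top level aside
mv core_8286.rm/core_8286 .      # 2. pull its child up to the top
rmdir core_8286.rm               # 3. the old top is now empty: remove it

# The chain is now one level shorter: 2 directories instead of 3.
find core_8286 -type d
```

Each pass costs three cheap operations on short relative paths, so the script never has to descend or name a long path at all.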

Anyway just another idea to try. Good luck!

Later

Mark Hittinger bugs@pu.net

Message 5 in thread
From: Darren Dunham (ddunham@redwood.taos.com)
Subject: Re: How to efficiently remove large directory tree
Newsgroups: comp.sys.hp.hpux, comp.unix.shell, comp.unix.solaris
Date: 2004-01-06 15:19:03 PST

In comp.unix.solaris Mark Hittinger wrote:
> GertK writes:
>> A oracle process has tried to dump a core but created recursively a very
>> deeply nested directory structure like this instead:
>> core_8286/core_8286/core_8286/core_8286/....etc etc. until the inodes
>> ran out.
>
> Don't know about efficient since you've got to stay online. How about
> using a slightly backwards approach? Instead of attempting to go to
> the bottom and remove the file why not try removing things from the
> top end? For example:
>
> #!/bin/sh
> renice 5 $$
> while [ 1 ] ; do
>   mv core_8286 core_8286.rm
>   if [ $? -ne 0 ] ; then exit ; fi
>   mv core_8286.rm/core_8286 .
>   if [ $? -ne 0 ] ; then exit ; fi
>   rmdir core_8286.rm
>   if [ $? -ne 0 ] ; then exit ; fi
> done
>
> After one pass you have the same directory tree in core_8286 but with the
> top level snipped off. It could just grind away in the background until
> it hits the bottom.

Elegant! I don't think I would have thought of such an approach. It nicely avoids the need to descend the tree in the first place.

I'd love to know exactly how deep the OP's chain was. I've only tested with about 100K+ or so due to the time to create/kill the things.

--
Darren Dunham                                          ddunham@taos.com
Unix System Administrator                   Taos - The SysAdmin Company
Got some Dr Pepper?                           San Francisco, CA bay area
< This line left intentionally blank to confuse you. >

Message 6 in thread
From: Mark Hittinger (bugs@pu.net)
Subject: Re: How to efficiently remove large directory tree
Newsgroups: comp.sys.hp.hpux, comp.unix.shell, comp.unix.solaris
Date: 2004-01-06 16:12:43 PST

Darren Dunham writes:
> Elegant! I don't think I would have thought of such an approach. It
> nicely avoids the need to descend the tree in the first place.
> ...
> I'd love to know exactly how deep the OP's chain was. I've only tested
> with about 100K+ or so due to the time to create/kill the things.

It also starts freeing inodes up right away. If the path does happen to have some corruption in it then trimming things from the bottom won't ever free anything :-(.

If the path is indeed corrupt then the shell script should die as soon as the corrupted path is brought to the top. Then you have bigger fish to fry but you can do something like mv the thing into lost+found at that point.

Later

Mark Hittinger bugs@pu.net

Message 7 in thread
From: reb@cypress.com (reb@cypress.com)
Subject: Re: How to efficiently remove large directory tree
Newsgroups: comp.sys.hp.hpux, comp.unix.shell, comp.unix.solaris
Date: 2004-01-06 00:30:22 PST

GertK wrote in message news:...
> A oracle process has tried to dump a core but created recursively a very
> deeply nested directory structure like this instead:
> core_8286/core_8286/core_8286/core_8286/....etc etc. until the inodes
> ran out.
> The amount of subdirectories is estimated to be almost 100,000 looking
> at the increase of used inodes on the filesystem (vxfs on HPUX 11.0).
> ...
> A script I made that just did a loop with a "cd core_8286" 100 times
> showed that the deeper I get the longer it takes to execute a block of
> 100 cd commands. This is probably due to system calls taking longer
> while traversing longer path names (exponential growth?)
> ...

(You would prefer to run the cleanup directly on the HP box rather than on a remote host mounting that filesystem, right?)

What script language did you use?

It looks like some shells (csh and tcsh?), in some circumstances at least, will stat each parent directory on up the hierarchy after each chdir. Thus each shell chdir command from depth N to N+1 costs one chdir and N+1 stat calls, and stepping down to a great depth by separate chdirs incurs a quadratic number of stat calls in total.

The experience in the other responses suggests that perl and sh likely never produce similar behavior, so you are probably safe just sticking to one of those. I don't know the purpose of this in the csh's, nor whether there's a way to avoid it.
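If each chdir from depth k-1 to k really costs one chdir plus k+1 stat calls, the total for reaching depth N is the sum of (k+1) for k = 1..N, i.e. N(N+3)/2 stats: quadratic growth rather than literally exponential, but more than enough to explain why each block of 100 cd commands got slower. A quick sketch of the arithmetic:

```shell
#!/bin/sh
# Total stat calls to reach depth N if the k-th step costs k+1 stats:
# sum_{k=1..N} (k+1) = N*(N+3)/2 -- quadratic growth.
for N in 100 1000 10000 100000; do
  echo "depth $N -> $(( N * (N + 3) / 2 )) stat calls"
done
```

At the OP's estimated 100,000 levels that is roughly five billion stat calls, versus exactly 100,000 chdir calls for a shell that does no per-step parent walking.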

Message 8 in thread
From: Darren Dunham (ddunham@redwood.taos.com)
Subject: Re: How to efficiently remove large directory tree
Newsgroups: comp.sys.hp.hpux, comp.unix.shell, comp.unix.solaris
Date: 2004-01-06 15:30:10 PST

In comp.unix.solaris reb@cypress.com wrote:
> What script language did you use?
>
> Looks like some shells (csh and tcsh?) in some circumstances at least,
> after each chdir will stat each parent dir on up the hier. Thus each
> shell chdir command from depth N to N+1 will cost one chdir and N+1
> stat calls; and individual steps down to a deep depth by separate
> chdirs will incur an exponential number of stat calls.
>
> The experience of the other responses suggest perl and sh likely never
> produce similar behavior, so probably you are safe to just stick to one
> of those. I don't know the purpose in the csh's of doing this, nor if
> there's a way to avoid it.

When I was testing this on a Solaris 8 U10, my main problem with shell scripts wasn't any stat stuff on chdirs, but just the slowness of forks necessary to invoke the various utilities. Running the previous poster's shell solution for popping the top directory off and removing it would remove about 2500 directories a minute on the test box. (That solution also doesn't use any chdir calls). C and Perl rewrites of the same thing both removed about 33K directories a minute. I don't know if the HP forks would be any lighter weight or not.

--
Darren Dunham                                          ddunham@taos.com
Unix System Administrator                   Taos - The SysAdmin Company
Got some Dr Pepper?                           San Francisco, CA bay area
< This line left intentionally blank to confuse you. >

Message 9 in thread
From: cbigam@somewhereelse.nucleus.com (cbigam@somewhereelse.nucleus.com)
Subject: Re: How to efficiently remove large directory tree
Newsgroups: comp.sys.hp.hpux, comp.unix.shell, comp.unix.solaris
Date: 2004-01-06 15:53:18 PST

In comp.unix.solaris GertK wrote:
> Hi,
>
> A oracle process has tried to dump a core but created recursively a very
> deeply nested directory structure like this instead:
> core_8286/core_8286/core_8286/core_8286/....etc etc. until the inodes
> ran out.

I've come across this once, and found some curious shell behaviour that let me solve it.

One shell (I think /bin/sh) let me recursively cd down into the bottom level of the directory. Once there, I could use a different shell (/bin/ksh? Or maybe it was csh, but I doubt it) to recursively "cd .. && rmdir ".

I think I was about 39k directories deep, and it took under an hour.

The other possibility is that unlink might just wipe out the top-level directory. I don't know what happens to the previously allocated inodes when you do it that way, though.

Colin

Message 10 in thread
From: Darren Dunham (ddunham@redwood.taos.com)
Subject: Re: How to efficiently remove large directory tree
Newsgroups: comp.sys.hp.hpux, comp.unix.shell, comp.unix.solaris
Date: 2004-01-06 16:23:03 PST

In comp.unix.solaris cbigam@somewhereelse.nucleus.com wrote:
> The other possiblity is that unlink might just wipe out the top level
> directory. Don't know what happens to the previously allocated inodes
> when you do it that way, though.

If you succeeded in doing that, I would expect them to remain allocated until you could bring the system down and do an fsck. That wouldn't help the original goal of trying to recover the space/inodes without a reboot.

However, I was unable to directly unlink the parent directory, even as root. VxFS does not appear to have a 'clri' utility the way Solaris UFS does, and unlink fails with "File exists".

--
Darren Dunham                                          ddunham@taos.com
Unix System Administrator                   Taos - The SysAdmin Company
Got some Dr Pepper?                           San Francisco, CA bay area
< This line left intentionally blank to confuse you. >