Discussion:
%fdupes
Stephan Kulow
2007-05-16 08:03:02 UTC
Hi!

We did some analysis on how much space is wasted by packages storing the same
file twice (or more). While few packages waste megabytes (only 88 waste more
than 1000KiB), 657 waste more than 20K - which sums up to 703MiB in total.

Impressed? Consider using fdupes in your package.

It's pretty simple: add "BuildRequires: fdupes" and then use "%fdupes $RPM_BUILD_ROOT"
in your %install section. This will check for duplicated files and turn them
into hardlinks. Just be careful that these duplicated files do not end up in
different subpackages - I haven't tried what rpm does in that case.

But you can also use %fdupes -s, which will create symlinks, which are easier
to grasp for rpm :)

So you can also combine the two, like this:
# create symlinks for my man pages
%fdupes -s $RPM_BUILD_ROOT%_mandir
# create hardlinks for the rest
%fdupes $RPM_BUILD_ROOT
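
To make the effect concrete, here is a minimal Python sketch of the general idea described above: find regular files with identical contents under a directory and replace the later copies with hard links to the first one. It is an illustration only, not the implementation of the %fdupes macro or of the fdupes tool; the names file_digest, hardlink_duplicates and demo are made up for this example.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root):
    """Replace duplicate regular files under root with hard links.

    Returns the number of files that were replaced by a link.
    """
    first_seen = {}  # (size, digest) -> first path found with that content
    replaced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # only regular files are considered
            key = (os.path.getsize(path), file_digest(path))
            keeper = first_seen.setdefault(key, path)
            if keeper == path or os.path.samefile(keeper, path):
                continue  # first occurrence, or already the same inode
            os.unlink(path)
            os.link(keeper, path)
            replaced += 1
    return replaced


def demo():
    """Build a tiny tree with one duplicate pair and deduplicate it."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "man1"))
        files = (("man1", "a.1", "same"), ("man1", "b.1", "same"), ("", "c", "other"))
        for sub, name, text in files:
            with open(os.path.join(root, sub, name), "w") as f:
                f.write(text)
        replaced = hardlink_duplicates(root)
        linked = os.path.samefile(os.path.join(root, "man1", "a.1"),
                                  os.path.join(root, "man1", "b.1"))
        return replaced, linked
```

Running demo() replaces one of the two identical files with a hard link and leaves the third file alone.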

I also added an rpmlint check that will give an error for the package if it's
wasting more than 20KB (which is basically a random number).
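
As a rough illustration of what such a check measures, the following Python sketch sums the size of every redundant copy in a tree and compares it with the 20KB figure mentioned above. It is not the actual rpmlint check; wasted_bytes, THRESHOLD_BYTES and demo are hypothetical names, and the exact threshold semantics of the real check are an assumption.

```python
import hashlib
import os
import tempfile

# The 20KB figure from the post; how the real check rounds it is an assumption.
THRESHOLD_BYTES = 20 * 1024


def wasted_bytes(root):
    """Sum the sizes of all redundant copies of regular files under root."""
    seen_contents = set()  # (size, digest) of contents already counted once
    seen_inodes = set()    # (st_dev, st_ino), so existing hard links cost nothing
    wasted = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            if (st.st_dev, st.st_ino) in seen_inodes:
                continue  # another name for a file we already looked at
            seen_inodes.add((st.st_dev, st.st_ino))
            with open(path, "rb") as f:
                key = (st.st_size, hashlib.sha256(f.read()).hexdigest())
            if key in seen_contents:
                wasted += st.st_size
            else:
                seen_contents.add(key)
    return wasted


def demo():
    """Three identical 15000-byte files: two of them are redundant."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("one", "two", "three"):
            with open(os.path.join(root, name), "wb") as f:
                f.write(b"x" * 15000)
        waste = wasted_bytes(root)
        return waste, waste > THRESHOLD_BYTES
```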

Greetings, Stephan
--
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Cristian Rodriguez R.
2007-05-16 08:21:40 UTC
Post by Stephan Kulow
Hi!
We did some analysis on how much space is wasted by packages storing the same
file twice (or more). While few packages waste megabytes (only 88 waste more
than 1000Mib), 657 waste more than 20K - which sums up to 703MiB in total.
Interesting... that is one CD less, wow ;-)

It would be nice if the list of offending packages could be published so
that they can be fixed ;)
Stephan Kulow
2007-05-16 08:28:01 UTC
Post by Cristian Rodriguez R.
Post by Stephan Kulow
Hi!
We did some analysis on how much space is wasted by packages storing the
same file twice (or more). While few packages waste megabytes (only 88
waste more than 1000Mib), 657 waste more than 20K - which sums up to
703MiB in total.
Interesting.. that is one CD less ,, wow.;-)
Most of the packages wasting a lot are also big enough to not be on our CDs.
Post by Cristian Rodriguez R.
Will be nice if the list of offending packages can be published in order
to fix them ;)
I'd prefer if every packager checks his own rpmlint reports instead of putting
out a list of blame [1]

Greetings, Stephan
[1] And yes, that means one or two KDE packages score pretty well ;)
--
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Reinhard Max
2007-05-16 09:52:37 UTC
Post by Cristian Rodriguez R.
[...] which sums up to 703MiB in total.
Interesting.. that is one CD less ,, wow.;-)
I guess the 703MB are the size of these files when installed, not the
size they add to the (compressed) RPM files that go to the CDs.

cu
Reinhard
Dirk Mueller
2007-05-30 12:49:44 UTC
Post by Stephan Kulow
I also added an rpmlint check that will give an error for the package if
it's wasting more than 20KB (which is basically a random number).
Has been copied to http://en.opensuse.org/Packaging/SUSE_Macros

Greetings,
Dirk
Vladimir Nadvornik
2007-08-24 15:52:48 UTC
Post by Stephan Kulow
It's pretty simple: BuildRequire fdupes and then use "%fdupes
$RPM_BUILD_ROOT" in your install section. This will check for duplicated
files and make them hardlink. Just be careful that these duplicated files
do not end up in different subpackages - I haven't tried what rpm does in
that case.
There seems to be another problem. %fdupes can create hardlinks between
files that would finally end up on different partitions.
See
https://bugzilla.novell.com/show_bug.cgi?id=304167

Using something like
%fdupes $RPM_BUILD_ROOT/usr
%fdupes $RPM_BUILD_ROOT/srv
...

fixes the problem.

Do you think that the %fdupes macro should be changed to do this
automatically?

Vladimir
Jan Engelhardt
2007-08-29 17:35:59 UTC
Post by Vladimir Nadvornik
Post by Stephan Kulow
It's pretty simple: BuildRequire fdupes and then use "%fdupes
$RPM_BUILD_ROOT" in your install section. This will check for duplicated
files and make them hardlink. Just be careful that these duplicated files
do not end up in different subpackages - I haven't tried what rpm does in
that case.
There seems to be another problem. %fdupes can create hardlinks between
files that would finally end on different partitions.
See https://bugzilla.novell.com/show_bug.cgi?id=304167
Using something like
%fdupes $RPM_BUILD_ROOT/usr
%fdupes $RPM_BUILD_ROOT/srv
...
fixes the problem.
What if /srv/ftp and /srv/www were separate mounts?
Post by Vladimir Nadvornik
Do you think that the %fdupes macro should be changed to do this
automatically?
Jan
--
Stephan Kulow
2007-09-02 14:23:18 UTC
Post by Jan Engelhardt
Post by Vladimir Nadvornik
Post by Stephan Kulow
It's pretty simple: BuildRequire fdupes and then use "%fdupes
$RPM_BUILD_ROOT" in your install section. This will check for duplicated
files and make them hardlink. Just be careful that these duplicated
files do not end up in different subpackages - I haven't tried what rpm
does in that case.
There seems to be another problem. %fdupes can create hardlinks between
files that would finally end on different partitions.
See https://bugzilla.novell.com/show_bug.cgi?id=304167
Using something like
%fdupes $RPM_BUILD_ROOT/usr
%fdupes $RPM_BUILD_ROOT/srv
...
fixes the problem.
What if /srv/ftp and /srv/www were separate mounts?
Then you would still have to find a package that puts files in both?

Greetings, Stephan
Jan Engelhardt
2007-09-02 14:24:37 UTC
Post by Stephan Kulow
Post by Jan Engelhardt
Post by Vladimir Nadvornik
Using something like
%fdupes $RPM_BUILD_ROOT/usr
%fdupes $RPM_BUILD_ROOT/srv
...
fixes the problem.
What if /srv/ftp and /srv/www were separate mounts?
Then you still had to find a package that puts files in both?
I mean, I have not looked at %fdupes yet, or at what it does. The fact is, I
think the rpm archive should be created as if the whole tree were one
filesystem, and hardlinks should be broken no earlier than rpm -Uhv.
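
A minimal Python sketch of that idea, breaking a hard link only at install time when the target turns out to be on another filesystem: try to link, and fall back to a full copy if the kernel reports EXDEV. This is not how rpm behaves (according to this thread it fails instead); link_or_copy and demo are hypothetical names, and the injectable link parameter exists only so that the cross-filesystem case can be simulated.

```python
import errno
import os
import shutil
import tempfile


def link_or_copy(source, dest, link=os.link):
    """Hard-link source to dest; fall back to a full copy on EXDEV."""
    try:
        link(source, dest)
        return "linked"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # any other failure is a real error
    shutil.copy2(source, dest)
    return "copied"


def demo():
    """Exercise both paths; the cross-device failure is simulated."""
    def cross_device_link(_src, _dst):
        # Stand-in for os.link failing across filesystems.
        raise OSError(errno.EXDEV, os.strerror(errno.EXDEV))

    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        with open(src, "w") as f:
            f.write("payload")
        same_fs = link_or_copy(src, os.path.join(root, "a"))
        other_fs = link_or_copy(src, os.path.join(root, "b"),
                                link=cross_device_link)
        with open(os.path.join(root, "b")) as f:
            copied_ok = f.read() == "payload"
        return same_fs, other_fs, copied_ok
```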

Jan
--
Stephan Kulow
2007-09-02 14:22:58 UTC
Post by Vladimir Nadvornik
Post by Stephan Kulow
It's pretty simple: BuildRequire fdupes and then use "%fdupes
$RPM_BUILD_ROOT" in your install section. This will check for duplicated
files and make them hardlink. Just be careful that these duplicated files
do not end up in different subpackages - I haven't tried what rpm does in
that case.
There seems to be another problem. %fdupes can create hardlinks between
files that would finally end on different partitions.
See
https://bugzilla.novell.com/show_bug.cgi?id=304167
Using something like
%fdupes $RPM_BUILD_ROOT/usr
%fdupes $RPM_BUILD_ROOT/srv
...
fixes the problem.
Do you think that the %fdupes macro should be changed to do this
automatically?
I think it would be logical to make this automatic.

Greetings, Stephan
Marcus Rueckert
2007-09-02 16:00:21 UTC
Post by Stephan Kulow
Post by Vladimir Nadvornik
Post by Stephan Kulow
It's pretty simple: BuildRequire fdupes and then use "%fdupes
$RPM_BUILD_ROOT" in your install section. This will check for duplicated
files and make them hardlink. Just be careful that these duplicated files
do not end up in different subpackages - I haven't tried what rpm does in
that case.
There seems to be another problem. %fdupes can create hardlinks between
files that would finally end on different partitions.
See
https://bugzilla.novell.com/show_bug.cgi?id=304167
Using something like
%fdupes $RPM_BUILD_ROOT/usr
%fdupes $RPM_BUILD_ROOT/srv
...
fixes the problem.
Do you think that the %fdupes macro should be changed to do this
automatically?
I think it would be logical to make this automatic.
and it would be still broken. you can not assume that hardlinks between
different directories will _always_ work. the only place where you can
say "it wont break anything" are hardlinks in the same directory.
anything else can be on a different partition. that said i think the
best would be to patch fdupes and let it use hardlinks for any
duplicates in the same directory, but symlinks for anything else.
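
The policy proposed here can be sketched in a few lines of Python: hard-link a duplicate when it lives in the same directory as the file being kept, and use a relative symlink otherwise. This is only an illustration of the proposal, not a patch to fdupes; link_duplicate and demo are hypothetical names, and the function assumes the caller has already verified that both files have identical contents.

```python
import os
import tempfile


def link_duplicate(keeper, duplicate):
    """Replace `duplicate` with a link to `keeper`.

    Same directory: hard link. Different directories: relative symlink.
    Returns "hardlink" or "symlink".
    """
    keeper_dir = os.path.dirname(os.path.abspath(keeper))
    dup_dir = os.path.dirname(os.path.abspath(duplicate))
    os.unlink(duplicate)
    if keeper_dir == dup_dir:
        os.link(keeper, duplicate)
        return "hardlink"
    # A relative target keeps the link valid when the tree is relocated,
    # e.g. from the build root to the installed system.
    os.symlink(os.path.relpath(keeper, dup_dir), duplicate)
    return "symlink"


def demo():
    """Two duplicates of bin/a: one next to it, one in another directory."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "bin"))
        os.mkdir(os.path.join(root, "doc"))
        paths = [os.path.join(root, "bin", "a"),
                 os.path.join(root, "bin", "b"),
                 os.path.join(root, "doc", "c")]
        for p in paths:
            with open(p, "w") as f:
                f.write("same")
        kinds = (link_duplicate(paths[0], paths[1]),
                 link_duplicate(paths[0], paths[2]))
        with open(paths[2]) as f:
            readable = f.read() == "same"
        return kinds, os.path.islink(paths[2]), readable
```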

darix
--
openSUSE - SUSE Linux is my linux
openSUSE is good for you
www.opensuse.org
Bernhard Walle
2007-09-02 19:21:08 UTC
Post by Marcus Rueckert
but symlinks for anything else.
But using any automatism like %fdupes for symlinks is also a bad idea
IMO since the semantics of two files (or two hardlinks to the same
file) is different from the semantics of a file and a symlink.
Consider for example the difference when you delete the file and not
the symlink, or chmod, or something else.

Also (for hardlinks _and_ symlinks), what if a program installs the
same configuration file in /etc and as documentation in
/usr/share/doc/packages? Initially, the contents are the same, but if
you modify the configuration in /etc, the sample configuration in
/usr/share/doc/packages should stay the same.
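
The differences described above are easy to reproduce. The Python sketch below uses a temporary directory as a stand-in for /etc and /usr/share/doc/packages (the file names are made up): an in-place edit through one hard-linked name shows up under the other name, deleting a symlink's target leaves a dangling link, and deleting one hard-linked name leaves the other intact.

```python
import os
import tempfile


def demo():
    """Return (sample_changed, symlink_dangling, hardlink_survives)."""
    with tempfile.TemporaryDirectory() as root:
        conf = os.path.join(root, "app.conf")           # stands in for the /etc file
        sample = os.path.join(root, "app.conf.sample")  # stands in for the doc copy
        with open(conf, "w") as f:
            f.write("option = default\n")
        os.link(conf, sample)

        # Appending in place writes to the shared inode, so the "sample"
        # silently changes as well.
        with open(conf, "a") as f:
            f.write("option = changed\n")
        with open(sample) as f:
            sample_changed = "changed" in f.read()

        # A symlink whose target is deleted is left dangling.
        target = os.path.join(root, "target")
        link = os.path.join(root, "link")
        with open(target, "w") as f:
            f.write("data")
        os.symlink(target, link)
        os.unlink(target)
        symlink_dangling = os.path.islink(link) and not os.path.exists(link)

        # Removing one hard-linked name does not affect the other.
        os.unlink(conf)
        hardlink_survives = os.path.exists(sample)
        return sample_changed, symlink_dangling, hardlink_survives
```

Note that an editor which saves by writing a new file and renaming it over the old one would break the hard link instead of changing both names, so the first effect depends on how the file is modified.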


Thanks,
Bernhard
Adrian Schröter
2007-09-03 07:12:18 UTC
Post by Marcus Rueckert
Post by Stephan Kulow
Post by Vladimir Nadvornik
Post by Stephan Kulow
It's pretty simple: BuildRequire fdupes and then use "%fdupes
$RPM_BUILD_ROOT" in your install section. This will check for
duplicated files and make them hardlink. Just be careful that these
duplicated files do not end up in different subpackages - I haven't
tried what rpm does in that case.
There seems to be another problem. %fdupes can create hardlinks between
files that would finally end on different partitions.
See
https://bugzilla.novell.com/show_bug.cgi?id=304167
Using something like
%fdupes $RPM_BUILD_ROOT/usr
%fdupes $RPM_BUILD_ROOT/srv
...
fixes the problem.
Do you think that the %fdupes macro should be changed to do this
automatically?
I think it would be logical to make this automatic.
and it would be still broken. you can not assume that hardlinks between
different directories will _always_ work. the only place where you can
say "it wont break anything" are hardlinks in the same directory.
anything else can be on a different partition. that said i think the
best would be to patch fdupes and let it use hardlinks for any
duplicates in the same directory, but symlinks for anything else.
That is right, but what actually happens when you have different partitions?

Does rpm fail to install the package, or does it create a full copy of the file
on the other partition?

If it is the latter, I think hardlinks are okay to use.

bye
adrian
--
Adrian Schroeter
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
email: ***@suse.de
Marcus Rueckert
2007-09-03 07:46:50 UTC
Post by Adrian Schröter
Post by Marcus Rueckert
and it would be still broken. you can not assume that hardlinks between
different directories will _always_ work. the only place where you can
say "it wont break anything" are hardlinks in the same directory.
anything else can be on a different partition. that said i think the
best would be to patch fdupes and let it use hardlinks for any
duplicates in the same directory, but symlinks for anything else.
That is right, but what happens acctually when you have different partitions ?
Does rpm fail to install the package or does it create a full copy of the file
on the other partition ?
If it is the later, I think hardlinks are okay to use ..
it fails horribly.
taking into account the comment from bwalle about different meanings of
files in different subdirectories, i think the only valid thing is that
fdupes should only hardlink files in the same directory.

darix
--
openSUSE - SUSE Linux is my linux
openSUSE is good for you
www.opensuse.org
Michael Matz
2007-09-03 13:41:45 UTC
Hi,
Post by Marcus Rueckert
Post by Adrian Schröter
Does rpm fail to install the package or does it create a full copy of
the file on the other partition ?
If it is the later, I think hardlinks are okay to use ..
it fails horribly.
taking into account the comment from bwalle about different meanings of
files in different subdirectories, i think the only valid thing is that
fdupes should only hardlink files in the same directory.
Fix rpm. _That's_ the only valid thing. If not possible for 10.3, make
%fdupes a noop for now.


Ciao,
Michael.
Marcus Rueckert
2007-09-03 13:46:50 UTC
Post by Michael Matz
Post by Marcus Rueckert
Post by Adrian Schröter
Does rpm fail to install the package or does it create a full copy of
the file on the other partition ?
If it is the later, I think hardlinks are okay to use ..
it fails horribly.
taking into account the comment from bwalle about different meanings of
files in different subdirectories, i think the only valid thing is that
fdupes should only hardlink files in the same directory.
Fix rpm. _That's_ the only valid thing. If not possible for 10.3, make
%fdupes a noop for now.
as mls mentioned offline, none of the tools is handling that case
nicely. rsync fails with that too, for example. and he declined to fix
that in rpm.

darix
--
openSUSE - SUSE Linux is my linux
openSUSE is good for you
www.opensuse.org
Michael Matz
2007-09-03 15:04:49 UTC
Hi,
Post by Marcus Rueckert
Post by Michael Matz
it fails horribly. taking into account the comment from bwalle about
different meanings of files in different subdirectories, i think the
only valid thing is that fdupes should only hardlink files in the
same directory.
Fix rpm. _That's_ the only valid thing. If not possible for 10.3,
make %fdupes a noop for now.
as mls mentioned offline that none of the tools is handling that case
nicely.
Invalid reasoning. There needs to be just one tool handling it correctly,
namely rpm, perhaps cpio. Whether other programs handle this correctly
doesn't matter for the installation of rpms.
Post by Marcus Rueckert
rsync fails with that too for example. and he declined to fix that in
rpm.
Not sure what rsync has to do with the problem at hand.


Ciao,
Michael.
Michael Matz
2007-09-03 15:16:12 UTC
Hi,
Post by Michael Matz
Post by Marcus Rueckert
rsync fails with that too for example. and he declined to fix that in
rpm.
Not sure what rsync has to do with the problem at hand.
Especially because it seems to handle copying hardlinks across
directories just fine when the target directories are on different
filesystems. Just tested.


Ciao,
Michael.
Dirk Mueller
2007-09-03 15:56:06 UTC
Post by Michael Matz
Post by Michael Matz
Not sure what rsync has to do with the problem at hand.
Especially because it seems to handle copying hardlinks across
directories, when the target directories are on different filesystems just
fine. Just tested.
would you please discuss this in the appropriate bugreport (bug 304167)
instead of the list here, where it is likely getting forgotten again?

Thanks a lot,
Dirk
--
RPMLINT information under http://en.opensuse.org/Packaging/RpmLint
Cristian Rodriguez
2007-09-04 04:45:51 UTC
Post by Marcus Rueckert
and he declined to fix
that in rpm.
Sure, because RPM is not broken, what seems to be broken is the idea of
using this %fdupes thingy, as AFAICS it will cause more harm than good.
--
"You don't have to burn books to destroy a culture. Just get people to
stop reading them." --Ray Bradbury

Cristian Rodríguez R.
SUSE LINUX Products GmbH
Research & Development
Stephan Kulow
2007-09-04 09:25:43 UTC
Post by Cristian Rodriguez
Post by Marcus Rueckert
and he declined to fix
that in rpm.
Sure, because RPM is not broken, what seems to be broken is the idea of
using this %fdupes thingy, as AFAICS it will cause more harm than good.
Thanks for your warm words.

Fact 1: hard links are a normal part of the UNIX world, not handling them can
be considered a bug (aka being broken). Whether it's an important bug is
another issue.
Fact 2: The good that the %fdupes thingy does is making it possible to have a
700MB ISO
Fact 3: Many packages are broken in installing massive overlap of files
Fact 4: Yes, running fdupes in hardlink mode without thinking twice might
not be the best idea. But as a matter of fact, I consider no tool
good or bad per se. It always depends on the use of the tools.
Fact 5: Your communication style is broken, you should check the facts before
calling other people's ideas broken.

Greetings, Stephan
--
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Cristian Rodriguez
2007-09-04 09:51:26 UTC
Post by Stephan Kulow
Fact 3: Many packages are broken in installing massive overlap of files
that's the real problem, but while fixing the broken packages may be a
long term goal, currently such a task seems to be overkill.
Post by Stephan Kulow
Fact 5: Your communication style is broken, you should check the facts before
calling other people's ideas broken.
Don't take it personally; my intention was never to offend people.
--
"You don't have to burn books to destroy a culture. Just get people to
stop reading them." --Ray Bradbury

Cristian Rodríguez R.
SUSE LINUX Products GmbH
Research & Development
Michael Matz
2007-09-04 13:09:15 UTC
Hi,
Post by Cristian Rodriguez
Sure, because RPM is not broken, what seems to be broken is the idea of
using this %fdupes thingy, as AFAICS it will cause more harm than good.
Then you can't see very far.


Ciao,
Michael.

Vladimir Nadvornik
2007-09-03 08:40:06 UTC
Post by Adrian Schröter
Post by Marcus Rueckert
Post by Stephan Kulow
Post by Vladimir Nadvornik
Post by Stephan Kulow
It's pretty simple: BuildRequire fdupes and then use "%fdupes
$RPM_BUILD_ROOT" in your install section. This will check for
duplicated files and make them hardlink. Just be careful that these
duplicated files do not end up in different subpackages - I haven't
tried what rpm does in that case.
There seems to be another problem. %fdupes can create hardlinks
between files that would finally end on different partitions.
See
https://bugzilla.novell.com/show_bug.cgi?id=304167
Using something like
%fdupes $RPM_BUILD_ROOT/usr
%fdupes $RPM_BUILD_ROOT/srv
...
fixes the problem.
Do you think that the %fdupes macro should be changed to do this
automatically?
I think it would be logical to make this automatic.
and it would be still broken. you can not assume that hardlinks between
different directories will _always_ work. the only place where you can
say "it wont break anything" are hardlinks in the same directory.
anything else can be on a different partition. that said i think the
best would be to patch fdupes and let it use hardlinks for any
duplicates in the same directory, but symlinks for anything else.
IMHO the best approach is to identify hardlinks between directories with
rpmlint and let the maintainer decide whether they are dangerous or not.
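
A check of that kind could look roughly like the following Python sketch, which walks a build root and reports every group of hard-linked names that spans more than one directory. It is not an existing rpmlint check; cross_directory_hardlinks and demo are hypothetical names, and the usr/srv layout in the demo is only an example.

```python
import os
import stat
import tempfile
from collections import defaultdict


def cross_directory_hardlinks(root):
    """Return groups of paths (relative to root) that share an inode
    while living in more than one directory."""
    by_inode = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Only regular files with more than one name are interesting.
            if not stat.S_ISREG(st.st_mode) or st.st_nlink < 2:
                continue
            by_inode[(st.st_dev, st.st_ino)].append(os.path.relpath(path, root))
    groups = []
    for paths in by_inode.values():
        if len({os.path.dirname(p) for p in paths}) > 1:
            groups.append(sorted(paths))
    return sorted(groups)


def demo():
    """One file with three names, two in usr and one in srv."""
    with tempfile.TemporaryDirectory() as root:
        for d in ("usr", "srv"):
            os.mkdir(os.path.join(root, d))
        first = os.path.join(root, "usr", "a")
        with open(first, "w") as f:
            f.write("same")
        os.link(first, os.path.join(root, "usr", "b"))
        os.link(first, os.path.join(root, "srv", "c"))
        return cross_directory_hardlinks(root)
```

The maintainer would then look at each reported group and decide whether the directories involved could plausibly end up on different partitions.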
Post by Adrian Schröter
Does rpm fail to install the package or does it create a full copy of the
file on the other partition ?
RPM fails, see the bugreport above.

Vladimir
Petr Cerny
2007-09-03 09:52:33 UTC
Post by Vladimir Nadvornik
Post by Marcus Rueckert
Post by Stephan Kulow
Post by Vladimir Nadvornik
Do you think that the %fdupes macro should be changed to do this
automatically?
I think it would be logical to make this automatic.
and it would be still broken. you can not assume that hardlinks between
different directories will _always_ work. the only place where you can
say "it wont break anything" are hardlinks in the same directory.
anything else can be on a different partition. that said i think the
best would be to patch fdupes and let it use hardlinks for any
duplicates in the same directory, but symlinks for anything else.
IMHO the best approach is to identify hardlinks between directories with
rpmlint and let the maintainer decide whether they are dangerous or not.
Or patch rpm to check whether hardlinks are not created across different
partitions/volumes?
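
As a sketch of what such a check might compare, the Python below looks up the device number of the nearest existing ancestor of each install path and reports whether a set of paths spans more than one device. It says nothing about how rpm works internally; device_of and spans_filesystems are hypothetical names, and comparing st_dev is only an approximation of "same partition".

```python
import os


def device_of(path):
    """Return st_dev of the nearest existing ancestor of path."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:
            break  # reached the filesystem root
        path = parent
    return os.stat(path).st_dev


def spans_filesystems(paths):
    """True if the install locations in `paths` sit on more than one device."""
    return len({device_of(p) for p in paths}) > 1
```

For example, spans_filesystems(["/usr/share/foo/a", "/srv/www/a"]) would return True on a system where /srv is a separate mount, and False where both live on the root filesystem.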

Best regards
Petr