Removing duplicate files on a server
I have a WordPress install with heaps of duplicated images and file uploads; really, it's a mess.
I need to reduce the size of the website, and I thought I could run a bash script to remove the duplicated files and replace them with hard links.
Now, would this work? Has anyone attempted this before?
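A rough sketch of the hard-link idea, written in Python rather than bash so odd filenames don't cause quoting trouble. Assumptions: everything sits on one filesystem (hard links can't cross filesystems), `hardlink_duplicates` and the `.dedup-tmp` suffix are made-up names, and you have a backup before pointing it at a real uploads directory.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large uploads don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root):
    """Replace byte-identical files under root with hard links to one copy.

    Returns the number of files that were replaced by a link.
    """
    # Group by size first: files of different sizes can't be duplicates,
    # so only same-size files ever get hashed.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, etc.
            by_size[os.path.getsize(path)].append(path)

    replaced = 0
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        for group in by_hash.values():
            keep = group[0]
            keep_stat = os.stat(keep)
            for dup in group[1:]:
                dup_stat = os.stat(dup)
                if (dup_stat.st_dev, dup_stat.st_ino) == (keep_stat.st_dev, keep_stat.st_ino):
                    continue  # already the same inode
                tmp = dup + ".dedup-tmp"
                os.link(keep, tmp)    # create the new link first...
                os.replace(tmp, dup)  # ...then swap it over the duplicate
                replaced += 1
    return replaced
```

You'd call it as e.g. `hardlink_duplicates("/path/to/wp-content/uploads")` (path hypothetical). Linking to a temporary name and then `os.replace()`-ing it over the duplicate means the duplicate's path always points at a complete file, even mid-run.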
Comments
Run sha256sum on your files and compare the hashes. Obviously check sizes etc. too, but a collision is not very likely. I'd suggest a soft link.
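For the soft-link variant, a minimal sketch assuming you already have a pair of suspected duplicates (`keep`, `dup`; hypothetical names, as is `replace_with_symlink`). It checks size and SHA-256 as suggested, then does a full byte comparison so a collision can't slip through, and only then swaps in a relative symlink. The catch with soft links: move or delete the kept file and every link to it breaks, and the web server may need to be allowed to follow symlinks.

```python
import filecmp
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks (same idea as running sha256sum)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def replace_with_symlink(keep, dup):
    """Replace dup with a relative symlink to keep if the two files really are
    identical. Returns True if dup was replaced."""
    if os.path.samefile(keep, dup):
        return False  # already the same file (same path, hard link or symlink)
    if os.path.getsize(keep) != os.path.getsize(dup):
        return False  # cheap check first
    if sha256_of(keep) != sha256_of(dup):
        return False
    # A full byte-by-byte comparison rules out the (unlikely) hash collision.
    if not filecmp.cmp(keep, dup, shallow=False):
        return False
    keep_abs = os.path.abspath(keep)
    dup_abs = os.path.abspath(dup)
    # Relative target, so the link survives moving the whole site directory.
    target = os.path.relpath(keep_abs, start=os.path.dirname(dup_abs))
    tmp = dup_abs + ".dedup-tmp"
    os.symlink(target, tmp)   # create the link under a temporary name...
    os.replace(tmp, dup_abs)  # ...then swap it over the duplicate
    return True
```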
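If you do not need the files to be able to diverge later (hard-linked names share the same data on disk, so an edit through one name shows up under all of them), then yes, it would work; it's relatively straightforward, see above.
Otherwise you need a proper dedup tool (or filesystem).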
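What "diverge" means in practice, as a small Python demo (the function name is hypothetical): two hard-linked names share one inode, so writing through one name changes what the other one reads.

```python
import os
import tempfile


def demo_shared_edit():
    """Show that hard-linked names share data: returns (link count, what the
    original name reads after an in-place write through the other name)."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "photo.jpg")
        duplicate = os.path.join(d, "photo-1.jpg")
        with open(original, "wb") as f:
            f.write(b"v1")
        os.link(original, duplicate)  # second name, same inode
        links = os.stat(original).st_nlink
        with open(duplicate, "r+b") as f:
            f.write(b"v2")  # overwrite in place through the second name
        with open(original, "rb") as f:
            return links, f.read()


print(demo_shared_edit())  # (2, b'v2')
```

Whether an edit hits both names depends on how the program saves: writing in place (as above) changes both, while writing a new file and renaming it over one name leaves the other name pointing at the old data.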
I have used fdupes in the past with some success. You could set up a script or one-liner to remove any duplicates, but I generally just have the output written to a text file so I can review it manually before removing anything.
Increase disk space and forget it.
I remember going through a similar thing about 10 years ago; it did not end well.
Another possibility is to copy (or rebase) the files onto a btrfs filesystem and use bedup (extent-level dedup). Then you still get copy-on-write if you need to make modifications. ZFS is another option.
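The copy-on-write part can be illustrated on Linux with the FICLONE ioctl, which asks a filesystem that supports reflinks (btrfs, or XFS with reflink enabled) to make one file share another file's extents; a later write to either file gets new blocks instead of changing both. This is a minimal sketch of that mechanism, not bedup's code: `reflink_or_copy` is a hypothetical helper, it is Linux-only (the `fcntl` module doesn't exist on Windows), and the constant is hard-coded from `<linux/fs.h>`.

```python
import fcntl
import shutil

# Linux ioctl number for FICLONE, i.e. _IOW(0x94, 9, int) from <linux/fs.h>.
FICLONE = 0x40049409


def reflink_or_copy(src, dst):
    """Make dst a copy-on-write clone of src if the filesystem supports it,
    otherwise fall back to an ordinary byte copy. Returns True if cloned."""
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            # Ask the kernel to point dst at src's extents (no data copied).
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return True
        except OSError:
            pass  # e.g. ext4/tmpfs, or src and dst on different filesystems
    shutil.copyfile(src, dst)
    return False
```

From the shell, GNU `cp --reflink=auto src dst` does the same thing, falling back to a normal copy when cloning isn't possible.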
I disagree.
You need to increase website size.
The bigger the WordPress, the stronger you become.
I have used fslint and it worked for me.