I did a search for duplicate finder programs, and one came out quite well in several reviews, though not necessarily all of them: the Gemini program, available from the Apple App Store (for Mac).
It does indeed seem to work. I've done a few tests to check that it detects obvious duplicates, plus some less obvious ones, and that it doesn't get confused if a small edit is made to an existing file.
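Tools like this typically compare file contents rather than names. A minimal sketch of that idea in Python, grouping files by a hash of their bytes (this is an illustrative assumption about the general technique, not necessarily how Gemini itself works):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root):
    """Group files under root by SHA-256 of their contents.
    Files with identical bytes hash identically; even a one-byte
    edit changes the hash, so edited copies are not flagged."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only groups with more than one file - the actual duplicates
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

This also explains the "small edit" test above: a content hash treats an edited copy as a different file, which is the behaviour you'd want.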
I then scanned the whole of my main drive on this machine, and it appears to have found around 100 GB of duplicates - whether that really means 100 GB of space which can be claimed back I'm not quite sure yet. 50 × 1 GB files, each duplicated once, would represent 50 GB which could be cleared off, while 25 × 1 GB files, each quadruplicated, would represent 75 GB which could be cleared.
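The reclaimable space depends on how many copies of each file exist, not just the total flagged: for a file of size s present in n copies, only s × (n − 1) can be freed. A quick check of the two figures above:

```python
def reclaimable_gb(size_gb, copies, n_files):
    """Space freed by deleting all but one copy of each file."""
    return n_files * size_gb * (copies - 1)

# 50 one-GB files, each existing twice
print(reclaimable_gb(1, 2, 50))   # 50 GB freed
# 25 one-GB files, each existing four times
print(reclaimable_gb(1, 4, 25))   # 75 GB freed
```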
Having identified some files which appear to be duplicates, I then move large ones to a folder called something like PENDING DELETE, and leave them there for a while. That should ensure that they get captured in a Time Machine (TM) backup. Some while later I delete the files, which reclaims the space.
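That quarantine step could be scripted along these lines - a sketch that moves files over a chosen size threshold into the holding folder (the folder name, threshold, and function name are assumptions for illustration):

```python
import shutil
from pathlib import Path

def quarantine_large(paths, pending_dir, min_bytes=1_000_000_000):
    """Move files of at least min_bytes into pending_dir, so they
    get one more backup pass before being deleted for good."""
    pending = Path(pending_dir)
    pending.mkdir(exist_ok=True)
    moved = []
    for p in map(Path, paths):
        if p.is_file() and p.stat().st_size >= min_bytes:
            target = pending / p.name
            shutil.move(str(p), target)
            moved.append(target)
    return moved
```

One caveat with a sketch like this: two duplicates with the same filename from different folders would collide in the holding folder, so in practice the moves would need unique target names.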
Obviously this means the TM backup could end up containing duplicates - but at present the TM disc is substantially larger than the main drive.
This strategy should work for a while, at least, and should ensure that files do get spread around different drives, but that duplicates on any one drive (except the TM drive) are avoided.
I think that later on a more structured approach will be needed - actually classifying each file (music, photos, videos, etc.) and grouping them onto different media/devices.