Duplicate files and strategies for keeping drive space free

  • Dave2002
    Full Member
    • Dec 2010
    • 18057

    Duplicate files and strategies for keeping drive space free

    I did a search for duplicate finder programs, and one seemed to come out quite well in some reviews - though not necessarily all of them. That was the Gemini program available from the Apple App store (for Mac).

    Indeed it does seem to work, and I've done a few tests to see if it will detect obvious duplicates, and other less obvious ones, and also not get confused if a small edit is made to an existing file.

    I then scanned the whole of my main drive on this machine, and it appears to have found around 100 Gbytes of duplicates - whether that really means 100 Gbytes of spare space which can be claimed back I'm not quite sure yet. 50 x 1 Gbyte files duplicated would represent 50 Gbytes which could be cleared off, while 25 x 1 Gbyte files quadruplicated would represent 75 Gbytes which could be cleared.
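
The arithmetic above can be sketched in a few lines of Python. This assumes the tool's reported total counts every copy in each duplicate group, which is a guess about how the figure is produced rather than anything documented; the function name is made up for illustration.

```python
GB = 10**9  # decimal gigabyte, as drive vendors count it

def reclaimable_bytes(groups):
    """Return (reported, reclaimable) for a list of duplicate groups.

    Each group is (file_size_in_bytes, number_of_copies).
    'reported' counts every copy; 'reclaimable' keeps one copy per
    group and counts only the extra ones.
    """
    reported = sum(size * copies for size, copies in groups)
    reclaimable = sum(size * (copies - 1) for size, copies in groups)
    return reported, reclaimable

pairs = [(1 * GB, 2)] * 50   # 50 files, each present twice
quads = [(1 * GB, 4)] * 25   # 25 files, each present four times

for name, groups in (("duplicated", pairs), ("quadruplicated", quads)):
    reported, free = reclaimable_bytes(groups)
    print(f"{name}: {reported // GB} GB reported, {free // GB} GB reclaimable")
```

Both cases add up to 100 GB of "duplicates", but only 50 GB and 75 GB respectively can actually be freed.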

    Having identified some files which appear to be duplicates, I then move large ones to a folder called something like PENDING DELETE, and leave them there for a while. That should ensure that they get backed up to a TM (Time Machine) backup. Some while later I delete the files, which reclaims the space.
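
For anyone who wants to script the "move it aside first" step, here is a minimal sketch. The function name and the idea of mirroring the original folder layout inside the holding folder are my own assumptions, not something Gemini does; mirroring the layout just avoids two files called the same thing colliding.

```python
import shutil
from pathlib import Path

def quarantine(file_path, scan_root, pending_dir):
    """Move file_path into pending_dir, keeping its path relative to scan_root.

    Nothing is deleted here; the file only changes location, so it can
    still be picked up by the next backup and restored by hand.
    """
    file_path = Path(file_path).resolve()
    scan_root = Path(scan_root).resolve()
    target = Path(pending_dir) / file_path.relative_to(scan_root)
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        # Refuse to overwrite something already waiting for deletion.
        raise FileExistsError(f"{target} already exists; not overwriting")
    shutil.move(str(file_path), str(target))
    return target
```

A call might look like `quarantine("/Users/me/Music/copy.wav", "/Users/me", "/Users/me/PENDING DELETE")`, where all three paths are hypothetical.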

    Obviously this will mean that the TM backup could get duplicates - but at present the TM disc is substantially larger than the main drive.

    This strategy should work for a while, at least, and should ensure that files do get spread around different drives, but that duplicates on any one drive (except the TM drive) are avoided.

    I think that later on a more structured approach will be needed - actually classifying each file - music, photos, videos etc. and grouping them onto different media/devices.
  • Anastasius
    Full Member
    • Mar 2015
    • 1860

    #2
    I am incredulous that you have 100GB of duplicates. I thought I was disorganised but I think you merit an entry in the Guinness Book of Records!
    Fewer Smart things. More smart people.


    • MrGongGong
      Full Member
      • Nov 2010
      • 18357

      #3
      I use this (now free)



      Though given the relative cheapness of memory these days it's not the issue it once was..... (he said whilst generating another 2TB of data for a 5 minute electroacoustic piece )


      • Richard Barrett
        Guest
        • Jan 2016
        • 6259

        #4
        Originally posted by MrGongGong View Post
        generating another 2TB of data for a 5 minute electroacoustic piece
        Let me see, a 5-minute mono soundfile at say 24bit/48kHz is about 43 MB, so I conclude that your piece is in more than 46,000 channels. Where are you going to put all those speakers??!
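
The sum can be checked with a short sketch. It assumes uncompressed PCM with no file header and a decimal terabyte (10^12 bytes); the function name is just for illustration.

```python
def pcm_bytes(seconds, sample_rate_hz, bits_per_sample, channels=1):
    """Size in bytes of raw, uncompressed PCM audio (header ignored)."""
    bytes_per_sample = bits_per_sample // 8
    return seconds * sample_rate_hz * bytes_per_sample * channels

mono_file = pcm_bytes(seconds=5 * 60, sample_rate_hz=48_000, bits_per_sample=24)
channels_in_2tb = (2 * 10**12) // mono_file

print(mono_file)        # 43200000 bytes, i.e. about 43 MB
print(channels_in_2tb)  # 46296
```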


        • MrGongGong
          Full Member
          • Nov 2010
          • 18357

          #5
          Originally posted by Richard Barrett View Post
          Let me see, a 5-minute mono soundfile at say 24bit/48kHz is about 43 MB, so I conclude that your piece is in more than 46,000 channels. Where are you going to put all those speakers??!



          Mostly the epic amounts of data are redundant source files, field recordings where I've used a tiny bit and so on.
          BUT having been to the Sibelius Academy a couple of weeks ago they seem to have more than enough Genelecs for my porpoises


          • Dave2002
            Full Member
            • Dec 2010
            • 18057

            #6
            Originally posted by Anastasius View Post
            I am incredulous that you have 100GB of duplicates. I thought I was disorganised but I think you merit an entry in the Guinness Book of Records!
            Actually running it more than once reports more than that - over 150 Gbytes yesterday. Also, you may wish to consider that even though you may consider my storage disorganised, it has proved to be fairly robust. If I inadvertently delete something, or a drive goes down, I've probably got a copy on another partition, drive or memory stick, though I very nearly had a serious loss the other day as I failed to copy in the whole of a camera SDHC card with recent photos securely. Fortunately I lost hardly anything, and I may even be able to recover the 3 or 4 photos which I deleted on my camera, which in the scheme of things were not that important anyway.

            You also have to remember that programs such as iTunes may copy music files into areas they "know about". Most of the space found as duplicates (or triplicates, quadruplicates ...) is full of music, with a rather smaller fraction containing video. Also, as mentioned previously, if the tool reports 150 Gbytes of duplicates, that may only represent 75 Gbytes of "wasted" space.

            What I now would like is a way of identifying all the media files - as I said, mostly music - and safely/securely copying them to at least one other drive. Oddly the Gemini program has identified music which could be copied off - but if I remove the duplicates it finds, I will no longer have the details. Are there any apps which will identify music tracks, and copy them to another drive - hopefully without completely jumbling up any structure that might be useful for players etc.? I should be able to find the music files using Mac OS X Finder, but actually copying them systematically and quickly may not be so easy.
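
Failing a ready-made app, one way to do this by script is sketched below. It is only a sketch: the list of extensions is an assumption about what counts as "music", files are recognised by extension alone rather than by inspecting their contents, and the function name is invented. The folder structure under the source is reproduced under the destination so players that rely on it are not confused.

```python
import shutil
from pathlib import Path

# Assumed set of audio extensions; adjust to taste.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".aac", ".flac", ".wav", ".aif", ".aiff", ".ogg"}

def copy_music(source_root, dest_root, extensions=AUDIO_EXTENSIONS):
    """Copy audio files from source_root to dest_root, keeping sub-folders.

    Existing files at the destination are left alone, so the copy can be
    re-run safely. Returns the list of files actually copied.
    """
    source_root = Path(source_root)
    dest_root = Path(dest_root)
    copied = []
    for path in sorted(source_root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        target = dest_root / path.relative_to(source_root)
        if target.exists():
            continue  # never overwrite what is already on the other drive
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

The destination should be outside the source folder (another drive, say), otherwise the walk would start finding its own copies.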

            Re the total amount of file storage allocated to music, over several machines I think it does run to Terabytes, and we can see that one of our friends here can generate that much data in a few minutes.

            I have been thrashing around with video editing, and most of the working files are now on the Firewire external drive. The current size of the files I'm working with seems to be around 60 Gbytes - generated during the last week or so. I'm sure that others can generate a lot more than that in a very short time if they have big projects. Mine is minute in comparison with serious efforts.


            • Dave2002
              Full Member
              • Dec 2010
              • 18057

              #7
              A slight gripe against the developers of Gemini coming up. Maybe Apple should also take some responsibility.

              Gemini has a preferences file. It may be set to move duplicates to the Trash, which perhaps not everyone will want. I did not want that. Today there was an update for the app available. I downloaded it and started to run it, then I looked at the preferences file. It had reverted to the "move duplicates to Trash" state, which I had explicitly cancelled. Maybe this kind of thing always happens when there are app updates - I'd never looked or checked before.

              Apart from this glitch, the Gemini app otherwise seems rather useful and helpful.


              • Anastasius
                Full Member
                • Mar 2015
                • 1860

                #8
                Originally posted by Dave2002 View Post
                ......

                What I now would like is a way of identifying all the media files - as I said, mostly music - and safely/securely copying them to at least one other drive. Oddly the Gemini program has identified music which could be copied off - but if I remove the duplicates it finds, I will no longer have the details. Are there any apps which will identify music tracks, and copy them to another drive - hopefully without completely jumbling up any structure that might be useful for players etc.? I should be able to find the music files using Mac OS X Finder, but actually copying them systematically and quickly may not be so easy.
                ....
                If you mean copying any metadata as well as the actual audio then I've no idea. iTunes keeps the two separate. Why not simply copy over the whole iTunes library? After all, what's another copy to a man with 100GB+ of duplicates?
                Fewer Smart things. More smart people.


                • Dave2002
                  Full Member
                  • Dec 2010
                  • 18057

                  #9
                  Originally posted by Anastasius View Post
                  If you mean copying any metadata as well as the actual audio then I've no idea. iTunes keeps the two separate. Why not simply copy over the whole iTunes library? After all, what's another copy to a man with 100GB+ of duplicates?
                  Not all of the music files are associated with iTunes.

                  100 Gbytes of storage costs about £4 on a 1 Terabyte drive, and probably under £3 on a 2 Terabyte or larger drive.

                  The situations which arise are probably due to developments in drive and storage technology. Generally trends are similar in nature to Moore's "law", so that storage costs today are likely to be around 10 times lower than 5 years ago, assuming a halving of cost about every 18 months (five years is a little over three halvings). The size of available devices and storage may be larger by a similar factor than that of similar equipment about 5 years ago, making similar assumptions on growth.
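
To make the halving assumption concrete: if cost per byte halves every 18 months, the reduction over any period is 2 raised to (months / 18). A small sketch follows; the function name is invented, and the 18-month figure is the assumption stated above, not measured price data.

```python
def cost_reduction_factor(years, halving_months=18):
    """Factor by which cost per byte falls if it halves every halving_months."""
    return 2 ** (years * 12 / halving_months)

for years in (3, 4.5, 5):
    print(f"{years} years: about {cost_reduction_factor(years):.1f}x cheaper")
```

Three halvings (4.5 years) give exactly 8x, and a full five years gives roughly 10x.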

                  Thus it might have been prohibitive for manufacturers to make and sell computers with large storage 5 years ago, whereas nowadays it would not be.
                  Some users have not changed their behaviour significantly, and modern computers have far more storage and computer power than they really need. On the other hand, many other users have extended their activity, and are now storing and processing much larger volumes of data, which may include photos, audio and video.

                  Many users may simply expect the drives supplied with their computers to be sufficiently large and have good performance for their activities, but some will have increased the demands on their hardware according to what is now possible. Buying extra storage for computers which are still working well is now not prohibitively expensive, and can be expected to get cheaper.


                  • french frank
                    Administrator/Moderator
                    • Feb 2007
                    • 30608

                    #10
                    I just find it confusing to have duplicates of files scattered about ('Are they duplicates? Are they updates?') so I'm having a good time with Gongers' dupeGuru (having first set up a special folder to move the dupes into and carefully examined what it was 'duping' and what it regarded as the 'master' )
                    It isn't given us to know those rare moments when people are wide open and the lightest touch can wither or heal. A moment too late and we can never reach them any more in this world.


                    • Dave2002
                      Full Member
                      • Dec 2010
                      • 18057

                      #11
                      Originally posted by french frank View Post
                      I just find it confusing to have duplicates of files scattered about ('Are they duplicates? Are they updates?') so I'm having a good time with Gongers' dupeGuru (having first set up a special folder to move the dupes into and carefully examined what it was 'duping' and what it regarded as the 'master' )
                      I'm glad you're finding that some attention to these issues is helpful to you. For me I'm using a weeding out process to free up space on machines which I own. This is necessary (a) for my MBP as it "only" has 250 Gbytes of SSD space, and (b) for machines which are currently being used to do video editing. I am discovering that a lot of the large files which are duplicated are in fact music files - plus a relatively small number of video files. Where the files can easily be recovered I'm happy enough to delete them.

                      Re Gemini - it's not perfect, and there are some elephant traps. I don't like tools which may change Preferences when updated - as I mentioned recently. On the other hand, it does seem to work down at the level of pretty much exact file matching. Maybe dupeGuru does the same, though the "blurb" associated with it mentions file names. I think a duplicate detector/remover should not only work on file names, and what are described as "fuzzy" matching processes. There are, I figure, ways of detecting duplicates which are robust, and which would not depend on file names at all. Probably dupeGuru uses those, and the description of how it works is just plain wrong. I may try dupeGuru, perhaps on another machine - though I really don't need to at present. If it does the job, then presumably being cheaper (free) it will suit many people better.
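
One well-known name-independent approach is to compare content hashes: group files by size first (cheap), then hash only the files that share a size. The sketch below shows that general technique; I have no idea whether Gemini or dupeGuru work this way internally, and the function names are my own.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to keep memory use low."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root):
    """Return groups of files under root with identical contents.

    File names play no part: files are grouped by size, then by digest.
    """
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Two files with the same name but different contents are never grouped, and two identical files with different names always are.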

                      I note that you are carefully checking on duplicate candidates - which I think is a good strategy, at least until you get confident that the tool is doing what it claims to do precisely. Good luck with tidying up, though if you have enough space or backup drives this might actually simply be a waste of effort.

                      I have little choice right now, as I don't intend to run out immediately and buy a new machine with 3 Tbyte or more space and simply bundle everything over to that, though as performance and size increase, and costs (relatively) decrease, I may do so in the future.


                      • Anastasius
                        Full Member
                        • Mar 2015
                        • 1860

                        #12
                        Originally posted by french frank View Post
                        I just find it confusing to have duplicates of files scattered about ('Are they duplicates? Are they updates?') so I'm having a good time with Gongers' dupeGuru (having first set up a special folder to move the dupes into and carefully examined what it was 'duping' and what it regarded as the 'master' )
                        Be wary of its handling of Photos. I have got a copy and it is a very good program. However when I ran it some files got flagged up as duplicates but one was in the iPhoto library and the other in the default rebuilt library. Now I've not gone into the details but as I recall, iPhoto keeps at least one copy so that you can revert back to the original. So removing the duplicates flagged up by Gemini in this case might not be a good idea.

                        I also upgraded CleanMyMac from the same company ..it alerted me to the fact that I had a 10GB mail log!!
                        Fewer Smart things. More smart people.


                        • french frank
                          Administrator/Moderator
                          • Feb 2007
                          • 30608

                          #13
                          Originally posted by Anastasius View Post
                          Be wary of its handling of Photos. I have got a copy and it is a very good program. However when I ran it some files got flagged up as duplicates but one was in the iPhoto library and the other in the default rebuilt library. Now I've not gone into the details but as I recall, iPhoto keeps at least one copy so that you can revert back to the original. So removing the duplicates flagged up by Gemini in this case might not be a good idea.
                          I think I'm 'word oriented' rather than 'image oriented' - though I did check My Holiday Snaps after I'd scanned that folder. The program seemed pretty good and as far as I could see picked up the same image even when it had a different name.
                          It isn't given us to know those rare moments when people are wide open and the lightest touch can wither or heal. A moment too late and we can never reach them any more in this world.


                          • MrGongGong
                            Full Member
                            • Nov 2010
                            • 18357

                            #14
                            Originally posted by french frank View Post
                            I think I'm 'word oriented' rather than 'image oriented' - though I did check My Holiday Snaps after I'd scanned that folder. The program seemed pretty good and as far as I could see picked up the same image even when it had a different name.
                            It seems to work by file size alone.
                            I have had a couple of "false positives" where there were two different files with identical sizes, which was a bit of a surprise as I imagined that this would have been mathematically impossible. But there again my maths is at about the same level as George Osborne's so I wouldn't trust it at all.
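
Two different files with the same size are entirely possible: there are 256^n distinct files of n bytes, so size alone can only ever be a first filter. A short sketch showing the difference between a size check and a byte-by-byte check, using Python's standard `filecmp` module (the file names and helper names are made up):

```python
import filecmp
import os
import tempfile

def same_size(path_a, path_b):
    """True if both files have the same length in bytes."""
    return os.path.getsize(path_a) == os.path.getsize(path_b)

def same_content(path_a, path_b):
    """True only if the files match byte for byte (shallow=False forces a read)."""
    return filecmp.cmp(path_a, path_b, shallow=False)

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "take1.raw")
        b = os.path.join(tmp, "take2.raw")
        with open(a, "wb") as f:
            f.write(b"\x00\x01\x02\x03")
        with open(b, "wb") as f:
            f.write(b"\x03\x02\x01\x00")  # same four bytes, different order
        return same_size(a, b), same_content(a, b)

print(demo())  # (True, False)
```

Fixed-length recordings made at the same settings are an obvious way to end up with many files of identical size.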


                            • Richard Barrett
                              Guest
                              • Jan 2016
                              • 6259

                              #15
                              Originally posted by MrGongGong View Post
                              my maths is at about the same level as George Osborne
                              Blimey, how can you live like that?
