When reviewing data retention and archiving, first determine whether the data is headed for short-term or long-term storage. Generally speaking, short-term archived data is part of a disaster recovery plan. You might pull data from these archives after an accidental deletion, data corruption, or malicious tampering with files. Long-term storage generally covers data older than 30 to 45 days, and it is often kept off local storage, off site, or offline on media such as DVD or tape. This type of data is more likely to be recovered for historical review, legal requests, or regulatory compliance.
It’s also important to think about the cost differences. Archiving the same data repeatedly adds expense. You can avoid this with an archive mechanism that computes a CRC or MD5 value for each previous backup and compares it with the value of the data set about to be archived. Additionally, the date and time stamps of files or directories can be checked before they are archived. Reducing repetitive storage lowers costs in two ways: less physical media is consumed, and less human time is spent on the archiving process.
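The comparison described above can be sketched roughly as follows. This is a minimal illustration, not a production archiver: it assumes the previous run left behind a manifest mapping each path to its modification time and MD5 digest, and the names `file_digest` and `files_to_archive` are hypothetical. MD5 is used here only for change detection, not for security.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_archive(root, previous):
    """Yield (path, mtime, digest) for files that differ from the previous run.

    `previous` is an assumed manifest of the form {path: (mtime, digest)}.
    """
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            mtime = os.path.getmtime(path)
            known = previous.get(path)
            # Cheap check first: an unchanged time stamp lets us skip hashing.
            if known is not None and known[0] == mtime:
                continue
            digest = file_digest(path)
            # Time stamp changed but content is identical: still a duplicate.
            if known is not None and known[1] == digest:
                continue
            yield path, mtime, digest
```

Checking the time stamp before hashing keeps the common case cheap, since unchanged files are never read. The trade-off is that a file modified without its time stamp changing would be missed, so a stricter policy might hash every file.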
The downside of restricting the archive process to prevent duplication is that if the single copy of a data set is lost for any reason, there is no other way to retrieve the data. Mixing the two approaches, also known as Active Archiving, cuts costs while retaining a minimal amount of data duplication. While this requires more up-front effort from IT, a solid process pays off in the long term by allowing more data to be archived at a lower total cost of operation.
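One way to picture this mixed approach is a rule that skips duplicates only once a minimum number of copies exist on separate media. The sketch below is an assumption-laden illustration: the catalog structure, the `MIN_COPIES` value, and the function names are all hypothetical and would need to match whatever your archive tooling actually records.

```python
MIN_COPIES = 2  # assumed policy value; choose one that fits your risk tolerance


def needs_another_copy(catalog, digest, target_medium, min_copies=MIN_COPIES):
    """Decide whether data with this digest should be written to target_medium.

    `catalog` is an assumed mapping: {digest: set of media labels holding it}.
    """
    holders = catalog.get(digest, set())
    if target_medium in holders:
        # This medium already holds the data; writing it again is pure duplication.
        return False
    # Otherwise write only while we are below the minimum number of copies.
    return len(holders) < min_copies


def record_copy(catalog, digest, medium):
    """Note in the catalog that `medium` now holds data with this digest."""
    catalog.setdefault(digest, set()).add(medium)
```

For example, with `MIN_COPIES = 2`, a data set already stored on one tape would still be written to an off-site medium, but a third medium would be skipped.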