Move hashing and deduplication handling to model
Add fields like this:
class File(models.Model):
file_hash = ...
hash_type = ...
duplicate_of = ...
The file_hash
contains hash of a file. Hash should be calculated before adding the file to database, and checked for duplicates. hash_type
says what type of hashing algorithm was used in this case.
duplicate_of
points to a previous input into database that has the same file_hash, and contains the same data. so there is no need of storing an actual duplicate file.
Edited by Maciej Witold Majewski