Friday, November 13, 2009

CSV files - an introduction


Recently it was brought to my attention that a simple, entry-level introduction to CSV files in our context was lacking. This article tries to provide that introduction and explain what it is all about.

Facts
  • A csv file is an ASCII text file you can read in NotePad or similar
  • The extension for csv files is .csv
  • CSV means: comma separated values
  • The format used on Vixen Grimoire, El-Toro, WSC etc is not csv, but actually ecsv
  • ECSV means extended csv
  • The purpose of an ecsv file is to let you organize your files
  • You organize your files according to an ecsv file with a program
  • Programs for organizing are called Collection Managers (click the link for an intro to those)
  • ECSV files have one strict format
  • ECSV files have the file extension .csv
  • ECSV files are commonly referred to as just csv files
  • Each ECSV file is maintained and updated by one person at a time
  • It is not allowed to tamper with or change an ecsv file made by someone else unless permission has been given. If you get permission, then YOU become the maintainer.
 A csv file (the common name) lists one file on every line. Each listed file has a few values attached to it. Inside the csv file the values appear in the following order:

FILENAME, FILESIZE, CRC32 HEX VALUE, PATH, OPTIONAL COMMENT

An example could be:

img_0750.jpg,438648,A2DF81D5,\2008-11\2008-11-27 - Kelly M\,

The line above tells us the following. There is a file, which should have the file name img_0750.jpg. This file has a size of 438648 bytes, and its calculated CRC32 checksum must be A2DF81D5 (in hexadecimal notation) for it to be the right file. The file should be located in the folder structure \2008-11\2008-11-27 - Kelly M\. There is no comment for this file.
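To make the field order concrete, here is a minimal Python sketch that splits one such line into its five values. The names `EcsvEntry` and `parse_ecsv_line` are made up for this illustration, and the sketch assumes that file names and paths never contain a comma; it is not how any particular Collection Manager does it.

```python
from typing import NamedTuple


class EcsvEntry(NamedTuple):
    """One line of an ecsv file (hypothetical helper type)."""
    filename: str
    size: int
    crc32: str
    path: str
    comment: str


def parse_ecsv_line(line: str) -> EcsvEntry:
    # Split on the first four commas only, so that an optional comment
    # may contain commas itself. Assumption: filename and path do not.
    parts = line.rstrip("\r\n").split(",", 4)
    if len(parts) < 4:
        raise ValueError(f"not a valid ecsv line: {line!r}")
    filename, size, crc, path = parts[:4]
    comment = parts[4] if len(parts) == 5 else ""
    # Store the checksum in upper case so comparisons are case-insensitive.
    return EcsvEntry(filename, int(size), crc.upper(), path, comment)


entry = parse_ecsv_line(
    r"img_0750.jpg,438648,A2DF81D5,\2008-11\2008-11-27 - Kelly M\,"
)
print(entry.filename, entry.size, entry.crc32, entry.path)
```

The comment field comes back as an empty string here, because nothing follows the last comma of the example line.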

Only one file can therefore be the correct match for a given line in an ecsv file. In general the uniqueness is assured by the CRC32 checksum calculation, which is almost perfect for distinguishing files from each other.

If you have a picture and calculate its CRC32 value, it may turn out to be (as an example) 52AC2E11. If you then change just 1 pixel and save the file again, the file will NOT have the CRC32 value 52AC2E11. So the two files do NOT match. The same goes if you change the resolution or scale the picture.
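The check described above can be sketched in a few lines of Python. This assumes the CRC32 in ecsv files is the common CRC-32 variant that Python's `zlib.crc32` computes; the helper names `crc32_hex`, `crc32_of_file` and `file_matches` are invented for this example.

```python
import os
import zlib


def crc32_hex(data: bytes) -> str:
    # Format the 32-bit checksum as 8 upper-case hex digits, as in the csv.
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08X}"


def crc32_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    # Read the file in chunks so large files need not fit in memory.
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"


def file_matches(path: str, expected_size: int, expected_crc: str) -> bool:
    # A file is "the right file" only if both size and checksum agree.
    return (os.path.getsize(path) == expected_size
            and crc32_of_file(path) == expected_crc.upper())


# Stand-in bytes for a picture; flipping a single bit changes the checksum.
original = b"pretend these bytes are a picture"
modified = bytearray(original)
modified[0] ^= 0x01
print(crc32_hex(original) != crc32_hex(bytes(modified)))  # True
```

Checking the size first is cheap and rules out most wrong files before any checksum has to be calculated.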

The files listed inside the ecsv files on El-Toro, Vixen Grimoire and similar places have the CRC32 values of the original files from the websites. NOTHING has been changed, not even a piece of EXIF metadata.

Every csv file itself has a CRC32 value too, so it is not possible to change an existing csv file and then claim it is the original: the CRC32 value of your file will not match the CRC32 value of the file on the csv sites. Here is an example:



The csv file has the name ErroticaArchives-DVD35(Pre-Final)_1872.csv. 1872 lines with file descriptions are listed inside the csv file, of which 88% are pictures. If you add up the file sizes of all 1872 lines, it amounts to 4,439,771,784 bytes, which is ~94% of a single-sided DVD. The csv itself must have the CRC32 value 017C2744 to be the original. The trigger name is ERRARDVD35 and it belongs to the group Fine Art Erotica, with the sub-grouping Errotica Archives. The type of the csv file is regular (there are other types, like asian, wsc and such).
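The size figures above are simple arithmetic on the second value of every line. Here is a rough Python sketch; the function names are made up, and the DVD capacity of 4,700,000,000 bytes is an assumption (the nominal "4.7 GB" of a single-sided, single-layer disc), not a number taken from any csv site.

```python
# Assumed nominal capacity of a single-sided, single-layer DVD.
DVD_CAPACITY_BYTES = 4_700_000_000


def total_size(lines):
    """Add up the FILESIZE value (second field) of every non-empty line."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        total += int(line.split(",", 4)[1])
    return total


def percent_of_dvd(total_bytes, capacity=DVD_CAPACITY_BYTES):
    return 100.0 * total_bytes / capacity


# The total quoted for the example csv file:
print(round(percent_of_dvd(4_439_771_784)))  # 94
```

In practice you would pass the lines of the opened csv file to `total_size` and feed the result into `percent_of_dvd`.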

To use a csv file, you must have a program. Such programs are called Collection Managers. The earliest useful collection manager is ScanSort, followed by Hunter and PicCheck. The current tools of choice are PServeCheck or PSProVerify. These latter two tools can be hard to come by, but you may know a friend who knows somebody who's connected and may know where to ask for directions, OR you can ask on your favorite csv site *hint*.
