Active Archive

From Carl R. Woese Institute for Genomic Biology - University of Illinois Urbana-Champaign
Sign-up & Billing

  • To sign up, email with: your name, NetID, CFOP, and optionally a list of other users who will need access to the archive.
  • Archive usage cost is based on the total size of all files stored on the archive. Each terabyte of data (rounded up to the next terabyte) costs $200 per ten years. The first 50 GB of space is free.
Data stored                  Cost (per ten years)
0 to 50.0 GB                 $0
50.0001 GB to 0.9999 TB      $200
1.0000 TB to 1.9999 TB       $400
and so on                    $200 more for each additional terabyte
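The pricing above can be sketched as a small shell function for estimating a bill. This is an illustration, not the official billing method: `archive_cost` is a hypothetical name, and it assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes), which the page does not state. Once usage exceeds 50 GB it charges $200 for every started terabyte, which reproduces the brackets in the table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate the ten-year archive cost (in dollars) for a size in bytes.
# Assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes); the page does not say.
archive_cost() {
  local bytes=$1
  local free_limit=$((50 * 1000**3))   # the first 50 GB are free
  local tb=$((1000**4))                # one terabyte in bytes
  if (( bytes <= free_limit )); then
    echo 0
  else
    # Every started terabyte is billed at $200, matching the table's brackets
    echo $(( (bytes / tb + 1) * 200 ))
  fi
}
```

For a rough figure, `archive_cost "$(du -sb /archive/group/archivename | cut -f1)"` feeds in the apparent size in bytes reported by GNU `du`; the actual bill may be computed differently.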

Accessing the Archive

The archive is mounted at /archive on the Biocluster head node. A directory within /archive is created for you when you sign up. See the Biocluster help page for instructions on getting access to and connecting to the Biocluster.
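Because /archive is mounted only on the head node, a check like the following can confirm you are on a machine where it is available before you copy anything. `archive_available` is a hypothetical helper; it relies on the util-linux `mountpoint` command and assumes /archive is itself the mount point, as described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given directory (default /archive) is a mount point here.
archive_available() {
  local mnt=${1:-/archive}
  if mountpoint -q "$mnt"; then
    echo "$mnt is mounted on $(hostname)"
  else
    # Compute nodes do not mount the archive, so this usually means you are on the wrong host
    echo "$mnt is not mounted here; log in to the Biocluster head node" >&2
    return 1
  fi
}
```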

Archive Rules

  • All files sent to the archive are final and cannot be removed or modified.
  • All files must be compressed before they are copied to the archive (for example .gz, .tgz, .bz2, .zip, .7z, or .rar).
  • bzip2 is highly recommended for compression: it offers a good compression ratio and is supported directly by tar through the -j option.
  • It is highly recommended to upload an MD5 hash file and a file listing for each tar file you upload to the archive.
  • All files uploaded to the archive must be smaller than 2.5 TB. It is advisable to keep each file as close to 2.5 TB as possible.
  • All archive files should be larger than 1.0 GB, with some exceptions, such as MD5 hash files and the file listings of tar files.
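Before copying, the size and compression rules can be checked with a short script. This is a sketch under stated assumptions: `check_archive_file` is a hypothetical name, it uses decimal units (1 GB = 10^9 bytes, 2.5 TB = 2.5 x 10^12 bytes), it treats files ending in .md5 or .txt as the exempt hash files and listings (matching the naming used in the procedure below), and the set of accepted extensions is an assumption based on the list in the rules.

```bash
#!/usr/bin/env bash
# Hypothetical pre-upload check for the archive's size and compression rules.
# Assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes); the rules do not say.
check_archive_file() {
  local file=$1 size
  # GNU stat prints the file size in bytes with -c %s
  size=$(stat -c %s -- "$file") || return 1
  local min=$((1000**3))          # 1.0 GB minimum
  local max=$((2500 * 1000**3))   # 2.5 TB maximum

  case "$file" in
    *.md5|*.txt)
      ;;  # MD5 hash files and file listings are exempt from the minimum size
    *.gz|*.tgz|*.bz2|*.zip|*.7z|*.rar)
      if (( size <= min )); then
        echo "$file: $size bytes is not larger than 1.0 GB" >&2
        return 1
      fi
      ;;
    *)
      echo "$file: extension is not a recognized compressed format" >&2
      return 1
      ;;
  esac

  if (( size >= max )); then
    echo "$file: $size bytes is not smaller than 2.5 TB" >&2
    return 1
  fi
  echo "$file: OK"
}
```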

Archive Procedure

See the Biocluster help page for help connecting to the Biocluster and submitting jobs.

Organize the data you want to archive in a well-thought-out directory structure. Remember you won’t be able to change or rearrange any data once it’s on the archive.

The commands below assume your archive file is named “archive” and the directory containing your data is named “directory”; substitute your own names when you run them. Submit the following three operations to the cluster as a job. Do NOT run them on the head node.

  • Use tar to bundle the directory and compress it with bzip2 (the -j option). The tar file should have the extension .tar.bz2:
tar -cjf archive.tar.bz2 directory
  • Write the archive's file listing to a text file:
tar -tvf archive.tar.bz2 > archive.tar.bz2.txt
  • Save the MD5 hash of the archive to a file:
md5sum archive.tar.bz2 > archive.tar.bz2.md5
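The three steps can be wrapped in one function so a single job runs all of them and stops if one fails. `prepare_archive` is a hypothetical name, not an existing Biocluster tool; define it in your job script and submit that script using the scheduler instructions on the Biocluster help page.

```bash
#!/usr/bin/env bash
# Hypothetical helper that runs the three steps above; call it from a job script on a compute node.
prepare_archive() {
  local name=$1 dir=$2
  # Bundle and bzip2-compress the data directory (-c create, -j bzip2, -f output file)
  tar -cjf "${name}.tar.bz2" "$dir" || return 1
  # Save the verbose file listing next to the tarball
  tar -tvf "${name}.tar.bz2" > "${name}.tar.bz2.txt" || return 1
  # Save the MD5 hash; the .md5 file records the bare name "${name}.tar.bz2"
  md5sum "${name}.tar.bz2" > "${name}.tar.bz2.md5"
}
```

In the job script, `prepare_archive archive directory`, run from the directory that contains `directory`, produces archive.tar.bz2, archive.tar.bz2.txt, and archive.tar.bz2.md5.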

Copy the three files you’ve just created to your directory on the archive. This must be done from the head node, because the archive is not mounted on any of the compute nodes. In the command below, replace group/archivename with the path to your own directory.

cp archive.tar.bz2* /archive/group/archivename

After the copy finishes, create a new MD5 hash of the tar file on the archive and compare it with the one you created earlier. Matching hashes confirm that the files are identical and that nothing was corrupted in transfer.
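Since the .md5 file was copied along with the tarball and records only the bare file name, `md5sum -c` run inside your archive directory re-hashes the copy and compares it in one step. `verify_archive_copy` is a hypothetical helper name; run it on the head node.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-hash the copy on the archive and compare it with the saved hash.
verify_archive_copy() {
  local dest=$1 name=$2
  # The .md5 file stores the bare file name, so run md5sum -c from inside the destination
  # directory; it re-reads the copied tarball and prints "<file>: OK" when the hashes match.
  ( cd "$dest" && md5sum -c "${name}.tar.bz2.md5" )
}
```

For example, `verify_archive_copy /archive/group/archivename archive` should print `archive.tar.bz2: OK`; a FAILED line or a non-zero exit status means the copy does not match the original.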