- Archive accounting site - https://biocluster.igb.illinois.edu/archive-accounting - View your data usage and billing information here.
- Spectra Logic T950 Tape Library - https://www.spectralogic.com/products/spectra-t950, with LTO-6 2.5TB tapes
- Strongbox T30 - https://strongboxdata.com/products/strongbox/
Sign-up & Billing
- To sign up, email firstname.lastname@example.org with your name, NetID, CFOP, and optionally a list of other users who will need access to the archive.
- Archive usage cost is based on the total size of all files you store on the archive. Storage costs $200 per terabyte per ten years, with usage rounded up to the next whole terabyte. The first 50 GB is free.
|Total data stored|Cost per ten years|
|0 - 50.0 GB|$0|
|50.0001 GB - 0.9999 TB|$200|
|1.0000 - 1.9999 TB|$400|
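As a rough illustration of the pricing rule, the sketch below estimates the ten-year cost from a total size in bytes. The function name archive_cost is hypothetical, and it assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes). The accounting site is the authoritative source and may count differently.

```shell
# Hypothetical helper, not part of the archive tooling.
# Usage: archive_cost SIZE_IN_BYTES
# Assumes 1 GB = 10^9 bytes and 1 TB = 10^12 bytes (an assumption).
archive_cost() {
    bytes=$1
    free_bytes=50000000000        # the first 50 GB is free
    tb_bytes=1000000000000        # one terabyte
    if [ "$bytes" -le "$free_bytes" ]; then
        echo 0
    else
        # Bill $200 for every started terabyte, as in the table above.
        echo $(( (bytes / tb_bytes + 1) * 200 ))
    fi
}

archive_cost 1500000000000   # 1.5 TB of data, prints 400
```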
Accessing the Archive
The archive is mounted on the login nodes (biologin) of the Biocluster, at /archive. You will have a directory within /archive created for you when you sign up. Please see the Biocluster help page for instructions on how to get access to and connect to the Biocluster.
- All files sent to the archive are final and cannot be removed or modified.
- All files sent to the archive must be compressed before being copied over (.gzip, .tgz, .zip, .bzip, .gz, .7z, .rar)
- It is highly recommended to use bzip2 for compression, as it offers a higher compression ratio than gzip and is supported directly by the tar command (the -j option).
- It is highly recommended to upload an md5 hash file and a file listing for each tar file you upload to the archive.
- All files uploaded to the archive must be smaller than 2.5 TB. It is advisable to have files be as close to 2.5 TB as possible.
- All archive files should be larger than 1.0 GB, with some exceptions. Exceptions include md5 hash files and file listings of tar files.
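The two size limits above can be checked before uploading. This is a minimal sketch: the function name check_archive_size is hypothetical, and the byte thresholds assume decimal units (1 GB = 10^9 bytes), which may not match how the archive counts.

```shell
# Hypothetical pre-upload check, not part of the archive tooling.
# Usage: check_archive_size SIZE_IN_BYTES
# Assumes 1 GB = 10^9 bytes and 2.5 TB = 2.5 * 10^12 bytes (an assumption).
check_archive_size() {
    bytes=$1
    min_bytes=1000000000          # files should be larger than 1.0 GB
    max_bytes=2500000000000       # files must be smaller than 2.5 TB
    if [ "$bytes" -ge "$max_bytes" ]; then
        echo "too large: split the data into several archive files"
        return 1
    elif [ "$bytes" -le "$min_bytes" ]; then
        echo "too small: combine it with other data (md5 and listing files are exempt)"
        return 1
    fi
    echo "ok"
}
```

On a system with GNU coreutils, the size of an existing file could be passed in with check_archive_size "$(stat -c %s archive.tar.bz2)".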
See the Biocluster help page for help connecting to the Biocluster and submitting jobs.
Organize the data you want to archive in a well-thought-out directory structure. Remember you won’t be able to change or rearrange any data once it’s on the archive.
The commands below assume that your archive file will be called “archive” and the directory containing your data is called “directory.” Substitute your own names when running them. Submit the following three operations to the cluster as a job; do NOT run them on the head node.
- Use tar to bundle and compress the directory into a single file. The tar file should have the extension .tar.bz2
tar -cjf archive.tar.bz2 directory
- List the contents of the tar file and save the listing to a text file
tar -tvf archive.tar.bz2 > archive.tar.bz2.txt
- Save the md5 hash of the archive to a file
md5sum archive.tar.bz2 > archive.tar.bz2.md5
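The three commands above could be collected into one script for the job. This is a minimal sketch: the function name make_archive is hypothetical, and any scheduler directives your job submission needs are omitted, so check the Biocluster help page for those.

```shell
# Hypothetical wrapper around the three commands above.
# Usage: make_archive NAME DIRECTORY
make_archive() {
    name=$1
    dir=$2
    # 1. Bundle the directory and compress it with bzip2.
    tar -cjf "$name.tar.bz2" "$dir" || return 1
    # 2. Save a listing of the archive contents.
    tar -tvf "$name.tar.bz2" > "$name.tar.bz2.txt" || return 1
    # 3. Save the md5 hash of the archive.
    md5sum "$name.tar.bz2" > "$name.tar.bz2.md5" || return 1
}
```

With the names used in this guide, the call would be make_archive archive directory. Each step stops the function on failure, so a partial archive is not silently hashed.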
Copy the three files you’ve just created to your space on the archive. This must be done from the head node, because the archive is not mounted on any of the compute nodes.
cp archive.tar.bz2* /archive/group/archivename
Create another md5 hash of the tar file once it has been copied to the archive and compare it to the one you created earlier. This ensures the files are identical and there were no transfer issues.
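One way to do this comparison is sketched below. The function name verify_copy is hypothetical; it recomputes the hash of both the local file and the archived copy rather than relying on the saved .md5 file.

```shell
# Hypothetical helper: compare the md5 hash of a local file with its copy.
# Usage: verify_copy LOCAL_FILE ARCHIVED_FILE
verify_copy() {
    # Refuse to compare if either file is missing or unreadable.
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "ERROR: cannot read one of the files"
        return 1
    fi
    # md5sum prints the hash followed by a file name; keep only the hash.
    local_hash=$(md5sum < "$1" | cut -d ' ' -f 1)
    copy_hash=$(md5sum < "$2" | cut -d ' ' -f 1)
    if [ "$local_hash" = "$copy_hash" ]; then
        echo "OK: hashes match"
    else
        echo "MISMATCH: the copy differs from the original"
        return 1
    fi
}
```

Using the example paths from this guide, the call would be verify_copy archive.tar.bz2 /archive/group/archivename/archive.tar.bz2, run from the head node since the archive is only mounted there.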