Using Bioarchive
This page is a work in progress while we are getting the archive service into production
This documentation will change frequently as we learn more about the archive and what users need from it. This is our attempt to get the most information out to users as quickly as possible.
Request An Account
Email help@igb.illinois.edu to request an account. The owners and primary users of Strongbox Archive will have accounts created for them automatically as we transfer data over to the new archive.
Change Password
Once your data has been transferred, or once you request an account, you will need to change your password (HOW) to ensure that access to your data stays secure.
Get S3 Keys
Although it can be a bit confusing, the main credentials for accessing your data are the S3 keys assigned to your account. You need to log into the archive's web interface to obtain them.
- Log into https://bioarchive-login.igb.illinois.edu with the username and password created in the previous step, or continue from there if you just changed your password.
- In the top-left corner of the page, click the "Action" drop-down menu
- Select "Show S3 Credentials"
- A pop-up box will appear with your S3 Access ID and S3 Secret Key. Copy these someplace safe; you will need them to transfer data to and from the archive.
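Once you have copied the keys, one convenient option is to keep them in environment variables for the rest of your session so later commands can reference them. The variable names below are our own invention, not part of the archive tooling; substitute your real credentials for the placeholders.

```shell
# Hypothetical variable names; replace the placeholder values with the
# Access ID and Secret Key copied from the web interface.
export ARCHIVE_S3_ID="S3_Access_ID"
export ARCHIVE_S3_KEY="S3_Secret_Key"
# Later commands can then use "$ARCHIVE_S3_ID" and "$ARCHIVE_S3_KEY"
# instead of pasting the raw keys on every command line.
echo "Stored credentials for access ID: $ARCHIVE_S3_ID"
```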
Option A - Get a Biocluster Account
The tools to access the archive are already installed on Biocluster. To get an account on Biocluster, please fill out the form at https://www.igb.illinois.edu/content/biocluster-account-form
Most users with data on Biocluster will want to run the Eon Browser directly from Biocluster. This avoids routing data through a personal system and takes advantage of the IGB high-speed network. At the moment, the Eon Browser is only available on biotransfer.igb.illinois.edu; eventually it will become a module on Biocluster.
Option B - Install Eon Browser
You can install the Eon Browser GUI for the archive on your local system by downloading the software at https://developer.spectralogic.com/clients/
This is useful if you are archiving data from your local computer or from systems other than Biocluster.
Note for macOS systems
If the system reports that the file is corrupted, running the following commands in a terminal session after copying the program to your Applications folder should fix it:
cd /Applications
sudo xattr -cr BlackPearlEonBrowser.app
Option C - Install Command Line Tools
You can install the command line tools for the archive on your local system by downloading the software at https://developer.spectralogic.com/clients/
This is primarily for experienced cluster users; it allows data to be archived directly from a compute node.
Using Eon Browser
The Eon Browser is the simplest method of transferring data into and out of the archive. CNRG expects that most users will use the Eon Browser.
The Spectra Logic user guide for the Eon Browser is at https://developer.spectralogic.com/wp-content/uploads/2018/11/90990126_C_BlackPearlEonBrowser-UserGuide.pdf
Work will begin on our own user guide shortly.
Notes on Eon Browser
- If you save a profile as a default, the Eon Browser will automatically log into that profile when it opens
- When uploading a file, it will first go into the cache and then get written to tape. You can watch this in the Eon Browser by looking at the storage locations column: a purple disk icon shows the data is in cache, and a green tape icon shows that the data is on tape.
- While uploading a file, the session will not show a complete transfer until all of the uploaded data is written to tape. This can take some time. If your session gets interrupted while data is copying from the cache to tape, the transfer should still complete successfully.
For the immediate future, the Eon Browser is available on biotransfer, or you can install it on your local system.
Using Command Line Tools
If you are using the archive from Biocluster, use bioarchive.data.igb.illinois.edu as the name of the archive; this gives you the fastest possible access. If you are not on the cluster, it is important that you use bioarchive.igb.illinois.edu instead, because you will not have access to the internal IGB research data network. The Spectra Logic user guide for the command line tools is at https://developer.spectralogic.com/java-command-line-interface-cli-reference-and-examples/
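The hostname choice above can be scripted. The sketch below picks the endpoint from a machine's fully qualified hostname; the assumption that every host on the IGB network has a name ending in .igb.illinois.edu is ours, not from this documentation, so adjust the pattern for your environment.

```shell
# Sketch only: choose the archive hostname based on where you are running.
# Assumption (ours): hosts on the IGB network end in .igb.illinois.edu.
archive_endpoint() {
    case "$1" in
        *.igb.illinois.edu) echo "bioarchive.data.igb.illinois.edu" ;;  # internal research data network
        *)                  echo "bioarchive.igb.illinois.edu" ;;       # everywhere else
    esac
}

archive_endpoint compute-01.igb.illinois.edu   # hypothetical internal host
archive_endpoint my-laptop.local               # hypothetical external host
```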
Common Information
Many pieces of information are common across the use of this tool.
- Bold text denotes items that need to be replaced with information pertinent to you
- S3_Access_ID is the S3 access ID for your account; you can get this by logging into the archive web interface
- S3_Secret_Key is the S3 secret key for your account; you can get this by logging into the archive web interface
- Bucket_Name is the S3 bucket you want to put your data in
- bioarchive.igb.illinois.edu is the DNS name of the archive. If you are on Biocluster, use bioarchive.data.igb.illinois.edu instead
- insecure tells the tool not to check the certificate of the service
Configuring Environment for Biocluster
To make the command line tools easier to use on Biocluster, we have created a script that sets up the environment variables for you. Please use the archive_environment script as shown below.
~]$ module load ds3_java_cli
-bash: /home/a-m/danield/.archive/credentials: No such file or directory
~]$ archive_environment.py
Directory exists, but file does not
Enter S3 Access ID:S3_Access_ID
Enter S3 Secret Key:S3_Secret_Key
Archive variables already loaded in .bashrc
Environment updated for next login. To use new values in this session, please type 'source .archive/credentials'
Limits
There are not many, but as we find them, they will be listed here.
- You can only put 500,000 objects at a time on the system. If you need to put more than 500,000 files and folders, you will need to split them across two or more submissions.
Upload A Directory
ds3_java_cli -a S3_Access_ID -k S3_Secret_Key -b Bucket_Name -e bioarchive.data.igb.illinois.edu -c put_bulk -d Directory_Name
- put_bulk tells the program to upload everything inside the given directory
- Directory_Name is the directory to upload; note that this uploads everything in the directory, not the directory itself
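With hypothetical values filled in (the bucket name and directory below are invented for illustration), the command looks like this. The leading echo previews the full command instead of running it, since ds3_java_cli may not be on your path; drop the echo to perform the actual upload.

```shell
S3_ID="S3_Access_ID"       # your access ID from the web interface
S3_KEY="S3_Secret_Key"     # your secret key from the web interface
BUCKET="mylab-archive"     # hypothetical bucket name
DIR="sequencing_run_42"    # hypothetical directory to upload
# Remove the leading echo to actually run the upload.
echo ds3_java_cli -a "$S3_ID" -k "$S3_KEY" -b "$BUCKET" \
    -e bioarchive.data.igb.illinois.edu -c put_bulk -d "$DIR"
```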
Upload One File
ds3_java_cli -a S3_Access_ID -k S3_Secret_Key -b Bucket_Name -e bioarchive.data.igb.illinois.edu -c put_object -o File_Name
- uploads a single file to the archive
- uploading a file with -o SomeDirectory/somefile will put the uploaded file in the directory SomeDirectory
- to put all files in a certain subdirectory, use the prefix option -p path/to/another/directory/
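The two placement options above can be sketched as follows. File and directory names here are hypothetical, and echo is again used to preview the commands rather than execute them.

```shell
# Hypothetical names throughout; drop the leading echo to actually upload.
# Option 1: place counts.txt at project1/counts.txt by naming the object:
echo ds3_java_cli -a "S3_Access_ID" -k "S3_Secret_Key" -b "Bucket_Name" \
    -e bioarchive.data.igb.illinois.edu -c put_object -o project1/counts.txt
# Option 2: upload counts.txt and let the -p prefix place it under project1/:
echo ds3_java_cli -a "S3_Access_ID" -k "S3_Secret_Key" -b "Bucket_Name" \
    -e bioarchive.data.igb.illinois.edu -c put_object -o counts.txt -p project1/
```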
See Files in Archive
ds3_java_cli -a S3_Access_ID -k S3_Secret_Key -b Bucket_Name -e bioarchive.data.igb.illinois.edu -c get_bucket
- This will list all of the files in the archive and the total size of data in the bucket
Download a File From the Archive
ds3_java_cli -a S3_Access_ID -k S3_Secret_Key -b Bucket_Name -e bioarchive.data.igb.illinois.edu -c get_object -o File_Name
- This will download the object named File_Name
- Note: if there are other read or write jobs running on the archive and the file is not in cache, this will appear to hang while it waits for the tape drives to become available.
Download a Directory From the Archive
ds3_java_cli -a S3_Access_ID -k S3_Secret_Key -b Bucket_Name -e bioarchive.data.igb.illinois.edu -c get_bulk
- This will download all objects in a bucket. Use carefully.
- You can add -p prefix to restrict which files get restored
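As a sketch of a restricted restore (the prefix below is hypothetical, and echo previews the command instead of running it), restoring only objects whose names start with project1/ would look like:

```shell
# Hypothetical prefix; drop the leading echo to actually run the restore.
echo ds3_java_cli -a "S3_Access_ID" -k "S3_Secret_Key" -b "Bucket_Name" \
    -e bioarchive.data.igb.illinois.edu -c get_bulk -p project1/
```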
List Available Buckets
ds3_java_cli -a S3_Access_ID -k S3_Secret_Key -e bioarchive.data.igb.illinois.edu -c get_service
- Returns a list of buckets that the user has permission to access.