=Bioarchive=
Bioarchive is set up for long-term data storage at a low upfront cost, which allows grant PIs to preserve data well beyond the funding period. Storage space can be purchased before the end of the grant, and data will be kept safely for 10 years from purchase. If a longer term of storage is needed, PIs may contact Grants and Contracts for approval. Large datasets that are not actively used can also be moved to Bioarchive to avoid the higher cost of storage on Biocluster or other storage services. Any type of data, '''except data covered by HIPAA''', is allowed on Bioarchive.
 
 
==Types of Data that can go in Bioarchive==
 
The only data that is not allowed on Bioarchive is data covered by HIPAA.  All other data is permitted, but the primary use case is data larger than the 15GB/file limit in box.net.  Genomic data, image data, and microscope data are all perfectly fine, although you may need to compress and combine files beforehand, especially if you have numerous smaller files.
 
 
 
 
==How Bioarchive Works==
 
Bioarchive is a disk-to-tape system that uses a Spectralogic BlackPearl disk system and a Spectralogic T950 tape library with LTO8 tape drives. This setup allows users to transfer data to the archive quickly because data is first written to spinning disk; the software then finishes moving the data to tape. Unless specified otherwise, all data on Bioarchive is written to two tapes. One tape stays in the library in case the data is needed, while the other is taken to a secure offsite storage facility. As more space is needed in the archive, CNRG staff order additional tapes, making storage space virtually unlimited.
 
 
 
==How Bioarchive Billing Works==
 
Each private storage unit is called a Bucket (similar to a folder), and each bucket needs to be associated with a CFOP number. A user is granted 50GB of storage space before any charge; this is to ensure that the service meets their needs before committing. Once a bucket passes this initial 50GB threshold, we bill $200 for each TB of data in the next monthly billing period. Ten years after a Bucket is billed, the PI will be offered the choice of removing the data or paying for an additional 10 years. To check your Bioarchive billing, go to https://www-app.igb.illinois.edu/bioarchive. If you need to update the CFOP associated with your account, email help@igb.illinois.edu.
 
=How do I start using Bioarchive=
==Request An Account==
Please fill out the Bioarchive Bucket Request Form: https://www.igb.illinois.edu/webform/archive_bucket_request_form

You will receive an email from CNRG once your account is created. The email includes a temporary password.

==Change Password==
Once your account is activated, change your password '''IMMEDIATELY''' to ensure that access to your data is secure. To do so, go to https://bioarchive-login.igb.illinois.edu and log in using your NetID and the temporary password, then:

*Click on your NetID at the top-right corner and then choose "User Profile".
*Click on "Action" at the top-left corner and then choose "Edit".
*Enter your temporary password in the Current Password field, then create a new password and confirm it. Be sure to click "Save" when you are done.

==Get S3 Keys==
The credentials you use to access your data are the S3 keys assigned to your account. To obtain them:

*Log into https://bioarchive-login.igb.illinois.edu with the username and password created in the previous step, or continue from there if you just changed your password.
*Click on "Action" at the top-left corner and then choose "Show S3 Credentials".
*A pop-up box will appear with your S3 Access ID and S3 Secret Key. Copy these someplace safe; you will need them to transfer data to and from the archive.

=How to transfer data to Bioarchive=
We support data transfer from your local computer system and from Biocluster. To prepare for data transfer, we recommend putting all of the data for one project in a single directory and compressing it into one file. Add a readme file that explains the contents, and put both the compressed file and the readme in a folder together.
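
For example, on biocluster or any other Linux or Mac command line you could stage a project like this. This is only a sketch: the project and file names are placeholders, and tar/gzip is just one reasonable way to combine and compress the data.

<pre>
# Write a short readme describing the contents of the compressed file
echo "Short description of the data in project_x.tar.gz" > project_x_README.txt

# Combine the whole project directory into a single compressed file
tar -czf project_x.tar.gz project_x/

# Keep the compressed file and its readme together in one folder
mkdir project_x_to_archive
mv project_x.tar.gz project_x_README.txt project_x_to_archive/
</pre>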
  
=Determine How You Will Use the Archive=
Depending on your situation, the best solution for using the archive could be using the pre-installed tools on Biocluster, installing the Eon Browser on your own system, or installing the command line tools on your own system. If you are a Biocluster user transferring data from Biocluster, then the Biocluster pre-installed tools will probably be best. If you are a Core Facilities user wanting to archive microscope data from your personal system, then installing the Eon Browser on your system will probably be best.
==Option A - Get a Biocluster Account==
 
The tools to access the archive are already installed on Biocluster. To get an account on Biocluster please fill out the form at https://www.igb.illinois.edu/content/biocluster-account-form
 
  
Most users with data on Biocluster will want to use the Eon Browser directly from biocluster.  This keeps users from transferring their data through their personal system and allows for use of the IGB high speed network.  At the moment, the Eon Browser is only available on biotransfer.igb.illinois.edu.  Eventually, this will become a module on biocluster.
 
  
 
==Option B - Install Eon Browser==
 
You can download and install the Eon Browser GUI for the archive on your local system.

Windows - [https://help.igb.illinois.edu/images/6/67/BlackPearlEonBrowserSetup-5.0.12.msi BlackPearlEonBrowserSetup-5.0.12.msi]
 
MacOS - [https://help.igb.illinois.edu/images/6/64/BlackPearlEonBrowser-5.0.12.dmg BlackPearlEonBrowser-5.0.12.dmg]
 
This is useful if you are archiving data from your local computer or from systems other than Biocluster.

'''Note for OSX systems'''
  
 
If the system says that the file is corrupted, running the following in a terminal session after you copy the program to your Applications folder should fix it.
 
<pre>
cd /Applications
sudo xattr -cr BlackPearlEonBrowser.app
</pre>
 
 
==Option C - Install Command Line Tools==
 
You can download and install the command line tools for the archive on your local system by downloading the software at https://developer.spectralogic.com/clients/
 
This is primarily for users who need or want command line access from their local computer.

=Using Eon Browser=

The Eon Browser is the simplest method of transferring data into and out of the archive. CNRG expects that most users will use the Eon Browser.

The Spectralogic User Guide for the Eon Browser is at https://developer.spectralogic.com/wp-content/uploads/2018/11/90990126_C_BlackPearlEonBrowser-UserGuide.pdf
  
 
Use this information when opening a new session on the archive:
 
*Name: Bioarchive
*Data Path Address: bioarchive.igb.illinois.edu
*Port: 443
*Check the box for Use SSL
*Access ID: '''Your S3 Access ID'''
*Secret Key: '''Your S3 Secret Key'''
*Check the box for Default Session if you want it to automatically connect to the archive when you open the program.
*Click Save to save your session settings so that you don't need to enter them every time.
*Click Open to open the archive session.
  
 
You should see your local computer directories on the lefthand side, and your available buckets on the right.
 
To transfer data, just drag from your local computer directories and drop into the appropriate bucket.  You will see the progress at the bottom of the program window.
 
 
==Notes on Eon Browser==
 
*When uploading a file, it will first go into the cache and then get written to tape.  You can see this happening in the Eon Browser by looking at the storage locations column.  A purple disk icon shows the data is in cache, and the green tape icon shows that the data is on tape.
*Also while uploading a file, the session will not show a complete transfer until all of the data uploaded is written to tape.  This can take some time.  If your session gets interrupted while data is copying from the cache to tape, it should complete successfully.
*If you have a significant amount of data to transfer, please check the sleep settings on your computer in advance of the transfer.  If your computer goes to sleep during a transfer, it will interrupt it.
 
'''For the immediate future, the Eon Browser is available on biotransfer or you can install it on your local system.'''
 
=Using Command Line Tools=

If you are using the archive on biocluster, please be sure to configure your archive environment on biocluster as shown below. The Spectralogic user guide for the command line tools is at https://developer.spectralogic.com/java-command-line-interface-cli-reference-and-examples/
 
==Common Information==
 
Many pieces of information are common across the use of this tool.
*'''Bold Text''' denotes items that need to be replaced with information pertinent to you
*'''S3_Access_ID''' is the S3 access ID for your account; you can get this by logging into the archive web interface
*'''S3_Secret_Key''' is the S3 secret key for your account; you can get this by logging into the archive web interface
*'''Bucket_Name''' is the S3 bucket you want to put your data in
*bioarchive.igb.illinois.edu is the DNS name of the archive.  If you are using it from biocluster, you should use bioarchive.data.igb.illinois.edu instead
  
 
If you are using the command line tools on biocluster, then you need to load the module by typing:
 
<pre>module load ds3_java_cli</pre>
This will add the utilities to your environment.
  
 
==Configuring Environment for Biocluster==
 
In order to make using the command line tools easier on biocluster, we have created a program that sets up environment variables to simplify usage.  Please use the archive_environment.py script as shown below and enter your access key and secret key when prompted.
<pre>
[]$ module load ds3_java_cli
-bash: /home/a-m/danield/.archive/credentials: No such file or directory
[]$ archive_environment.py
Directory exists, but file does not
Enter S3 Access ID:S3_Access_ID
Enter S3 Secret Key:S3_Secret_Key
Archive variables already loaded in .bashrc
Environment updated for next login.  To use new values in this session, please type 'source .archive/credentials'
</pre>
After you run this command, please be sure to type
 
<pre>source .archive/credentials</pre>
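
The credentials file itself is generated by the script. As a rough sketch only, and assuming it exports the DS3 command line tools' usual environment variables (check your own .archive/credentials for the exact names it writes), the file will contain something along these lines:

<pre>
# Hypothetical sketch of .archive/credentials; the real file is written by
# archive_environment.py and the variable names may differ
export DS3_ACCESS_KEY=S3_Access_ID
export DS3_SECRET_KEY=S3_Secret_Key
export DS3_ENDPOINT=bioarchive.data.igb.illinois.edu
</pre>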
 
In all of the commands below, we assume you are using biocluster to do the transfer and have run this program.  If this is not the case, then you need to include:

-a '''S3_Access_ID''' -k '''S3_Secret_Key''' -e bioarchive.igb.illinois.edu

on every line, immediately after the ds3_java_cli command and before the rest of the arguments, as shown below.
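
For example, listing your buckets from a machine where the environment has not been configured would look something like this, with the placeholders replaced by your own keys:

<pre>
ds3_java_cli -a S3_Access_ID -k S3_Secret_Key -e bioarchive.igb.illinois.edu -c get_service
</pre>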
 
  
==Limits==
 
There are not many, but as we find them, they will be placed here.

*You can only put 500,000 objects at a time on the system.  If you need to put more than 500,000 files and folders, you will need to split them up between two or more submissions.  However, this is assuredly a sign that your data needs to be organized better.  Please contact us if you run into this issue.
 
  
==Upload Items in A Directory==

ds3_java_cli -b '''Bucket_Name''' -c put_bulk -d '''Directory_Name'''

*put_bulk tells the program to upload everything inside the directory given
*'''Directory_Name''' is the directory to upload; note this will upload everything in this directory, not the directory itself

==Upload One File==

ds3_java_cli -b '''Bucket_Name''' -c put_object -o '''File_Name'''

*uploads a single file to the archive
*uploading a file with -o '''SomeDirectory/somefile''' will put the uploaded file in the directory SomeDirectory
*to put all files in a certain subdirectory, use the prefix option -p '''path/to/another/directory/''' (see the example below)
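
As a concrete illustration (the bucket, directory, and file names here are only placeholders), the following should store the local file sample01.fastq.gz in the bucket under project_x/raw_data/:

<pre>
# uploads sample01.fastq.gz as the object project_x/raw_data/sample01.fastq.gz
ds3_java_cli -b my_bucket -c put_object -o sample01.fastq.gz -p project_x/raw_data/
</pre>
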
==See Files in Archive==

ds3_java_cli -b '''Bucket_Name''' -c get_bucket

*This will list all of the files in the bucket and the total size of data in the bucket

==Download a File From the Archive==

ds3_java_cli -b '''Bucket_Name''' -c get_object -o '''File_Name'''

*This will download the object named '''File_Name'''
*Note if there are other read or write jobs running on the archive and the file is not in cache, this will appear to hang while it waits for the drives to become available.
 
 
 
==Download a Directory From the Archive==
 
ds3_java_cli -b '''Bucket_Name''' -c get_bulk  
 
*This will download all objects in a bucket.  '''Use carefully.'''
 
*You can add -p '''prefix''' to restrict which files get restored (see the example below)
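
For example, to restore only the objects whose names begin with project_x/raw_data/ (placeholder names again), something like the following should work:

<pre>
# downloads only the objects whose names begin with project_x/raw_data/
ds3_java_cli -b my_bucket -c get_bulk -p project_x/raw_data/
</pre>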
 
 
 
==List Available Buckets==
 
ds3_java_cli -c get_service
 
*Returns a list of buckets that the user has permission to access.
 
==Best Practices for Uploading Data==
 
===Create the Environment===
 
First make sure you set up your command line environment properly via the archive_environment.py script outlined above.  You can check that this is working by running:<br>
 
ds3_java_cli -c get_service<br>
 
 
and making sure that your buckets show.
 
===Upload With a Separate Folder===
To upload data, it is best to have all of the data in one directory named something like "to_archive".  Put all of the data you want to upload to the archive in that folder, laid out exactly as you want it to appear on the archive.  Make sure these files do not already exist in those locations in the archive, checking with either the Eon Browser or the command line tools.  When you are ready, run this command:<br>
 
ds3_java_cli -b '''Bucket_Name''' -c put_bulk -d to_archive<br>
 
 
Note that the to_archive folder does not exist in the bucket now.  This command moves everything inside of the to_archive folder into the archive, but not the to_archive folder itself.
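
Putting the steps together, a typical session on biocluster might look roughly like this. The bucket and file names are placeholders, and it assumes you have already loaded the ds3_java_cli module and run archive_environment.py.

<pre>
# stage the data exactly as it should appear in the archive
mkdir to_archive
mv project_x.tar.gz project_x_README.txt to_archive/

# check that objects with these names are not already in the bucket
ds3_java_cli -b my_bucket -c get_bucket

# upload everything inside to_archive (the folder itself is not created in the bucket)
ds3_java_cli -b my_bucket -c put_bulk -d to_archive

# confirm the new objects are now listed
ds3_java_cli -b my_bucket -c get_bucket
</pre>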
 
 
= References =
 
* [https://developer.spectralogic.com/java-command-line-interface-cli-reference-and-examples/ https://developer.spectralogic.com/java-command-line-interface-cli-reference-and-examples/]
 
