Using Bioarchive: Difference between revisions

From Carl R. Woese Institute for Genomic Biology - University of Illinois Urbana-Champaign
Latest revision as of 09:56, 18 July 2025

Bioarchive

Bioarchive is set up for long-term data storage at a low upfront cost, which allows grant PIs to preserve data well beyond the funding period. Storage space can be purchased before the end of the grant, and data will be safely kept for 10 years from purchase. If a longer term of storage is needed, PIs may contact Grants and Contracts for approval. Large datasets that are not actively used should also be moved to Bioarchive to save on the higher storage cost on Biocluster or other storage services.

Any type of data, except HIPAA data, is allowed on Bioarchive.

How Bioarchive Works

Bioarchive is a disk-to-tape system that uses a Spectra Logic BlackPearl disk system and a Spectra Logic T950 tape library with LTO-8 tape drives. This setup lets users transfer data to the archive quickly: data is first written to spinning disks, and the software then moves it to tape. Unless specified otherwise, all data on Bioarchive is written to two tapes. One stays in the library in case the data is needed, while the other is taken to a secure off-site storage facility.

How Bioarchive Billing Works

Each private storage unit is called a bucket (like a folder), and each bucket needs to be associated with a CFOP number. A user is granted 50 GB of storage space before any charge, so that they can confirm the service meets their needs before committing. Once a bucket passes this initial 50 GB threshold, we bill $200 for each TB of data in the next monthly billing period. Ten years after a bucket is billed, the PI will be offered the choice of removing the data or paying for an additional 10 years.
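As a rough illustration of the arithmetic, the sketch below estimates the one-time charge for a bucket. The function name is hypothetical, and the sketch assumes that 1 TB = 1000 GB, that the charge is proportional to the data size with no rounding, and that the whole bucket is billed once it exceeds 50 GB. None of these details are stated above, so treat the billing page as the authority on actual charges.

```shell
# Hypothetical helper: estimate the one-time 10-year charge for a bucket.
# Assumptions (not confirmed by the billing policy): 1 TB = 1000 GB,
# no rounding, and the whole bucket is billed once it exceeds 50 GB.
estimate_bioarchive_cost() {
  awk -v gb="$1" 'BEGIN {
    if (gb <= 50) print "0.00"              # still within the free 50 GB
    else printf "%.2f\n", gb / 1000 * 200   # $200 per TB
  }'
}

estimate_bioarchive_cost 2500   # size in GB
```

Under these assumptions, $200 per TB over 10 years works out to $20 per TB per year.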

To check your Bioarchive billing, go to https://www-app.igb.illinois.edu/bioarchive. If you need to update the CFOP associated with your account, email help@igb.illinois.edu.​

How do I start using Bioarchive

Please fill out the Bioarchive Bucket Request Form: https://www.igb.illinois.edu/webform/archive_bucket_request_form

You will receive an email from CNRG once your account is created.

Reset Your Password

The initial email from CNRG includes a temporary password. If you need us to reset your password, please email help@igb.illinois.edu and we can send you a temporary password.

Change your password IMMEDIATELY. To do so, go to https://bioarchive-login.igb.illinois.edu and log in using your NetID and the temporary password.

  • Click on your NetID in the top-right corner and then choose "User Profile".
  • Click on "Action" in the top-left corner and then choose "Edit".
  • Enter your temporary password in the Current Password field, and then create a new password and confirm it. Be sure to click "Save" when you are done.

Get S3 Keys

The credentials to access your data are the S3 keys that are assigned to your account. To obtain the keys:

  • Log in to https://bioarchive-login.igb.illinois.edu with your NetID and the password created in the previous step, or continue from there if you just changed your password.
  • Click on "Action" in the top-left corner and then choose "Show S3 Credentials".
  • A pop-up box will appear with your S3 Access ID and S3 Secret Key. Copy them to a safe place, as you will need them to access Bioarchive.

How to transfer data to Bioarchive

We support data transfer from your local computer system and from Biocluster. To prepare for data transfer, we recommend putting all the data for one project in a directory and compressing it into a single file. Add a readme file that explains the contents, and put both the compressed file and the readme file in one folder.
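The preparation steps above can be sketched with standard tools as follows. The directory and file names (my_project, my_project_archive, README.txt) are hypothetical, and the sketch creates a small demo file so that it runs as-is; adapt the names to your own project.

```shell
#!/usr/bin/env bash
# Sketch: package one project for archiving (all names are hypothetical).
set -euo pipefail

project_dir="my_project"          # directory holding one project's data
staging_dir="my_project_archive"  # folder that will be uploaded

# Create a tiny demo project so the sketch runs as-is; skip this for real data.
mkdir -p "$project_dir"
[ -e "$project_dir/sample.txt" ] || echo "example data" > "$project_dir/sample.txt"

mkdir -p "$staging_dir"

# Compress the whole project directory into a single file.
tar -czf "$staging_dir/$project_dir.tar.gz" "$project_dir"

# Write a readme that describes what the compressed file contains.
{
  echo "Project: $project_dir"
  echo "Archived on: $(date +%Y-%m-%d)"
  echo "Contents of $project_dir.tar.gz:"
  tar -tzf "$staging_dir/$project_dir.tar.gz"
} > "$staging_dir/README.txt"

# Show the two files that are ready to transfer.
ls -l "$staging_dir"
```

A file listing in the readme is only one option; a short description of the experiment, dates, and contact person is usually more helpful to whoever retrieves the data years later.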

Best Practices

If you are transferring data from your computer, we highly recommend using a wired Ethernet connection. Wireless connections are much slower and less reliable, and are not dependable enough for the amount of data typically being transferred.

Data transfer from a computer

The easiest way is to use CyberDuck. Download it from https://cyberduck.io/download/, choosing the Windows or macOS download to match your computer, and then install it.

How to Set Up CyberDuck for the First Time

  • You only need to do this the first time you are setting up CyberDuck. After you do this once, CyberDuck should save the profile for future use.

macOS

  • Open CyberDuck
  • Choose File > Settings

  • In the settings box that pops up, choose the Profiles tab.

  • In the search box, search for Spectra S3 and check the box for Spectra S3 (HTTPS)

  • Once that is complete, you can close the settings pop-up and move on to the next step, Using CyberDuck

Windows

  • Open CyberDuck
  • Choose Edit > Preferences from the menu

  • In the settings box that pops up, choose the Profiles tab.

  • In the search box, search for Spectra S3 and check the box for Spectra S3 (HTTPS)

  • Once that is complete, you can close the settings pop-up and move on to the next step, Using CyberDuck

Using CyberDuck

macOS

  • Open CyberDuck
  • Choose Open Connection
  • In the box that pops up, choose Spectra S3 from the dropdown menu and input the following information.
    • Server: bioarchive.igb.illinois.edu
    • Port: 443
    • S3 Access ID: Your S3 Access ID
    • S3 Secret Key: Your S3 Secret Key
    • Check the box for Add to Keychain if you would like CyberDuck to remember this information.
  • Click Connect
  • After it is connected, you will see a list of buckets. You can drag and drop anything into the bucket from your computer that you want to upload, or drag and drop anything from inside the bucket to your local computer that you need to download. You can also choose File > Download or File > Upload
  • To save the session so that you can open it easily next time, you can choose Bookmark > New Bookmark, which will create a bookmark for the connection for next time. You will leave all the information the same except for one setting. At the bottom, choose More Options and then for the Transfer Files field, choose Open multiple connections from the dropdown menu.

Windows

  • Open CyberDuck
  • Choose Open Connection
  • In the box that pops up, choose Spectra S3 from the dropdown menu and input the following information.
    • Server: bioarchive.igb.illinois.edu
    • Port: 443
    • S3 Access ID: Your S3 Access ID
    • S3 Secret Key: Your S3 Secret Key
    • Check the box for Save Password if you would like CyberDuck to remember this information.
  • Click Connect
  • After it is connected, you will see a list of buckets. You can drag and drop anything into the bucket from your computer that you want to upload, or drag and drop anything from inside the bucket to your local computer that you need to download. You can also choose File > Download or File > Upload
  • To save the session so that you can open it easily next time, you can choose Bookmark > New Bookmark, which will create a bookmark for the connection for next time. You will leave all the information the same except for one setting. At the bottom, choose More Options and then for the Transfer Files field, choose Open multiple connections from the dropdown menu.

While a file is transferring, the session will not show the transfer as complete until all data is written to tape. This can take some time. Check the sleep settings on your computer before starting, because the transfer can be interrupted if your computer goes to sleep. Using an Ethernet connection instead of a wireless one also gives a more stable connection during the transfer.

Another way to transfer data from your local system to Bioarchive is to use a command-line tool. To do so, download and install the command-line tool on your local system: https://developer.spectralogic.com/clients. The Spectra Logic user guide for the command-line tool is at https://developer.spectralogic.com/java-command-line-interface-cli-reference-and-examples/. You may need to adapt your local computer environment for the tool to work. If you are not comfortable doing that, use CyberDuck as described above instead. CNRG does not provide further support for the command-line tool on local systems.

Data transfer from Biocluster

The tool to access Bioarchive is already installed on Biocluster. You will need to load the module:

    module load ds3_java_cli

If this is your first time using Bioarchive, you will need to run the script below. If this is NOT your first time using Bioarchive, you can skip this step. The script sets up the environment variables that the tool needs. Enter your S3 Access ID and S3 Secret Key when prompted. Placeholders in the commands below (such as S3_Access_ID, Bucket_Name, File_Name, and Directory_Name) need to be replaced with information pertinent to you.

    []$ archive_environment.py
    Directory exists, but file does not
    Enter S3 Access ID: S3_Access_ID
    Enter S3 Secret Key: S3_Secret_Key
    Archive variables already loaded in .bashrc
    Environment updated for next login.  To use new values in this session, please type 'source .archive/credentials'

Be sure to type:

    source .archive/credentials

After your first connection to Bioarchive, you will not be asked for the S3 Access ID and S3 Secret Key again if you connect from the same Biocluster account.
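If a later command complains about missing credentials, a quick check like the sketch below may help. It assumes that the credentials file exports DS3_ENDPOINT, DS3_ACCESS_KEY, and DS3_SECRET_KEY, which are the environment variable names the Spectra Logic command-line tool is understood to read. This page does not confirm what archive_environment.py actually sets, and the function name is hypothetical.

```shell
# Hypothetical helper: report which expected variables are not exported.
# Assumption: the credentials file exports the three DS3_* variables below.
# The values themselves are never printed.
check_archive_env() {
  missing=0
  for name in DS3_ENDPOINT DS3_ACCESS_KEY DS3_SECRET_KEY; do
    # printenv prints nothing if the variable is not exported.
    if [ -z "$(printenv "$name")" ]; then
      echo "not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run check_archive_env after sourcing the credentials file; it returns a non-zero status and names each variable that is missing.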

To list available buckets:

    ds3_java_cli -c get_service

It will only return a list of buckets that the user has permission to access.


To view files in a bucket on Bioarchive:

    ds3_java_cli -b Bucket_Name -c get_bucket

This will list all files, their sizes, their owners, and the total size of data in the bucket.


To transfer all the contents in a directory to Bioarchive:

    ds3_java_cli -b Bucket_Name -c put_bulk -d Directory_Name

This will transfer everything in this directory, but not the directory itself.


To transfer a file to Bioarchive:

    ds3_java_cli -b Bucket_Name -c put_object -o File_Name


To transfer a file to a certain directory on Bioarchive:

    ds3_java_cli -b Bucket_Name -c put_object -o File_Name -p Directory/
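The upload commands above can be combined into a small helper. This is a minimal sketch that uses only the put_object and get_bucket commands shown on this page. The function name and its arguments are hypothetical, and it assumes the module is loaded and your credentials are already sourced.

```shell
# Hypothetical helper: upload a compressed file and its readme,
# then list the bucket so the result can be checked by eye.
archive_upload() {
  bucket="$1"
  archive_file="$2"
  readme_file="$3"
  # Stop at the first failed transfer.
  ds3_java_cli -b "$bucket" -c put_object -o "$archive_file" || return 1
  ds3_java_cli -b "$bucket" -c put_object -o "$readme_file" || return 1
  # Show the files, sizes, and owners now in the bucket.
  ds3_java_cli -b "$bucket" -c get_bucket
}
```

For example, archive_upload Bucket_Name my_project.tar.gz README.txt uploads both files and then prints the bucket listing.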


To transfer a file from Bioarchive to Biocluster:

    ds3_java_cli -b Bucket_Name -c get_object -o File_Name


To transfer all files and directories in a bucket from Bioarchive to Biocluster:

    ds3_java_cli -b Bucket_Name -c get_bulk

You can add the -p option to restrict the transfer scope to a directory.
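As an illustration of the -p option, the sketch below restores a single directory into a fresh local folder. The function name and its arguments are hypothetical. It also assumes that the tool writes retrieved files into the current working directory, which is not stated on this page.

```shell
# Hypothetical helper: restore one directory from a bucket into a local folder.
# Assumption: ds3_java_cli writes retrieved files into the current directory.
restore_prefix() {
  bucket="$1"
  prefix="$2"   # directory inside the bucket, for example Directory/
  dest="$3"     # local folder to restore into
  mkdir -p "$dest" || return 1
  (
    # Run in a subshell so the caller's working directory is unchanged.
    cd "$dest" || exit 1
    ds3_java_cli -b "$bucket" -c get_bulk -p "$prefix"
  )
}
```

For example, restore_prefix Bucket_Name Directory/ restored_data retrieves only the objects under Directory/. Data that has already moved from the disk cache to tape may take noticeably longer to come back.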

References

https://developer.spectralogic.com/java-command-line-interface-cli-reference-and-examples/