Archiving guidelines

This page provides a front-end to the long-term storage server of BioMina. Take the following guidelines into account when using it:

  1. Only keep necessary data: If not needed, don't store intermediate analysis results
  2. Compress your data: Please compress (gzip/zip/rar) your data before uploading to save disk space.
  3. If you don't compress your data, we will compress it using gzip.
  4. Think twice before uploading: For safety, files can NOT be deleted from the archive directly. Deletion requests are queued and processed after a few days.
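
As an illustration of guideline 2, the following is a minimal sketch of compressing files with gzip before uploading, using only the Python standard library. The function names `gzip_file` and `gzip_tree` are hypothetical helpers, not part of the archive manager, and the sketch assumes you want one `.gz` file per input file rather than a single bundle.

```python
import gzip
import shutil
from pathlib import Path

# File types that are already compressed and should be left alone.
ALREADY_COMPRESSED = {".gz", ".zip", ".rar"}


def gzip_file(source: Path) -> Path:
    """Write a gzip-compressed copy of `source` next to it and return its path."""
    target = source.with_name(source.name + ".gz")
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def gzip_tree(root: Path) -> list[Path]:
    """Compress every not-yet-compressed file below `root`.

    The file list is collected before any compression starts, so the newly
    created .gz files are not picked up again. Originals are kept; remove
    them yourself once you have checked the compressed copies.
    """
    created = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() not in ALREADY_COMPRESSED:
            created.append(gzip_file(path))
    return created
```

Calling `gzip_tree(Path("my_run"))` on a hypothetical folder `my_run` would leave a `.gz` copy beside each uncompressed file; only the compressed copies then need to be uploaded.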

Archiving principle

The archiving process involves the following steps:

  1. Request access to the archive: send an email to geert 'dot' vandeweyer 'at' uantwerpen 'dot' be stating your Galaxy login name
  2. Upload your data to our Galaxy-FTP server (143.169.238.204, using your Galaxy login credentials)
  3. Log in to the archive manager (this page) with the same credentials
  4. Go to 'Archive Data':
    1. On the left: Select the files/folders to copy from the Galaxy-FTP to the archive servers
    2. On the right: Select the target folder on the archive servers
    3. All folders and files are copied into the target folder. If you need a new (sub)folder, select the parent folder, check the 'Create subfolder' box and provide the new name.
    4. At the bottom of the page, select 'Queue Transfer'
  5. Go to 'View Queue': Your archive job is either being processed or waiting to be processed.
  6. Once the job has finished, you are notified by email.
  7. WARNING: Do NOT start importing data queued for archiving into Galaxy before you receive this email, or your data might get lost!
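
Step 2 can also be scripted instead of using a graphical FTP client. The following is a minimal sketch with Python's standard `ftplib`, assuming the Galaxy-FTP server accepts a plain FTP login with your Galaxy credentials and that files go into your FTP home folder; neither detail is confirmed on this page. The names `upload_file` and `upload_all` are hypothetical helpers.

```python
from ftplib import FTP
from pathlib import Path

GALAXY_FTP_HOST = "143.169.238.204"  # address given on this page


def upload_file(ftp, local_path: Path) -> str:
    """Send one file over an open FTP session, keeping its file name.

    `ftp` is an already logged-in ftplib.FTP object (or anything with a
    compatible storbinary method). Returns the FTP command that was sent.
    """
    command = f"STOR {local_path.name}"
    with open(local_path, "rb") as handle:
        ftp.storbinary(command, handle)
    return command


def upload_all(user: str, password: str, paths: list[Path]) -> None:
    """Log in once and upload every file in `paths`.

    Assumption: plain FTP on the default port; adjust if the server
    requires something else.
    """
    with FTP(GALAXY_FTP_HOST) as ftp:
        ftp.login(user=user, passwd=password)
        for path in paths:
            upload_file(ftp, path)
```

A call such as `upload_all("my_galaxy_login", "my_password", [Path("reads.txt.gz")])` (placeholder values) would upload one file. The files then show up on the left-hand side of 'Archive Data' in step 4.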

Alternatively, users with CLI access to the HPC infrastructure can contact me to set up access to the archive from their home directory.

Getting Data From The Archive

This procedure follows the same protocol as the original archiving, but in the opposite direction:

  1. Log in to the archive manager (this page) with the same credentials
  2. Go to 'Request Archived Data':
    1. On the left: Select the files/folders to copy from the archive servers to the Galaxy-FTP servers
    2. On the right: Select the target folder on the Galaxy-FTP servers
    3. All folders and files are copied into the target folder. If you need a new (sub)folder, select the parent folder, check the 'Create subfolder' box and provide the new name.
    4. At the bottom of the page, select 'Request Data From Archive'
  3. Go to 'View Queue': Your data is either being fetched or waiting to be fetched.
  4. Once the job has finished, you are notified by email. You can then log in to the Galaxy-FTP server to access your data.
  5. WARNING: Do NOT start importing data queued for fetching into Galaxy before you receive this email, or your data might get lost!

Remember:

The archive uses the Linux 'rsync -a' option set. This means:

  1. When re-archiving data: only changes need to be copied
  2. When archiving to an existing directory: new files are added, changed files (later timestamp) are overwritten
  3. No data is ever deleted on the target side
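
The three rules above can be illustrated with a small Python sketch. It does not call rsync and is not the archive's actual code; it only mimics the behaviour as described on this page (add new files, overwrite files with a later timestamp, never delete). The function name `merge_into_archive` is hypothetical.

```python
import shutil
from pathlib import Path


def merge_into_archive(source: Path, target: Path) -> list[Path]:
    """Copy `source` into `target` following the three rules listed above.

    Returns the list of files that were actually written in `target`.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        dst = target / src.relative_to(source)
        # Rule 1: skip files that are already archived and not newer.
        if dst.exists() and src.stat().st_mtime <= dst.stat().st_mtime:
            continue
        # Rule 2: new files are added, files with a later timestamp overwrite.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also copies the modification time
        copied.append(dst)
    # Rule 3: nothing in `target` is ever removed, so files that exist only
    # in the archive are simply left untouched.
    return copied
```

Running the function twice on the same folders copies nothing the second time, because the timestamps in the target then match those in the source. This is why re-archiving an unchanged folder is cheap.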

Remember BIS:

The archive is one of the few safe repositories for your data:

  1. Data inside Galaxy: no backup
  2. Data inside HPC home directories: no backup (see above)
  3. Data inside CLC-Genomics: backup present, but if you delete data in CLC, it also vanishes from the backup (with a delay)
  4. Public FTP server: no backup
  5. Raw data on the sequencing machines: no backup and regularly deleted