
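Computing Environment Setup for Bioinformatics and Computational Biology
So, you want to harness the immense power of bioinformatics and computational biology for your science? Here's some advice that will save you headaches and make your life easier when working in a Linux/UNIX environment.
NOTE: These instructions are for setting up your Linux/UNIX computational environment. This might be on your local machine, but it could also be on a computing cluster like TACC that you log into remotely.
Instructions for setting up your local machine (for example, your laptop) with programs for editing text, accessing remote servers, etc., are covered over at Computer Setup.
Introduction to the Shell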
You will want to learn basic Unix commands and syntax for navigating your command-line environment and running commands. These include things like copying files, interrupting a process, and redirecting the input/output of a program.
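The short script below is a sketch of a few of these basics (copying a file, redirecting a program's output, input, and error messages, and piping one program into another). It runs inside a throwaway temporary directory so it does not touch your own files; the file names are made up for the example.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing else is touched.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

printf 'geneA\ngeneB\ngeneC\n' > genes.txt    # '>' sends output to a file (overwriting it)
cp genes.txt genes_backup.txt                  # copy a file
echo "geneD" >> genes.txt                      # '>>' appends instead of overwriting
wc -l < genes.txt > count.txt                  # '<' feeds a file into a program's input
ls nonexistent_file 2> errors.txt || true      # '2>' captures error messages (this ls fails on purpose)
sort -r genes.txt | head -n 2 > top2.txt       # '|' pipes one program's output into another

# To interrupt a running process (for example, `sleep 100`), press Ctrl-C.
```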
Environmental Variables
Login Scripts
When you log into a remote computer or open a terminal window on your computer, you are entering a "shell" program that interprets Unix commands. There are a few different shells, but the ones you are most likely to use (bash, and zsh on recent Macs) work very similarly. When a new shell starts, it loads several "login scripts". All that means is that the shell runs the commands in those files before it gives you a prompt and lets you start typing your own commands. These files generally have names like .bashrc or .zshrc, depending on your shell. (It can get complicated: some login scripts are run globally whenever anyone logs into a computer, and some are only run for your user account.)
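A minimal sketch of what "loading a login script" means: the script below writes a small stand-in login script to a temporary file and then loads it into the current shell with `.` (the portable spelling of `source`), which is what happens to your real login scripts when a new shell starts. The variable `MY_PROJECTS` and the alias `ll` are hypothetical examples, not anything the shell defines for you.

```shell
#!/usr/bin/env bash
# Which shell is this? /bin/bash usually means ~/.bashrc, /bin/zsh means ~/.zshrc.
echo "Login shell: ${SHELL:-unknown}"

# A stand-in "login script" in a temporary file, so your real home directory
# is not modified. MY_PROJECTS and ll are hypothetical examples.
demo_rc="$(mktemp)"
cat > "$demo_rc" <<'EOF'
export MY_PROJECTS="$HOME/projects"
alias ll='ls -lh'
EOF

# Run the file's commands in the *current* shell. After editing your real
# ~/.bashrc, `source ~/.bashrc` applies the changes without logging out.
. "$demo_rc"
echo "MY_PROJECTS is now: $MY_PROJECTS"
```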
Setting up your
When you invoke commands such as
Using TACC
Connecting: Head Nodes and Compute Nodes
The current system on TACC that we use for most of our computing is lonestar6. Its address is ls6.tacc.utexas.edu, so to ssh to it you use:
ssh <username>@ls6.tacc.utexas.edu
This logs you into a HEAD NODE. To request an interactive session on a COMPUTE NODE, use:
idev -m 60
The -m 60 option asks for a 60-minute slot on one compute node. Currently, you can make this as high as 120 minutes. For longer jobs, you will need to learn about submitting jobs to the queue.
After some informational messages, you will get a new prompt, and you can now run commands on the COMPUTE NODE. Compute nodes have many cores (processors) and a lot of memory (RAM), so you can (and should) run many jobs in parallel on one of these nodes if you are using it for computation. The idev command is mostly meant for development (that is, writing and testing new code/tools), but it can be used for short tasks, particularly if you are using a job manager like Snakemake that can make efficient use of the resources.
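As one simple illustration of running jobs in parallel on a compute node (not the only way; a workflow manager such as Snakemake is usually more robust for real pipelines), `xargs -P` can keep up to one job per core running at a time. In this sketch, `task` is a hypothetical stand-in that just writes a line to a file per sample, and the sample names are made up; you would replace both with your real command and inputs.

```shell
#!/usr/bin/env bash
# Number of cores on this node (GNU coreutils 'nproc').
ncores="$(nproc)"
outdir="$(mktemp -d)"

# Hypothetical per-sample task; replace it with your real tool.
# Inside 'bash -c', $0 is the output directory and $1 is the sample name.
task='echo "processed $1" > "$0/$1.txt"'

# xargs reads one sample name per line, -n 1 passes one name per job,
# and -P runs up to $ncores jobs at the same time.
printf '%s\n' sampleA sampleB sampleC sampleD |
  xargs -n 1 -P "$ncores" bash -c "$task" "$outdir"

ls "$outdir"
```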
If you get lost and can't remember whether you are on the HEAD NODE or a COMPUTE NODE, you can use this command:
hostname
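If you check this often, a small shell function can turn the hostname output into a label. This sketch rests on an assumption you should verify on your own system: that lonestar6 head (login) nodes have names beginning with login (e.g., login1.ls6.tacc.utexas.edu) and compute nodes have names like c303-005.ls6.tacc.utexas.edu. The function name `node_type` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a host name as a HEAD or COMPUTE node.
# ASSUMPTION: login nodes are named login*, compute nodes c<digits>-<digits>*.
node_type() {
  case "$1" in
    login*)         echo "HEAD NODE" ;;
    c[0-9]*-[0-9]*) echo "COMPUTE NODE" ;;
    *)              echo "UNKNOWN (check the naming on your system)" ;;
  esac
}

node_type "$(hostname)"
```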
Filesystems:
Conda 101
Whether on TACC or your own computer, you'll want to become familiar with the Conda package/environment manager. It makes it easy to install a wide variety of command-line tools in a way that prevents them from interfering with one another or with other settings on your system.
Set up Conda, Mamba, Bioconda
Conda is the main framework. Mamba speeds up Conda installs (once it is installed, use mamba everywhere you would use conda for running commands). Bioconda makes it possible to install additional packages related to bioinformatics and computational biology. You'll want all three of these working together in your environment.
conda install mamba
mamba init
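Bioconda packages also need the right channels configured. The commands below follow the one-time channel setup given in the Bioconda documentation at the time of writing; check the current Bioconda docs before running them, since that recommendation has changed over time. They write to your ~/.condarc file, so you only need to run them once.

```shell
# One-time channel setup for Bioconda (per the Bioconda documentation).
# 'conda config --add' puts each channel at the top of the list,
# so after these two lines conda-forge has the highest priority.
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

# Check the result: conda-forge should be listed first, then bioconda.
conda config --show channels
```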
Using Conda Environments
Conda environments are a way to:
By default, the base conda environment will be loaded.
It's OK to install some general-purpose utilities in this environment, but you should generally *install each of your major bioinformatics tools (or sets of tools) in its own environment*.
This sequence of commands creates an environment called breseq-env and installs breseq in it:
mamba create -n breseq-env
mamba activate breseq-env
mamba install breseq
Or, to install a specific version:
mamba install breseq=0.36.1
You can export the currently active environment to a yaml file:
conda env export > environment.yml
You can also create a new environment from a yaml file created by someone else, so you can reproduce their work!
conda env create -f environment.yml
Miscellaneous Timesavers
See Also