
HPC Services FAQ

From Systems Group



This page answers questions you may have while using the HPC services. Please note that it assumes your default shell is tcsh; where a step differs for bash, a bash version is given as well.

Commands for you to type or copy/paste into the shell appear on lines of their own.


Frequently Asked Questions

Q: I'm prompted for my password every time I try to connect to a compute node. How do I prevent this?


A: There are a few steps you will need to complete to enable passwordless logins:

1. Start the key generator:

ssh-keygen -t rsa

This creates your RSA key pair. ssh-keygen asks where to save the key; press Enter to accept the default location, /home/username/.ssh/id_rsa (the public key is saved alongside it as id_rsa.pub). When prompted, enter a passphrase. DO NOT leave it blank! The passphrase protects your private key and makes it more difficult for someone to compromise your account.


2. Add your key to your list of authorized keys:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

This will add your newly generated public key to the list of keys allowed for pubkey authentication.


3. Set up ssh-agent:

For tcsh:

cat >> ~/.tcshrc << "EOF" 
if ($TERM != "dumb") then 
 if (! $?SSH_AUTH_SOCK) then 
 eval `ssh-agent -c` 
 ssh-add 
 endif  
endif 
"EOF"

For bash:

cat >> ~/.bashrc << "EOF"
if [ "$TERM" != "dumb" ]; then
  if [ -z "$SSH_AUTH_SOCK" ]; then
    eval `ssh-agent -s`
    ssh-add
  fi
fi
EOF

This starts ssh-agent when you log in and loads your key into it with ssh-add, which prompts for the passphrase you chose in step 1. You won't need to type it again for the remainder of your session.

Next, make sure the agent is stopped when you log out.

For tcsh:

cat >> ~/.logout << "EOF"
if ($?SSH_AGENT_PID) then
 kill $SSH_AGENT_PID
endif
"EOF"
chmod 700 ~/.logout


For bash:

cat >> ~/.bash_logout << "EOF"
if [ -n "$SSH_AGENT_PID" ]; then
  kill "$SSH_AGENT_PID"
fi
EOF
chmod 700 ~/.bash_logout

This stops the ssh-agent you started at login when you log out, and makes the logout file (~/.logout or ~/.bash_logout) accessible only to you.
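If you are still prompted for a password after these steps, a few things are worth checking: ssh ignores a private key that other users can access, your public key must actually be in the authorized keys file, and the agent must be running with your key loaded. The bash sketch below checks each of these. The function names are made up for illustration and are not cluster tools; it assumes GNU stat and the default key name id_rsa.

```bash
#!/bin/bash
# Hypothetical troubleshooting helpers for the passwordless-login steps above.

# 1. ssh ignores a private key that others can access, and ~/.ssh should not
#    be accessible by others either. Prints a chmod hint for each problem.
check_ssh_perms() {
    local dir="${1:-$HOME/.ssh}" status=0 path mode
    for path in "$dir" "$dir/id_rsa"; do
        [ -e "$path" ] || continue
        mode=$(stat -c %a "$path")            # GNU stat: octal mode, e.g. 600
        # The last two octal digits are the group and other permissions.
        if [ "${mode: -2}" != "00" ]; then
            echo "too open ($mode): chmod go-rwx $path"
            status=1
        fi
    done
    return "$status"
}

# 2. Succeeds if the public key line already appears, as a whole line,
#    in the given authorized keys file.
key_is_authorized() {
    local pub="$1" auth="$2"
    grep -qxF -f "$pub" "$auth" 2>/dev/null
}

# 3. Turns the exit status of `ssh-add -l` into a message:
#    0 = keys loaded, 1 = listing failed (usually no keys), 2 = no agent.
describe_agent_status() {
    case "$1" in
        0) echo "agent running with keys loaded" ;;
        1) echo "agent reachable but no keys listed; run ssh-add" ;;
        2) echo "no agent reachable; check SSH_AUTH_SOCK" ;;
        *) echo "unexpected status: $1" ;;
    esac
}

# Example usage:
#   check_ssh_perms
#   key_is_authorized ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys && echo authorized
#   ssh-add -l >/dev/null 2>&1; describe_agent_status $?
```

Pass key_is_authorized whichever authorized keys file you appended to in step 2.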


Q: When I try to ssh in to a compute node I get an error about the REMOTE HOST IDENTIFICATION and I'm returned to a prompt on the head node. How do I fix this?


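A: This error means the host key ssh has on record for the node no longer matches the key the node presents. To fix it, copy/paste the following into your shell. For compute nodes (hosts named compute-0-*), ssh will then check host keys against the system-wide file /etc/ssh/ssh_known_hosts instead of your personal ~/.ssh/known_hosts. The Host * section also turns on agent forwarding and turns off X11 forwarding for every host.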

cat >> ~/.ssh/config << "EOF"
Host compute-0-*
UserKnownHostsFile=/etc/ssh/ssh_known_hosts
Host *
UserKnownHostsFile=~/.ssh/known_hosts
ForwardAgent yes
ForwardX11 no
"EOF"
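This works because ssh uses the first value it obtains for each option and reads Host sections in order, so compute nodes pick up the system-wide file and every other host falls through to Host *. The bash function below only mimics that outcome with a case statement to make the rule concrete; the name known_hosts_for is made up, and ssh does its own matching.

```bash
#!/bin/bash
# Illustration only: which UserKnownHostsFile the two Host sections above
# give a host, following ssh_config's "first obtained value wins" rule.
known_hosts_for() {
    case "$1" in
        compute-0-*) echo "/etc/ssh/ssh_known_hosts" ;;  # first matching Host section
        *)           echo "$HOME/.ssh/known_hosts" ;;    # Host * catches everything else
    esac
}

# Example: known_hosts_for compute-0-3   prints /etc/ssh/ssh_known_hosts
```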


Q: Do I need to use a specific compiler for MPI programs?


A: Yes. The compiler must match your chosen MPI implementation. For example, you cannot compile a program with the MPICH2 version of the compiler and then use the OpenMPI version of 'mpiexec' to run it.

The table below shows, for each MPI implementation, which compiler and which execution program to use for a parallel program written in C.

Implementation  C compiler                           Execution program
MPICH           /export/software/mpich/bin/mpicc     /export/software/mpich/bin/mpirun
MPICH2          /export/software/mpich2/bin/mpicc    /export/software/mpich2/bin/mpiexec
OpenMPI         /export/software/openmpi/bin/mpicc   /export/software/openmpi/bin/mpirun
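One way to avoid mixing implementations is to always take the compiler and the execution program from the same row of the table. A minimal bash sketch of that idea; the helper name mpi_tools and the source file hello.c are placeholders, and the paths are copied from the table.

```bash
#!/bin/bash
# Hypothetical helper: print the compiler and execution program for one MPI
# implementation, both taken from the same row of the table above.
mpi_tools() {
    local base
    case "$1" in
        mpich)   base=/export/software/mpich/bin;   echo "$base/mpicc $base/mpirun" ;;
        mpich2)  base=/export/software/mpich2/bin;  echo "$base/mpicc $base/mpiexec" ;;
        openmpi) base=/export/software/openmpi/bin; echo "$base/mpicc $base/mpirun" ;;
        *) echo "unknown MPI implementation: $1" >&2; return 1 ;;
    esac
}

# Example: compile with OpenMPI's mpicc; $run is the matching mpirun to use
# in your SGE job script.
#   read -r cc run <<< "$(mpi_tools openmpi)"
#   "$cc" -o hello hello.c
```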


Q: Should I submit jobs to SGE or run programs using mpiexec, mpirun, etc.?


A: To ensure proper resource utilization, we recommend that you run your parallel programs through SGE.


Q: How do I submit a job using SGE?


A: You should use the qsub command:

qsub /path/to/yourjobscript
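qsub normally confirms a submission with a line such as `Your job 12345 ("YourJob") has been submitted`. The bash sketch below pulls the job number out of that line so you can check on the job with qstat. It assumes your SGE version prints this usual message; array jobs print a different line and are not handled. The helper name qsub_job_id is made up.

```bash
#!/bin/bash
# Hypothetical helper: read qsub's confirmation line on stdin and print the
# numeric job ID, e.g. 12345 from: Your job 12345 ("YourJob") has been submitted
qsub_job_id() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Example usage on the head node:
#   jobid=$(qsub /path/to/yourjobscript | qsub_job_id)
#   qstat -j "$jobid"     # details of that job
#   qstat -u "$USER"      # all of your queued and running jobs
```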


Q: What should an SGE parallel job script look like?


A: The following is an example script using the OpenMPI implementation of MPI.

# The shell to be used for job execution 
#$ -S /bin/bash 
# Pass on your environment variables 
#$ -V 
# Set the name of the job 
#$ -N YourJob 
# Set the working directory 
#$ -wd /path/to/some/directory 
# Merge stdout and stderr 
#$ -j y 
# Send email to your CS account 
#$ -M youremail@cs.odu.edu 
# Send email when the job begins and when it has finished 
#$ -m be 
# Set the parallel environment for OpenMPI and the number of slots 
#$ -pe orte 256 

/export/software/openmpi/bin/mpirun -np 256 /path/to/your/program 
 

Please see the Job Script Generator page for help creating SGE job scripts.
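If you submit many similar jobs, a small function can fill in the parts of the example script that change. The bash sketch below prints a script shaped like the OpenMPI example above and uses a single slot count for both the -pe request and mpirun -np, so the two cannot drift apart. The function name make_job_script is hypothetical, and the email options (-M, -m) are left out for brevity.

```bash
#!/bin/bash
# Hypothetical helper: print an OpenMPI job script like the example above.
# Arguments: job name, number of slots, working directory, program path.
make_job_script() {
    local name="$1" slots="$2" workdir="$3" program="$4"
    printf '%s\n' \
        '#$ -S /bin/bash' \
        '#$ -V' \
        "#\$ -N $name" \
        "#\$ -wd $workdir" \
        '#$ -j y' \
        "#\$ -pe orte $slots" \
        "/export/software/openmpi/bin/mpirun -np $slots $program"
}

# Example: write a 64-slot script and submit it.
#   make_job_script YourJob 64 /path/to/some/directory /path/to/your/program > job.sh
#   qsub job.sh
```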


Q: Do I need to specify a machine file if I'm submitting the job to SGE?


A: No. In fact, supplying one can cause the job to run incorrectly, because the SGE parallel environment (for example "mpich") already generates a machine file for you.
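If you want to see which hosts SGE actually gave your job, you can read the host list it provides. Inside a parallel job SGE sets PE_HOSTFILE to a file with one line per host, starting with the host name and the number of slots granted there. The bash sketch below, with the made-up name summarize_pe_hostfile, assumes that format and prints each host with its slots plus a total.

```bash
#!/bin/bash
# Hypothetical helper: summarize SGE's host list for a parallel job.
# Each line of the file starts with: <host name> <slots> ...
summarize_pe_hostfile() {
    local file="${1:-$PE_HOSTFILE}"
    awk '{ hosts++; slots += $2; print $1, $2 }
         END { print "hosts:", hosts + 0, "slots:", slots + 0 }' "$file"
}

# Example (inside a job script, before the mpirun line):
#   summarize_pe_hostfile
```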


Q: How do I view the output of jobs submitted through SGE?


A: Output written to stdout goes to a file named "jobname.o#", where # is the job number. Unless you merged stderr into stdout (the -j y option in the example script), output written to stderr goes to a file named "jobname.e#".


Q: Where can I find the SGE output files?


A: By default, the output files are written to your home directory, "/home/$username". If your job script sets a working directory (the -wd option), they are written to that directory instead.
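The two answers above combine into a simple rule for locating a job's output. The bash sketch below, with the hypothetical name sge_output_files, prints the expected paths; it assumes your home directory ($HOME) is the /home/$username default mentioned above and that you did not redirect output with other qsub options.

```bash
#!/bin/bash
# Hypothetical helper: print the paths of a job's output files.
# Arguments: job name, job ID, working directory (default $HOME),
# and "yes" if stdout and stderr were merged with -j y (default yes).
sge_output_files() {
    local name="$1" id="$2" workdir="${3:-$HOME}" merged="${4:-yes}"
    echo "$workdir/$name.o$id"
    if [ "$merged" != "yes" ]; then
        echo "$workdir/$name.e$id"
    fi
}

# Example: follow the output of job 12345 from the example script.
#   tail -f "$(sge_output_files YourJob 12345 /path/to/some/directory)"
```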


Q: How do I control the number of processes per node when running a job with SGE?


A: Edit the line in your job script that calls mpirun or mpiexec so that it looks similar to the following, which runs the job using only 2 slots per host. The total number of slots you request and the number of slots per host you choose together determine how many hosts your job runs on.

/opt/openmpi/bin/mpirun -npernode 2 /path/to/your/program
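Following the reasoning in this answer, the number of hosts is the total slot count divided by the slots per host, rounded up. A one-function bash sketch of that arithmetic; the name hosts_needed is made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: ceil(total slots / slots per host), using integer
# arithmetic only (adding per_host - 1 before dividing rounds up).
hosts_needed() {
    local slots="$1" per_host="$2"
    echo $(( (slots + per_host - 1) / per_host ))
}

# Example: hosts_needed 256 2   prints 128
```

For instance, the 256-slot example script run with 2 slots per host would spread over 128 hosts.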


Q: How do I use a particular installed program?


A: Go to the "Installed Software" page for the machine you are using and click the program's name; where a tutorial exists, it is linked there.