Minicluster: MPICH with Torque


The "Other" Mpiexec

The Torque functionality comes from a new binary that replaces the mpiexec installed with MPICH2 (it has the same name). This "other" mpiexec - the one with Torque functionality - is produced by the Ohio Supercomputer Center and only works together with Torque. Users will not be able to run it outside of a qsub script.

So keep the original in a safe place, where it can still be run without Torque, and install the new one on the worker nodes.

Preparation

Move the original mpiexec (at least on the worker nodes) so that it is no longer in root's or the users' path. Find it with

root# which mpiexec
/usr/lib64/mpich2/bin/mpiexec

and then make a backup copy.

root# cd /usr/lib64/mpich2/bin
root# mv mpiexec mpiexec2
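
If you would rather rename it on every worker node in one shot, a loop over the node names works too. This is only a sketch: it assumes passwordless ssh from the head node, that MPICH2 sits at the same path on every node, and that a file listing the worker hostnames one per line already exists (the machines file built in the Pbs_iff section below will do).

for x in `cat machines`; do ssh $x mv /usr/lib64/mpich2/bin/mpiexec /usr/lib64/mpich2/bin/mpiexec2; done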

Installation

Download the latest mpiexec tarball from the downloads section of the OSC mpiexec page. Fetch the file (or whatever the latest version is) with

wget http://www.osc.edu/~pw/mpiexec/mpiexec-0.83.tgz
wget http://www.osc.edu/~djohnson/mpiexec/mpiexec-0.84.tgz

Unpack it and change into the new subdirectory

tar xvzf mpiexec*.tgz
cd mpiexec-0.84

Mpiexec follows the standard source installation paradigm. Run

./configure --help

to see a list of options. Important options include:

  • --prefix= - specify where you want to have the binaries installed. They need to be accessible by all of the worker nodes. An NFS mount would be a good choice.
  • --with-pbs= - necessary to get the Torque functionality! Specify the location of the Torque installation. If you followed my Torque tutorial, it's located at /var/spool/pbs
  • --with-default-comm=mpich2-pmi - tells mpiexec which MPI implementation and startup protocol to assume by default; mpich2-pmi matches the MPICH2 used in this tutorial

Next, run ./configure with all the options necessary. My command looked like this:

./configure --prefix=/shared --with-pbs=/var/spool/pbs/ --with-default-comm=mpich2-pmi
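
By itself, ./configure only generates the build files. Assuming the standard source installation paradigm mentioned above, the remaining steps would be:

make
make install

After make install, the new mpiexec should end up under the --prefix directory (/shared in the example above), which needs to be visible to all of the worker nodes.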

Pbs_iff

To do this next part, Torque will need to already be installed. Mpiexec requires that a file named pbs_iff be on each one of the worker nodes. Normally, this file is only located on the head node and isn't installed as part of the pbs_mom installation, so it needs to be copied out from the head node to each of the other nodes.

There's an easy way to do this by scripting. The first requirement is to have a file with each of the worker nodes listed in it. Assuming Torque is running, this can be generated with

pbsnodes | grep -v = | grep -v '^$' >> machines
  • grep -v = excludes all lines that have an equal sign in them
  • grep -v '^$' uses a regular expression to remove all empty lines
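
With hypothetical hostnames node01 and node02, the filtering might look roughly like this - pbsnodes prints one unindented hostname line per node, followed by indented attribute lines (all containing an = sign) and a blank separator line:

root# pbsnodes | head -4
node01
     state = free
     np = 2
     ntype = cluster
root# cat machines
node01
node02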

The "machines" file of all the worker node names can then be used in a quick script to copy pbs_iff to each of the worker nodes. Find the original file with

updatedb && locate pbs_iff

(If you receive an error, apt-get install locate and then try again.) Then, replacing my locations below for the location you found it on your cluster, run

for x in `cat machines`; do rsync /usr/local/sbin/pbs_iff $x:/usr/local/sbin/; done

Next, pbs_iff needs to have its permissions changed to setuid root. (This means the binary runs with root privileges, even when run by a different user.) Again, to do this across all the worker nodes at once, use a script and make sure the location is correct for your setup:

for x in `cat machines`; do ssh $x chmod 4755 /usr/local/sbin/pbs_iff; done
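
To double-check that the bit took effect, list the file on each node and look for the s in the owner's execute position (mode 4755 shows up as -rwsr-xr-x, owned by root):

for x in `cat machines`; do ssh $x ls -l /usr/local/sbin/pbs_iff; done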

Without these steps, users trying to run mpiexec will see errors like these:

pbs_iff: file not setuid root, likely misconfigured
pbs_iff: cannot connect to gyrfalcon:15001 - fatal error, errno=13 (Permission denied)
   cannot bind to reserved port in client_to_svr

mpiexec: Error: get_hosts: pbs_connect: Unauthorized Request .

Testing

If you try to run a program with mpiexec outside of a Torque job, it will give an error:

mpiexec: Error: PBS_JOBID not set in environment.  Code must be run from a
  PBS script, perhaps interactively using "qsub -I".

At least it's a helpful error! To test the new mpiexec, then, it needs to be called from within a qsub script. Continue on to the Torque and Maui: Submitting an MPI Job page to test it.
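
As a preview, a minimal submission script might look like the sketch below. The program name hello_mpi and the resource request are placeholders; with the OSC mpiexec, the host list and process count come from Torque itself, so no -np or machinefile arguments are needed inside the job.

#!/bin/bash
#PBS -N mpi_test
#PBS -l nodes=2:ppn=2
#PBS -j oe

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# The Torque-aware mpiexec picks up the allocated nodes on its own
mpiexec ./hello_mpi

Submit it with qsub and check the job's output file once it finishes.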

References

  • http://debianclusters.org/index.php/MPICH_with_Torque_Functionality
  • Mpiexec - MPI parallel job launcher for PBS: http://www.osc.edu/~pw/mpiexec/index.php
  • Mpiexec Readme: http://svn.osc.edu/repos/mpiexec/trunk/README