Navigation: Distributed Computing

 

Distributed Computing

 

Contact Us: fredsupport@photonengr.com

 

 

 

Description


This Help article draws heavily from the "Intro To FRED Distributed Computing" RTF document, written and maintained by the FRED development team, which can be found in your FRED installation directory under the <install dir>\Resources\Samples\DistributedComputing folder.

 

The architecture of FRED distributed computing is implemented in a very flexible way so that any calculation FRED can perform as a stand-alone instance can also be performed as a distributed computation.  Any license authorized for use of FRED Optimum is allowed to use the distributed computing capability.  The graphic below illustrates the concept of distributed computing, where the time spent for a given calculation running on a single computer can be dramatically reduced by splitting the calculation among any number of computers running in parallel.  Many analyses performed in FRED require millions (or even billions) of rays to generate sufficient statistics in the calculation or require an analysis to be repeated for many configurations (or a combination of these two scenarios).  In either case, the processing time can be significant and distributed computing offers a way to parallelize your analyses in a manner that is only limited by the availability of Windows computers on your network.

 

 

 

How Distributed Computing Works in FRED


What FRED Distributed Computing Is

What FRED Distributed Computing Is NOT

Component Overview

How Master-Remote Communication Works

 

 

What FRED Distributed Computing Is

The architecture of FRED's distributed computing capability is implemented in a very flexible way so that any calculation which can be run on a stand-alone FRED instance can also be run as a distributed calculation across a Windows network.  Only FRED Optimum licenses will allow access to the distributed computing functionality and only one license is required (the computers being distributed to do not require individual licenses).

 

Distributed computing is implemented through FRED's BASIC scripting language by running a script on a "master node" that handles distribution of the calculation over the Windows network to one or more "remote instances".  In a typical setup, the remote instances are instructed by the master node to execute their own analysis scripts, generate results files, and return the results files to the master node for recombination into a final, composite analysis result.

 

Ignoring the specifics of installation and establishing communication between master node and remote instances, the core of a master node script controlling the distributed calculation may be as simple as:

 

1.

 

Push FRED file to remote nodes.

2.

 

Instruct remote nodes to execute embedded script within the FRED file.

3.

 

Pull results files from remote nodes to the master node.

4.

 

Combine individual results files from remote nodes into a final, composite result.

 

 

What FRED Distributed Computing Is NOT

FRED's implementation of distributed computing is not a low-level operation that automatically and transparently distributes primitive FRED calculations (such as a ray trace) to multiple remote nodes.  In this sense, it is not analogous to multi-threading.  Rather, the user is required to write a specialized script being run on the master node that provides explicit instructions for the remote instances.

 

Component Overview

Ignoring, for a moment, how the communication between the master node and the remote instances actually works, there are three network components that are required to run a distributed computation.  These items are described below, with the graphic indicating the flow of information across the network.

Master Node

A master node is a running instance of FRED Optimum and the user initiates a distributed computing calculation by executing a specific FRED script that has been written to manage the distributed calculation.  For this purpose, the scripting language has been extended to include a relatively small number of specialized commands referred to as IPC (Inter-Process Communication) commands.  Full documentation of these IPC scripting commands can be found in the Scripting Reference Manual as well as in the IntroToFREDDistributedComputing.rtf document found in the <install dir>\Resources\Samples\DistributedComputing folder.  The script being run to manage the distributed calculation, which we will refer to as the "master script", will typically take the following sequence of actions:

 

1.

 

Define or load the necessary information for connecting to the remote instances.

2.

 

Launch instances of FRED on the remote machines.

3.

 

Push (copy) a FRED file and/or any additional scripts or auxiliary data files to the remote instances.

4.

 

Load the FRED file that was pushed in step 3.

5.

 

Instruct the remote instances to execute either an embedded script within the FRED file or an external script file.  The script(s) being executed will typically generate one or more data files as output.

6.

 

Pull (copy) the output data files from the remote machines to the to the master machine.

7.

 

Terminate the connections with the remote instances.

8.

 

Combine the individual results files from the remote instances into a composite final result.

 

Transfer Folder(s)

The transfer folder(s) are locations on the network, accessible by both the master node and each individual remote instance, used as intermediate locations for passing files between master and remote.  When a file is pushed from the master node to a remote, the process is actually completed in two steps; (1) master node pushes file to the transfer folder associated with the destination remote, and (2) destination remote pulls the file from the transfer directory to its local working directory.  The operations is similar when the master requests a file to be pulled from the remotes; (1) remote pushes a file to its associated transfer folder, and (2)  the master node pulls the file from the transfer directory to its working directory.  In reality, there is no requirement that each remote have its own unique transfer folder. A single transfer folder can be used for multiple remotes as long as it is accessible to all of them.  The file push process is illustrated in the graphic below.

 

 

Remote Instance(s)

Remote instances, when successfully launched by a master script, listen for instructions being sent to them by the master node in the form of IPC script commands.  IPC scripting commands direct the remote instance to perform specialized tasks that ultimately result in generation of output files that can be returned to the master node.  Such instructions include IPCLoadModel (instructs the remote instance to load a specific FRED file), IPCExecEmbeddedScript (instructs the remote instance to execute a specific embedded script in the currently loaded model), and IPCPullFile (copy a specific file from the remote machine across the network to the master node).  Remote instances do not require access to a FRED hardware key, license services, or drivers, though FRED does need to be installed on each of the remote machines.  The only software license required is a FRED Optimum license accessible by the master node.

 

How Master-Remote Communication Works

Communication between the master node and remote instances works through the use of Windows "named pipes", which are simply conduits for communication between sever (the process that created the pipe) and client (the process that connects to the pipe) and subject to Windows security checks and authorization.  When a remote instance of FRED is started, it creates a pipe, the master node connects to the pipe, and the IPC commands flow through the pipe from master to remote.

 

According to the available documentation, named pipes may use UDP ports 137-139 and/or TCP 445.  Network administrators are responsible for ensuring that the Windows firewall and other network protection settings allow communication through named pipes over these ports.

 

There are two methods of establishing the initial connection between master node and remote instances; the appropriateness of a given method depends on the specific security settings and software installed on your Windows network.  The installation and requirements for each method is described in another section of this document.  Please contact your network administrator with any questions regarding your network policies.

 

Method 1: FRED Remote Service

The FRED Remote Service is an optional service that runs on the remote machines to facilitate communication between master node and remote machine.  When started, the FRED Remote Service creates its own named pipe and listens on it for connection requests coming from a master node.  A master script, managing the distributed calculation, will send out an IPCConnect() request to a remote machine using the named pipe created by the FRED Remote Service.  When an IPCConnect() request is received by the FRED Remote Service, it asks Windows to authenticate the user account that initiated the request and, if the account is validated by Windows, it creates a temporary local working directory on the remote machine before launching a new remote FRED instance.  Once the remote FRED instance is launched by the FRED Remote Service, a new named pipe is created by the remote FRED instance, the information about the named pipe is passed back to the master node via the FRED Remote Service, and the master node can then communicate directly to the remote FRED instance on the dedicated named pipe between them.  Having established communication directly between the remote FRED instance and the master node, the FRED Remote Service has performed its function and returns to listening for IPCConnect() requests over its own named pipe.  In this way, the FRED Remote Service is capable of starting as many remote FRED instances on its own machine as requested by a master node, as long as the local computer resources can support them.  Communication between a master node, FRED Remote Service and remote FRED instance is illustrated in the graphic below.

 

Method 2: MPICH

MPI (Message Passing Interface) is a third party standard commonly used to facilitate classical distributed computing.  While FRED is not a native MPI application and does not use MPI for communication between the master node and remote instances, it can use MPI utilities to launch FRED instances on the remote machines as a substitute for the functionality provided by the FRED Remote Service.  In some network environments the use of the FRED Remote Service may be prohibited due to account restrictions. In such cases, the MPI method may be required.

 

MPI is an open standard (to some degree) and different "flavors" of MPI are available from various sources, some of which can be downloaded for free and used with restriction, while others may involve a fee.  Some MPI flavors we are aware of are MPICH, Open MPI, Intel MPI, Microsoft MPI, etc.  This list is not meant to be exhaustive.  Many flavors of MPI may work with FRED, but certain aspects of installation and usage may be unique to each.

 

The MPI utilities are used solely for their ability to launch applications on remote machines in response to the proper request from a master machine.  MPI plays no further role once the remote instances have been launched.

 

The MPI utilities perform several tasks when launching a Remote:

 

There are two utility executables that perform the work of launching the FRED application:  mpiexec.exe and smpd.exe.

 

On the master machine, mpiexec.exe sends a request to launch FRED.

 

On the remote machine, smpd.exe receives the request and launches the FRED executable.

 

smpd.exe places the running FRED instance in FRED Remote Distributed Computing mode and sets its working directory according to the T_IPCINSTANCE specification provided by the user.

 

mpiexec.exe and smpd.exe have no further involvement with the Remote FRED after it has been launched.

 

 

Some advantages of using the MPI launch method are:

 

MPI is a well-known and widely-known standard used for distributed computing.

 

It is relatively easy to install and configure.

 

It will often work in network environments with strict security policies.

 

 

Some disadvantages of using the MPI launch method are:

 

Photon Engineering does not supply the MPI software.  It be downloaded and installed independently of the FRED installation.  See the Installation and Testing section for more details.

 

It is the user's responsibility to create, delete, and maintain the working directories on the remote machines.  When using the FRED Remote Service method, maintenance of the working directories on the remote machines is handled automatically.

 

 

Installation and Testing


The installation procedure for distributed computing is different for the FRED Remote Service and MPICH methods.  Only one method is required and the installation for each method is described separately below.  Please keep in mind that Windows networks come in a wide variety of configurations and Photon Engineering cannot anticipate every installation scenario.  Consequently, your installation process may be different than that shown here and you may need to involve your local IT administrator in order to identify any specific issues during installation. 

 

Installation Procedure for using the FRED Remote Service

1.

 

Install FRED on each remote machine(s)

 

 

a.

 

Copy the FRED install executable to the remote machine.  Right mouse click on the installer and select, "Run as administrator".

 

b.

 

Proceed through the installer until you get to the "Choose Components" step.

- you do NOT need to install the Hardware Key Driver

- you DO need to install the Remote Service

 

c.

 

Proceed to the "Choose Install Location" step

- IF you change the Destination Folder from the default location ("C:\Program Files\Photon Engineering\FRED XX.XXX.X\"), make a note of the modified location.  You will need to specify this location later when you attempt to test your distributed computing setup.

 

d.

 

Proceed through the remainder of the installer

 

2.

 

Confirm that the FRED Remote Service is running on the remote machine(s) (Navigation to items described for Windows 7)

 

 

a.

 

Verify that the FRED Remote Service is present and started. Open, Control Panel > Administrative Tools > Services.

 

 

 

 

 

 

b.

 

Confirm that the rFredLogonUser account exists as a local account.  Open, Control Panel > User Accounts > Manage User Accounts.

 

 

 

 

 

 

c.

 

Confirm that the rFredLogonUser account has "log on as a service" privileges.  Open, Control Panel > Administrative Tools > Local Security Policy.

 

 

 

 

 

 

3.

 

Install FRED on the Master Machine

 

 

a.

 

The master machine requires an installed version of FRED with access to an Optimum license.

 

b.

 

It is possible for the master node (the instance of FRED running on the master machine) to distribute the calculation to itself, so that the master computer hosts the master node and one or more remote instances.  If you want to operate in this configuration, install FRED and the FRED Remote Service in the same manner as described in steps 1-2 above.

 

c.

 

If the master node is NOT distributing any calculations to itself, the FRED Remote Service does not need to be installed.

 

 

4.

 

Setting up a Transfer Folder

 

When running a distributed calculation, files need to be transferred between master and remote.  When pushing files to the remotes, the master moves the file from its source location to an intermediate transfer directory and the remote moves the file from the transfer directory to its local working directory.  File pulling (from remotes to master) happens through the same process but in reverse.  There is no requirement that the transfer directory be located on either the master or remote, but the rFredLogonUser and the user account running the master both need read/write privileges to that directory.  Insufficient read/write privileges of the transfer directory will cause file transfer to fail.

 

Although there is no requirement that the master machine have the FRED Remote Service installed and running, doing so may simplify selection and sharing of an appropriate transfer folder.  If you have installed the FRED Remote Service on the master machine, this means that the remote machines and the master machine all share a common rFredLogonUser account.  Having done so, there are a couple of transfer folder options:

 

 

 

a.

 

Use the \\MasterMachineName\Users\Public\Documents\Fred_Remote\ folder as your transfer directory, replacing MasterMachineName with the network name of your master machine

 

b.

 

Choose a folder on the master machine or any remote machine.  Right mouse click on the folder and go to Share With > Specific People and add "rFredLogonUser". Set the permission level to Read/Write.  The dialog below is shown for Windows 7.  When you attempt to run a distributed calculation, specify the full network path to this folder as the transfer folder in your configuration file.

 

 

 

 

Installation Procedure for using MPICH

MPI (Message Passing Interface) is a third party standard commonly used to facilitate distributed computing.  While FRED is not a native MPI application and does not use MPI for communication between the master and remotes, it can use MPI utilities to launch remote FRED instances as a substitute for the functionality of the FRED Remote Service.  See "How Master-Remote Communication Works" above for more information.

 

MPI is an open standard and different "flavors" of MPI are available from various sources, some of which can be downloaded for free and used without restriction, while others may involve a fee.  Some MPI flavors we are aware of are MPICH, Open MPI, Intel MPI, Microsoft MPI, etc.  This list is not meant to be exhaustive.  Many flavors of MPI may work with FRED, but certain aspects of installation and usage may be unique to each.

 

FRED Distributed Computing has been successfully tested with MPICH, and the information provided below provides a summary description of how MPICH may be installed for use with FRED Distributed Computing.  We first describe how to install FRED on the master and remote machines, followed by a description of how to install MPICH for use with FRED.

 

Installation of FRED on the Master Machine

Follow the normal FRED installation procedure on the master machine.  The "Remote Service" installation option does NOT need to be toggled.

 

Installation of FRED on the Remote Machine(s)

Follow the normal FRED installation procedure on each remote machine.  The "Hardware Key" and "Remote Service" installation options do NOT need to be toggled.

 

MPICH Installation on the Master and Remote Machines

Photon Engineering does not distribute or install MPICH as part of the FRED installation.  We recommend the user examine the documentation that can be found on the www.mpich.org website for complete information, since the user is ultimately responsible for download and installation.  A summary description of the relevant installation details is given below.

 

 

Download an MPICH installation from http://www.mpich.org/static/downloads.  We recommend that you download version 1.4.1.p1 as this seems to be the most recent version that officially supports the Microsoft Windows operating system.

 

Install MPICH on the master machine.  By default this will install to the "\Program Files\MPICH2" directory.  For our purposes we utilize only the mpiexec.exe and smpd.exe executable files, which may be found in the installation "bin" subdirectory.  mpiexec.exe is the utility on the master machine that sends commands to the remote machines.  smpd.exe is the utility on the remote machines that responds to the commands and launches FRED.

 

Add the path to mpiexec.exe to your PATH environment variable on the master machine.  Note it is possible that you may have an MPI flavor version conflict if you have another flavor of MPI installed on your machine.  In this case you must ensure that the MPICH mpiexec.exe is the one that is executed by default.  You can place its path before any others in the PATH environment variable, or copy mpiexec.exe to the FRED bin directory.  If you have trouble updating your PATH environment variable, contact your local IT group.

 

Copy the mpiexec.exe and smpd.exe executable files to a directory on each remote machine.

 

On the master machine and each remote machine, you must prepare MPI by opening a command prompt window in administrator mode and typing in the following commands:
        > cd <directory where smpd.exe and mpiexec.exe reside>
        > smpd -install
        > mpiexec -register
The mpiexec -register command will prompt you for your username and password.  The command usually presents your username as a default and you can simply press the <enter> key in response to the username prompt.

 

Once you have registered, and smpd has been installed, you should validate that MPICH is properly configured.  Open a command prompt window on the master machine and type:

                    > mpiexec -validate -host <remote_machine>

where <remote_machine> is the name of the FRED remote machine that you wish to validate.  The command will return the string "SUCCESS" if validation is successful.

 

If the validation is not successful, the problem might be in your firewall settings on either the master machine, the remote machine, or both.  You can try temporarily turning off the firewall and then revalidate.  If that results in "SUCCESS", then you may need to adjust firewall settings by explicitly opening the port used by mpiexec and smpd, and/or by explicitly allowing mpiexec.exe and smpd.exe through the firewall. Contact your local IT group with any questions regarding this procedure.

 

Command line options for these executables can be listed by opening a command prompt window in Administrator mode and typing mpiexec -help, mpiexec -help2, smpd -help, or smpd -help2.

 

 

Load Balancing and Work Units


When a distributed calculation is run in FRED, there is no requirement that the remote computers performing the calculation have the same hardware and, therefore, the same computing power.  In other words, FRED's distributed computing capability can be run on an inhomogeneous Windows network.  If, in the simplest case, a raytrace of the same exact FRED model is performed simultaneously by an inhomogeneous set of remote computers, you would expect that the more powerful computers will finish the raytrace faster than the less powerful computers.  Since the master computer will be waiting to pull the calculation result from all of the remote instances, the distributed calculation time is limited by the slowest remote instance.  This type of configuration is inefficient because the powerful computers are sitting idle while they wait for the slowest computers to finish.

 

Load balancing is the idea that we can distribute the total workload of the calculation unevenly across all of the remote instances, accounting for their relative performance.  For example, suppose that a calculation is distributed to two remote instances in which Remote1 is twice as fast as Remote2.  For maximum efficiency, we would want to "load balance" the calculation so that Remote1 gets twice as much work to do as Remote2 and the two remote instances should finish at approximately the same time.  In total, this minimizes the time that remotes and master sit idle.

 

To be clear, load balancing is NOT required to run a distributed calculation in FRED.

 

FRED's implementation of load balancing is performed through the concept of 'work units'.  In a master script the user will declare that there are N total work units for the distributed calculation.  When the master script directs the remotes to either run an embedded script or run a standalone script file, the N total work units will be parsed out to the remote instances proportionally to each of the remote instance's relative computing power.  There are two ways in which the N work units can be parsed out by the master node; "statically" or "dynamically".  In a statically load balanced calculation the master knows ahead of time what the relative power of each remote instance is and can assign work units to each remote instance as soon as the instruction to run an embedded script or standalone script file is issued.  In a dynamically load balanced calculation the master does not know ahead of time what the relative power of each remote instance is and the work units cannot be assigned when the instruction to run an embedded script or standalone script file is issued.  Instead, the master issues a single work unit to each remote instance (assuming there are more work units than remotes) and then waits for each remote to signal back that it has finished processing its assigned work unit.  Upon receiving the signal that a remote is available for more work, the master assigns another single work unit to the available remote.  This process proceeds until all work units have been assigned.

 

In order to make use of the work units, the scripts being run on the remote instances do need to be written in such a way that they expect, and make use of, work unit values provided by the master script.  Examples of this are provided in the Samples as well as in the script documentation for IPCGetWorkUnits.

 

So, what exactly are work units?  From a programmatic perspective, work units are simply integers.  If you declare 10 work units in your master script, the specific work units that will be assigned to the remote nodes have values 0-9.  It is only in the context of your application that work units have any specific meaning.  For example, lets consider an application where the same model configuration needs to be analyzed 10 times at 10 different wavelengths.  We could implement this easily in a load balanced distributed calculation by declaring 10 work units in the master script.  In the remote script, an incoming work unit value is mapped to one of the 10 wavelengths that need to be analyzed, the source is updated to use the wavelength value, the raytrace and analyses are performed, and the result written to a file that can be sent back to the master node.

 

 

Distributed Computing Commands


The full list of scripting commands for distributed computing can be found in the Scripting Reference Manual.

 

 

 

 

 

 

 

Copyright © Photon Engineering, LLC