Description
Sets the distributed computing load balancing mode to be used by the IPCExecEmbeddedScript and IPCExecFileScript commands.
In FRED distributed computing, load balancing is the process whereby work units are assigned to remote instances in such a way that each remote is utilized to maximum efficiency and all remotes finish their work at nearly the same time, thus minimizing the total time it takes to complete the calculation.
For example, consider two remote instances where it is known ahead of time that one can complete two work units in the time the other completes one. A good load balancing strategy might be to assign two work units to the fast remote for every one work unit assigned to the slow remote. One would then expect both remote instances to complete their work units at about the same time, with each working at full capacity. This is the strategy behind FRED's "static" load balancing mode, which is the default in the absence of an IPCSetLoadBalance command in the master script. In general, the number of work units assigned to a remote is roughly proportional to the "speed" of that remote as specified in the configuration file (or in the connections() array of T_IPCINSTANCE structures).
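The proportional split described above can be modeled outside of FRED with a short Python sketch. This is not FRED script and not FRED's internal algorithm: the function name static_allocation is hypothetical, speeds are assumed to be positive integers (as in the "speed" column of the configuration file), and the largest-remainder rule used to hand out leftover units is an assumption made only so that the counts add up to the total.

```python
def static_allocation(total_units, speeds):
    """Split total_units across remotes in proportion to their relative speeds.

    Hypothetical helper: the rounding rule (largest remainder) is an
    assumption, not FRED's documented behavior.
    """
    if total_units < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need total_units >= 0 and positive integer speeds")
    total_speed = sum(speeds)
    # Whole units each remote is guaranteed by its share of the total speed.
    counts = [total_units * s // total_speed for s in speeds]
    # Fractional part of each share, kept as an integer remainder.
    remainders = [(total_units * s) % total_speed for s in speeds]
    leftover = total_units - sum(counts)
    # Give the leftover units to the remotes with the largest remainders
    # (ties broken by lowest index).
    order = sorted(range(len(speeds)), key=lambda i: (-remainders[i], i))
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Two remotes, the first twice as fast as the second.
print(static_allocation(30, [200, 100]))  # [20, 10]
```

With 30 work units and speeds of 200 and 100, the fast remote receives 20 units and the slow remote 10, matching the two-to-one assignment in the text. The split is only as good as the speed values supplied ahead of time.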
Another load balancing strategy using the same two remotes would be for the master to assign one work unit to each remote and then wait for a remote to signal that it has completed its work unit. The master then immediately assigns another work unit to that remote. The process repeats until all work units have been assigned and completed, at which point the master script proceeds with execution. This is the strategy behind FRED's "dynamic" load balancing mode.
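The assign-on-completion loop can likewise be modeled with a small event simulation in Python. Again, this is a sketch under stated assumptions rather than FRED's implementation: simulate_dynamic and unit_times are hypothetical names, each remote is assumed to take a fixed time per work unit, and communication overhead between master and remotes is ignored.

```python
import heapq


def simulate_dynamic(total_units, unit_times):
    """Model dynamic load balancing.

    unit_times[i] is the (assumed constant) time remote i needs for one
    work unit. Returns (units completed per remote, overall finish time).
    """
    counts = [0] * len(unit_times)
    remaining = total_units
    events = []  # min-heap of (completion_time, remote_index)

    # The master starts by giving one work unit to each remote.
    for i, t in enumerate(unit_times):
        if remaining == 0:
            break
        heapq.heappush(events, (t, i))
        remaining -= 1

    finish = 0.0
    while events:
        # The next remote to signal completion.
        now, i = heapq.heappop(events)
        counts[i] += 1
        finish = now
        # Immediately hand that remote another unit, if any are left.
        if remaining > 0:
            heapq.heappush(events, (now + unit_times[i], i))
            remaining -= 1
    return counts, finish


# Same two remotes as before: the first is twice as fast as the second.
print(simulate_dynamic(30, [1, 2]))  # ([20, 10], 20)
```

In this idealized model the dynamic scheme arrives at the same 20/10 split as the proportional assignment, but without needing the relative speeds in advance. The trade-off is that the master must stay connected to hand out each unit.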
In dynamic load balancing mode the master node should not be disconnected from the remote nodes, because the master issues work units on the fly. If the master node is disconnected, the remote nodes will stop being assigned work. If the master does become disconnected from the remotes, it should attempt to reconnect and then terminate the remote connections (see the example script in IPCRecoverConnections).
The IPCSetLoadBalance command should be placed in the master script prior to any IPCExecEmbeddedScript or IPCExecFileScript commands. The default mode is "static" if no IPCSetLoadBalance command appears in the master script.
Syntax
count = IPCSetLoadBalance( connections(), mode )

Parameters
count (Long)
Returned number of active remote nodes that successfully had their load balance mode set.

connections() As T_IPCINSTANCE
An array of T_IPCINSTANCE structures, with each structure defining the connection information for a remote node.

mode As String
The load balance mode in string form. Options are "static" or "dynamic".
Example
The example below demonstrates a distributed computing calculation in which a FRED file (the one being run by the master) is pushed to the remote nodes, an embedded script within the FRED file is executed, and FRED grid data (FGD) output files are pulled back to the master node for accumulation into a final result. The embedded script run by the remotes is configured to use work units, so IPCSetLoadBalance and IPCDeclareTotalWorkUnits are issued before the remote nodes load the FRED model and execute the embedded script. Several helper functions are included to keep the code modular. For this load balancing example, note that a special function, CalcWorkUnits, returns the total number of work units to be distributed to the remotes. The implementation of this function depends on your specific calculation; in this example the total number of work units corresponds to the number of pixels in a specific analysis surface.
Sub Main

    'This script demonstrates how to use the Distributed Computing component of FRED in order
    'to send an analysis out to "remote" nodes on a Windows network, retrieve the results from
    'each remote node and then recombine them on the "master" computer.

    'Log the current time
    Dim tStart
    tStart = Time()

    'Master document preparation
    Dim nDisconnect As Long
    ARNDeleteAllNodes()
    PrefsSetARNRetainCount( -1 )
    ClearOutputWindow()

    'Remove any FGD and FDCD (debug) files in the current directory
    Dim nDel As Long
    Print ""
    nDel = KillFiles( GetDocDir(), "fgd" )
    Print "Removed " & nDel & " FGD files from the current directory."
    nDel = KillFiles( GetDocDir(), "fdcd" )
    Print "Removed " & nDel & " DEBUG files from the current directory."

    'First, we load the configuration file to generate an array of type T_IPCINSTANCE.
    'The array, named 'connections' by convention, defines all of the information needed
    'to connect to the remote computers, transfer files, issue commands and retrieve results.
    'Although it is possible to manually populate the 'connections' array, it is typically
    'easiest to maintain a CSV file with the available computers on your network.
    'Once IPCLoadConfigFile has executed, FRED will print some summary information to the
    'output window regarding the contents read from the configuration file.
    Dim nRemoteLoad As Long
    Dim connections() As T_IPCINSTANCE
    Dim configFile As String, configLocation As String
    configLocation = GetDocDir() & "\"
    configFile = "thermalImaging_Working.csv"
    nRemoteLoad = IPCLoadConfigFile( connections(), configLocation & configFile )

    'Now that we have loaded the configuration file, let's attempt to connect to the nodes
    'that were specified as "active" in the configuration.
    'FRED will print some summary information to the output window automatically when
    'this is called.
    Dim nConnect As Long
    nConnect = IPCConnect( connections() )

    'Query the remote nodes for their status. This prints some information to the
    'output window. At this point, the remotes should be idle.
    Dim nStatus As Long
    nStatus = IPCQueryStatus( connections(), "" )

    'Let's push the model (the one we are using right now) out to the remote nodes using
    'the "copy" option, which leaves a copy of the file in the source location (instead
    'of copying and then removing the source file). By leaving the destination folder
    'string empty, the default working directory on the remote node is used. The default
    'working directory on the remote node is automatically created by each remote instance.
    'If we don't push any files to the remotes, let's stop here and terminate the connections.
    Dim nPush As Long
    Dim fName As String
    fName = name 'returns the document name with extension
    nPush = IPCPushFile( connections(), fName, "", "copy" )
    If nPush < 1 Then
        Print ">>> No files were successfully pushed to the remotes."
        Print ">>> Stopping the script and disconnecting from remotes."
        nDisconnect = IPCTerminate( connections(), "sterilize", "" )
        End
    End If

    'Tell the remote nodes what type of load balancing to use
    'If static, then the RelSpeed parameter of the connections configuration data
    'is used to determine how the work units are distributed to the remotes.
    'If dynamic, then the master polls the remotes to see if they are idle and
    'hands out work units dynamically.
    Dim nLoadBal As Long
    nLoadBal = IPCSetLoadBalance( connections(), "dynamic" )

    'How many work units?
    IPCDeclareTotalWorkUnits( CalcWorkUnits() )

    'Have the remotes load the FRED model and then execute the embedded script.
    'Note the use of the long wait time in each of the IPC commands. This makes sure
    'that the FRED model has finished loading before the embedded script gets executed.
    'Although the maximum wait time is one day (1d), as soon as the remotes are idle,
    'the next command in the IPC queue will be processed.
    Dim scriptName As String, nScript As Long, nLoad As Long
    scriptName = "Remote Node Script"
    nLoad = IPCLoadModel( connections(), fName, "1d" )
    If nLoad < 1 Then
        Print ">>> No FRED files were successfully queued for loading."
        Print ">>> Stopping the script and disconnecting from remotes."
        nDisconnect = IPCTerminate( connections(), "sterilize", "" )
        End
    End If
    Print "Embedded script Start: " & Chr(9) & Time()
    nScript = IPCExecEmbeddedScript( connections(), scriptName, "1d" )
    If nScript < 1 Then
        Print ">>> No embedded scripts were successfully queued for execution."
        Print ">>> Stopping the script and disconnecting from remotes."
        nDisconnect = IPCTerminate( connections(), "sterilize", "" )
        End
    End If
    Print "Embedded script End: " & Chr(9) & Time()

    'Wait here
    IPCWaitForIdle( connections(), "1d" )

    'The embedded script that we have asked to execute should have written out an FGD file
    'containing the result that we want to retrieve. The wildcard notation "*.fgd"
    'indicates that we want to move all FGD files off of the remote and into the
    'configLocation directory. The use of the "move" option means that copies of the
    'FGD files are not left behind on the remote. We pass in an empty string array,
    'pulledFiles(), that gets populated with the full file path of the files that were
    'successfully retrieved from the remote nodes. We can then use this string array
    'directly to access the pulled files.
    Dim nPull As Long
    Dim pulledFiles() As String
    nPull = IPCPullFile( connections(), "*.fgd", configLocation, "move", pulledFiles() )

    'Print a report of the files that were pulled from the remotes
    Dim curFile As String
    Print ""
    Print "The following files were successfully pulled from the remote nodes:"
    For Each curFile In pulledFiles()
        Print Chr(9) & curFile
    Next

    'At this point, we can re-assemble the FGD files retrieved from the
    'remote nodes into a final result. Let's load the FGD files we recovered from the remote
    'nodes into the model as ARN and then combine them into a composite result.
    Dim nFgd As Long, finalArn As Long
    nFgd = LoadFgd( pulledFiles() )
    Print ""
    Print "Loaded " & nFgd & " FGD files into ARNs."
    If nFgd > 0 Then
        finalArn = CompositeAllARN( )
        Print "Final result stored in ARN " & finalArn
    Else
        Print "No ARN available to composite into a final result."
    End If

    'Pull debugging text file(s)
    nPull = IPCPullFile( connections(), "*.fdcd", configLocation, "move", pulledFiles() )

    'Finally, terminate the connection to the remotes using the "sterilize" option.
    'This option deletes the working directory even if it contains files. We have
    'already moved the FGD files off of the remote but we don't care about retrieving the FRED model, so
    'we just delete the whole directory and its contents.
    nDisconnect = IPCTerminate( connections(), "sterilize", "" )

    Print ""
    Print "Distributed calculation completed."

    'Log the current time
    Dim tEnd
    tEnd = Time()

    Print "Time Started: " & Chr(9) & tStart
    Print "Time Completed: " & Chr(9) & tEnd

End Sub

Function LoadFgd( ByVal in_files() As String ) As Long

    'This helper function loops over an array of strings that
    'specify the full file path to a set of FGD files and then
    'loads them into the FRED document as ARN.
    'INPUTS:
    '   in_files() = string array holding the full file paths of the FGD files to load
    'OUTPUTS:
    '   returns the number of ARN that were created

    Dim cFile As String, splitStrs() As String
    Dim nF As Long

    nF = 0
    For Each cFile In in_files()
        splitStrs = Split(cFile, "\") 'split the full file path name by "\"
        splitStrs = Split( splitStrs(UBound(splitStrs)), ".") 'split the file name by "." to remove the file extension
        If LCase(splitStrs(1)) = "fgd" Then
            ARNCreateFromFile( cFile, splitStrs(0) )
            nF += 1
        End If
    Next

    Return nF

End Function

Function CompositeAllARN( ) As Long

    'This helper function loops over the Analysis Results folder
    'and sums all of the ARN into a composite result.
    'INPUTS:
    '   None
    'OUTPUTS:
    '   Returns the node number of the composite ARN

    'What node number is the last ARN on the tree at?
    Dim nArn As Long
    nArn = ARNGetMaxNodeNum()

    'Make a copy of the zeroth ARN. This will become the
    'final composite result node.
    Dim compArn As Long
    compArn = ARNCreateCopy( 0, "Composite Result" )

    'Start looping over the other ARN starting at index 1
    Dim curArn As Long
    For curArn = 1 To nArn
        ARNLinearCombination( 1, compArn, 1, curArn )
    Next

    Return compArn

End Function

Function KillFiles( ByVal in_dir As String, _
                    ByVal in_ext As String ) As Long

    'This helper function scans a directory for files
    'with a specific extension and deletes them
    'INPUTS:
    '   in_dir = string defining the directory to be searched
    '   in_ext = string specifying what type of files should be deleted
    'OUTPUTS:
    '   returns the number of files that were deleted

    'Change to the search directory
    ChDir( in_dir )

    Dim nF As Long
    Dim cFile As String
    nF = 0
    cFile = Dir$("*." & in_ext)
    While cFile <> ""
        Kill( cFile )
        nF += 1
        cFile = Dir$()
    Wend

    Return nF

End Function

Function CalcWorkUnits( ) As Long

    'This is a calculation specific function that computes
    'how many work units will be distributed. This function
    'needs to change based on what type of analysis you are performing.

    'Analysis surface at the detector that we are reverse raytracing from
    Dim anaNode As Long
    anaNode = FindFullName( "Analysis Surface(s).detectorAna" )

    'Retrieve the analysis grid information
    Dim tAnaSurf As T_ANALYSISSURF
    Dim nX As Long, nY As Long
    Dim xMax As Double, xMin As Double
    Dim yMax As Double, yMin As Double
    Dim px As Double, py As Double
    GetAnalysisSurf( anaNode, tAnaSurf )
    nX = tAnaSurf.anaNumX
    nY = tAnaSurf.anaNumY

    'Work units is the pixel count of the analysis surface
    Return nX*nY

End Function

The example code above produces the following content in the FRED output window:

Removed 0 FGD files from the current directory.
Removed 0 DEBUG files from the current directory.

[MASTER]IPCLoadConfigFile: 14 of 14 connections read from 'C:\temp\thermalImaging.csv'

index  active  speed  host       status
1      true    100    computer1  ---
2      true    100    computer1  ---
3      true    100    computer1  ---
4      true    100    computer1  ---
5      true    100    computer1  ---
6      true    100    computer1  ---
7      true    100    computer2  ---
8      true    100    computer2  ---
9      true    100    computer2  ---
10     true    100    computer2  ---
11     true    100    computer2  ---
12     true    100    computer2  ---
13     true    100    computer3  ---
14     true    100    computer3  ---

[MASTER]IPCConnect: 14 of 14 Remote FRED instances connected
Remote launch method: FRED Remote Service

index  host       result     message
1      computer1  connected  ---
2      computer1  connected  ---
3      computer1  connected  ---
4      computer1  connected  ---
5      computer1  connected  ---
6      computer1  connected  ---
7      computer2  connected  ---
8      computer2  connected  ---
9      computer2  connected  ---
10     computer2  connected  ---
11     computer2  connected  ---
12     computer2  connected  ---
13     computer3  connected  ---
14     computer3  connected  ---

[MASTER]IPCQueryStatus: 14 connections queried

index  host       message
1      computer1  idle
2      computer1  idle
3      computer1  idle
4      computer1  idle
5      computer1  idle
6      computer1  idle
7      computer2  idle
8      computer2  idle
9      computer2  idle
10     computer2  idle
11     computer2  idle
12     computer2  idle
13     computer3  idle
14     computer3  idle

[MASTER]IPCPushFile: 14 files pushed from 'thermalimaginglens.frd' to ''

index  host       src/xfer/dest  message
1      computer1  1/1/1          finished
2      computer1  1/1/1          finished
3      computer1  1/1/1          finished
4      computer1  1/1/1          finished
5      computer1  1/1/1          finished
6      computer1  1/1/1          finished
7      computer2  1/1/1          finished
8      computer2  1/1/1          finished
9      computer2  1/1/1          finished
10     computer2  1/1/1          finished
11     computer2  1/1/1          finished
12     computer2  1/1/1          finished
13     computer3  1/1/1          finished
14     computer3  1/1/1          finished

[MASTER]IPCSetLoadBalance: 14 of 14 active nodes set to 'dynamic'
[MASTER]IPCDeclareTotalWorkUnits = 130
[MASTER]IPCLoadModel: 14 of 14 queued for loading file 'thermalImagingLens.frd'
Embedded script Start:  11:15:08 AM
[MASTER]IPCExecEmbeddedScript: 130 of 130 Work Units submitted
[MASTER]IPCWaitForIdle: 14 of 14 active connections are idle (6 sec)
[MASTER]IPCExecEmbeddedScript: 14 of 14 queued execution 'Remote Node Script' (dynamic load balancing)

index  host       message
1      computer1  finished 9 Work Units
2      computer1  finished 9 Work Units
3      computer1  finished 9 Work Units
4      computer1  finished 9 Work Units
5      computer1  finished 9 Work Units
6      computer1  finished 9 Work Units
7      computer2  finished 10 Work Units
8      computer2  finished 9 Work Units
9      computer2  finished 10 Work Units
10     computer2  finished 9 Work Units
11     computer2  finished 10 Work Units
12     computer2  finished 10 Work Units
13     computer3  finished 9 Work Units
14     computer3  finished 9 Work Units

Embedded script End:  11:16:25 AM
[MASTER]IPCWaitForIdle: 14 of 14 active connections are idle (0 sec)
[MASTER]IPCPullFile: 14 files pulled from '*.fgd' to 'C:\temp\'

index  host       src/xfer/dest  message
1      computer1  1/1/1          finished
2      computer1  1/1/1          finished
3      computer1  1/1/1          finished
4      computer1  1/1/1          finished
5      computer1  1/1/1          finished
6      computer1  1/1/1          finished
7      computer2  1/1/1          finished
8      computer2  1/1/1          finished
9      computer2  1/1/1          finished
10     computer2  1/1/1          finished
11     computer2  1/1/1          finished
12     computer2  1/1/1          finished
13     computer3  1/1/1          finished
14     computer3  1/1/1          finished

The following files were successfully pulled from the remote nodes:
    C:\temp\ThermalIrradiance_computer1_lh_20160302111440_00000078.fgd
    C:\temp\ThermalIrradiance_computer1_lh_20160302111440_00000079.fgd
    C:\temp\ThermalIrradiance_computer1_lh_20160302111441_00000080.fgd
    C:\temp\ThermalIrradiance_computer1_lh_20160302111441_00000081.fgd
    C:\temp\ThermalIrradiance_computer1_lh_20160302111442_00000082.fgd
    C:\temp\ThermalIrradiance_computer1_lh_20160302111442_00000083.fgd
    C:\temp\ThermalIrradiance_computer2_lh_20160302111443_00000018.fgd
    C:\temp\ThermalIrradiance_computer2_lh_20160302111444_00000019.fgd
    C:\temp\ThermalIrradiance_computer2_lh_20160302111444_00000020.fgd
    C:\temp\ThermalIrradiance_computer2_lh_20160302111445_00000021.fgd
    C:\temp\ThermalIrradiance_computer2_lh_20160302111445_00000022.fgd
    C:\temp\ThermalIrradiance_computer2_lh_20160302111446_00000023.fgd
    C:\temp\ThermalIrradiance_computer3_lh_20160302111446_00000033.fgd
    C:\temp\ThermalIrradiance_computer3_lh_20160302111447_00000034.fgd

Loaded 14 FGD files into ARNs.
Final result stored in ARN 14

[MASTER]IPCPullFile: 14 files pulled from '*.fdcd' to 'C:\temp\'

index  host       src/xfer/dest  message
1      computer1  1/1/1          finished
2      computer1  1/1/1          finished
3      computer1  1/1/1          finished
4      computer1  1/1/1          finished
5      computer1  1/1/1          finished
6      computer1  1/1/1          finished
7      computer2  1/1/1          finished
8      computer2  1/1/1          finished
9      computer2  1/1/1          finished
10     computer2  1/1/1          finished
11     computer2  1/1/1          finished
12     computer2  1/1/1          finished
13     computer3  1/1/1          finished
14     computer3  1/1/1          finished

[MASTER]IPCTerminate: 14 of 14 Remote FRED instance terminations initiated

index  host       message
1      computer1  initiated
2      computer1  initiated
3      computer1  initiated
4      computer1  initiated
5      computer1  initiated
6      computer1  initiated
7      computer2  initiated
8      computer2  initiated
9      computer2  initiated
10     computer2  initiated
11     computer2  initiated
12     computer2  initiated
13     computer3  initiated
14     computer3  initiated

Distributed calculation completed.
Time Started:  11:14:40 AM
Time Completed:  11:17:33 AM
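
As a quick sanity check on the sample output, the per-remote work unit counts and the printed timestamps can be tallied with a few lines of Python. This is only a reading aid, not FRED script: the numbers are copied by hand from the listing above, and the names finished_units and elapsed_seconds are hypothetical.

```python
from datetime import datetime

# Work units reported as finished by each remote instance, grouped by host
# (copied from the IPCExecEmbeddedScript table in the sample output).
finished_units = {
    "computer1": [9, 9, 9, 9, 9, 9],
    "computer2": [10, 9, 10, 9, 10, 10],
    "computer3": [9, 9],
}

per_host = {host: sum(units) for host, units in finished_units.items()}
total_units = sum(per_host.values())


def elapsed_seconds(start, end):
    """Seconds between two same-day timestamps such as '11:15:08 AM'."""
    fmt = "%I:%M:%S %p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


print(per_host)     # {'computer1': 54, 'computer2': 58, 'computer3': 18}
print(total_units)  # 130
print(elapsed_seconds("11:15:08 AM", "11:16:25 AM"))  # 77
print(elapsed_seconds("11:14:40 AM", "11:17:33 AM"))  # 173
```

The counts add up to the 130 work units declared by IPCDeclareTotalWorkUnits. The two Print statements around IPCExecEmbeddedScript are 77 seconds apart, out of 173 seconds for the whole script.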

See Also
Distributed Computing Script Commands