Class BalancedCluster

  • All Implemented Interfaces:
    Refiner, Refiner

    public class BalancedCluster
    extends Basic
    An extension of the default refiner, that allows the user to specify the number of transfer nodes per execution site for stagein and stageout. The files are distributed in a round robin manner across the stagein and stageout jobs. Currently it is per workflow for the stage-in while for stageout it is per level of the workflow.
    Version:
    $Revision$
    Author:
    Karan Vahi
    • Field Detail

      • DESCRIPTION

        public static final java.lang.String DESCRIPTION
        A short description of the transfer refinement.
        See Also:
        Constant Field Values
      • DEFAULT_LOCAL_STAGE_IN_CLUSTER_FACTOR

        public static final java.lang.String DEFAULT_LOCAL_STAGE_IN_CLUSTER_FACTOR
        The default bundling factor that identifies the number of transfer jobs that are being created per execution pool for stageing in data for the workflow.
        See Also:
        Constant Field Values
      • DEFAULT_REMOTE_STAGE_IN_CLUSTER_FACTOR

        public static final java.lang.String DEFAULT_REMOTE_STAGE_IN_CLUSTER_FACTOR
        The default bundling factor that identifies the number of transfer jobs that are being created per execution pool for stageing in data for the workflow.
        See Also:
        Constant Field Values
      • DEFAULT_LOCAL_STAGE_OUT_CLUSTER_FACTOR

        public static final java.lang.String DEFAULT_LOCAL_STAGE_OUT_CLUSTER_FACTOR
        The default bundling factor that identifies the number of transfer jobs that are being created per execution pool for stageing out data for the workflow.
        See Also:
        Constant Field Values
      • DEFAULT_REMOTE_STAGE_OUT_CLUSTER_FACTOR

        public static final java.lang.String DEFAULT_REMOTE_STAGE_OUT_CLUSTER_FACTOR
        The default bundling factor that identifies the number of transfer jobs that are being created per execution pool for stageing out data for the workflow.
        See Also:
        Constant Field Values
      • mStageInLocalMapPerLevel

        private java.util.Map<java.lang.String,​BalancedCluster.PoolTransfer> mStageInLocalMapPerLevel
        The map containing the list of stage in transfer jobs that are being created for the workflow indexed by the execution poolname.
      • mStageInRemoteMapPerLevel

        private java.util.Map<java.lang.String,​BalancedCluster.PoolTransfer> mStageInRemoteMapPerLevel
        The map containing the list of stage in transfer jobs that are being created for the workflow indexed by the execution poolname.
      • mRelationsParentMap

        private java.util.Map mRelationsParentMap
        The map indexed by compute jobnames that contains the list of stagin job names that are being added during the traversal of the workflow. This is used to construct the relations that need to be added to workflow, once the traversal is done.
      • mStageinLocalBundleValue

        protected BalancedCluster.BundleValue mStageinLocalBundleValue
        The BundleValue that evaluates for local stage in jobs.
      • mStageInRemoteBundleValue

        protected BalancedCluster.BundleValue mStageInRemoteBundleValue
        The BundleValue that evaluates for remote stage-in jobs.
      • mStageOutLocalBundleValue

        protected BalancedCluster.BundleValue mStageOutLocalBundleValue
        The BundleValue that evaluates for local stage out jobs.
      • mStageOutRemoteBundleValue

        protected BalancedCluster.BundleValue mStageOutRemoteBundleValue
        The BundleValue that evaluates for remote stage out jobs.
      • mSetupMap

        protected java.util.Map mSetupMap
        The map indexed by staged executable logical name. Each entry is the name of the corresponding setup job, that changes the XBit on the staged file.
      • mStageOutLocalMapPerLevel

        private java.util.Map<java.lang.String,​BalancedCluster.PoolTransfer> mStageOutLocalMapPerLevel
        A map indexed by site name, that contains the pointer to the local stage out PoolTransfer objects for that site. This is per level of the workflow.
      • mStageOutRemoteMapPerLevel

        private java.util.Map<java.lang.String,​BalancedCluster.PoolTransfer> mStageOutRemoteMapPerLevel
        A map indexed by site name, that contains the pointer to the remote stage out PoolTransfer objects for that site. This is per level of the workflow.
      • mCurrentSOLevel

        private int mCurrentSOLevel
        The current level of the jobs being traversed.
      • mJobPrefix

        protected java.lang.String mJobPrefix
        The job prefix that needs to be applied to the job file basenames.
      • mPegasusProfilesInProperties

        protected Pegasus mPegasusProfilesInProperties
        Pegasus Profiles specified in the properties.
      • mSiteStore

        protected SiteStore mSiteStore
        Handle to the SiteStore
      • mAddNodesForSettingXBit

        protected boolean mAddNodesForSettingXBit
        A boolean indicating whether chmod jobs should be created that set the xbit in case of executable staging.
      • mCurrentSILevel

        private int mCurrentSILevel
        The current level of the jobs being traversed.
    • Constructor Detail

      • BalancedCluster

        public BalancedCluster​(ADag dag,
                               PegasusBag bag)
        The overloaded constructor.
        Parameters:
        dag - the workflow to which transfer nodes need to be added.
        bag - the bag of initialization objects
    • Method Detail

      • initializeClusterValues

        protected void initializeClusterValues()
        Initializes the bundle value variables, that are responsible determining the bundle values.
      • getDefaultClusterValueFromProperties

        protected java.lang.String getDefaultClusterValueFromProperties​(java.lang.String key,
                                                                        java.lang.String defaultKey,
                                                                        java.lang.String defaultValue)
        Returns the default value for the clustering/bundling of jobs to be used. The factor is computed by looking up the pegasus profiles in the properties.
            return value of pegasus profile key if it exists,
            else return value of pegasus profile defaultKey if it exists, 
            else the defaultValue
         
        Parameters:
        key - the pegasus profile key
        defaultKey - the default pegasus profile key
        defaultValue - the default value.
        Returns:
        the value as string.
      • addStageInXFERNodes

        public void addStageInXFERNodes​(Job job,
                                        java.util.Collection<FileTransfer> files,
                                        java.util.Collection<FileTransfer> symlinkFiles)
        Adds the stage in transfer nodes which transfer the input files for a job, from the location returned from the replica catalog to the job's execution pool.
        Specified by:
        addStageInXFERNodes in interface Refiner
        Overrides:
        addStageInXFERNodes in class Basic
        Parameters:
        job - Job object corresponding to the node to which the files are to be transferred to.
        files - Collection of FileTransfer objects containing the information about source and destURL's.
        symlinkFiles - Collection of FileTransfer objects containing source and destination file url's for symbolic linking on compute site.
      • addStageInXFERNodes

        public void addStageInXFERNodes​(Job job,
                                        boolean localTransfer,
                                        java.util.Collection files,
                                        int type,
                                        java.util.Map<java.lang.String,​BalancedCluster.PoolTransfer> stageInMap,
                                        BalancedCluster.BundleValue bundleValue,
                                        Implementation implementation)
        Adds the stage in transfer nodes which transfer the input files for a job, from the location returned from the replica catalog to the job's execution pool.
        Parameters:
        job - Job object corresponding to the node to which the files are to be transferred to.
        localTransfer - boolean indicating whether transfer has to happen on local site.
        files - Collection of FileTransfer objects containing the information about source and destURL's.
        type - the type of transfer job being created
        stageInMap - Map indexed by site name that gives all the transfers for that site.
        bundleValue - used to determine the bundling factor to employ for a job.
        implementation - the transfer implementation to use.
      • addStageOutXFERNodes

        public void addStageOutXFERNodes​(Job job,
                                         java.util.Collection files,
                                         ReplicaCatalogBridge rcb,
                                         boolean localTransfer,
                                         boolean deletedLeaf)
        Adds the stageout transfer nodes, that stage data to an output site specified by the user.
        Specified by:
        addStageOutXFERNodes in interface Refiner
        Overrides:
        addStageOutXFERNodes in class Basic
        Parameters:
        job - Job object corresponding to the node to which the files are to be transferred to.
        files - Collection of FileTransfer objects containing the information about source and destURL's.
        rcb - bridge to the Replica Catalog. Used for creating registration nodes in the workflow.
        localTransfer - whether the transfer should be on local site or not.
        deletedLeaf - to specify whether the node is being added for a deleted node by the reduction engine or not. default: false
      • getComputeJobBundleValue

        protected java.lang.String getComputeJobBundleValue​(Job job)
        Returns the bundle value associated with a compute job as a String.
        Parameters:
        job -
        Returns:
        value as String or NULL
      • done

        public void done()
        Signals that the traversal of the workflow is done. At this point the transfer nodes are actually constructed traversing through the transfer containers and the stdin of the transfer jobs written.
        Specified by:
        done in interface Refiner
        Overrides:
        done in class Basic
      • resetStageInMaps

        protected void resetStageInMaps()
        Resets the local and remote stage out maps.
      • resetStageInMap

        public java.util.Map<java.lang.String,​BalancedCluster.PoolTransfer> resetStageInMap​(java.util.Map<java.lang.String,​BalancedCluster.PoolTransfer> stageInMap,
                                                                                                  Implementation implementation,
                                                                                                  int stageInJobType,
                                                                                                  boolean localTransfer)
        Signals that the traversal of the workflow is done. At this point the transfer nodes are actually constructed traversing through the transfer containers and the stdin of the transfer jobs written.
        Parameters:
        stageInMap - maps site names to PoolTransfer
        implementation - the transfer implementation to use
        stageInJobType - whether a stagein or symlink stagein job
        localTransfer - indicates whether transfer job needs to run on local site or not.
        Returns:
      • getDescription

        public java.lang.String getDescription()
        Returns a textual description of the transfer mode.
        Specified by:
        getDescription in interface Refiner
        Overrides:
        getDescription in class Basic
        Returns:
        a short textual description
      • getStageOutPoolTransfer

        public BalancedCluster.PoolTransfer getStageOutPoolTransfer​(java.lang.String site,
                                                                    boolean localTransfer,
                                                                    int num)
        Returns the appropriate pool transfer for a particular site.
        Parameters:
        site - the site for which the PT is reqd.
        localTransfer - whethe the associated transfer job runs on local site or remote.
        num - the number of Stageout jobs required for that Pool.
        Returns:
        the PoolTransfer
      • resetStageOutMaps

        protected void resetStageOutMaps()
        Resets the local and remote stage out maps.
      • resetStageOutMap

        protected java.util.Map resetStageOutMap​(java.util.Map<java.lang.String,​BalancedCluster.PoolTransfer> map,
                                                 boolean localTransfer)
        Resets a single map
        Parameters:
        map - the map to be reset
        localTransfer - whether the transfer jobs need to run on local site or not
        Returns:
        the reset map