Class InPlace

  • All Implemented Interfaces:
    CleanupStrategy

    public class InPlace
    extends java.lang.Object
    implements CleanupStrategy
    This generates cleanup jobs in the workflow itself.
    Version:
    $Revision$
    Author:
    Arun ramakrishnan, Karan Vahi
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String CLEANUP_JOB_PREFIX
      The prefix for CLEANUP_JOB ID i.e prefix+the parent compute_job ID becomes ID of the cleanup job.
      static int DEFAULT_CLUSTERED_CLEANUP_JOBS_PER_LEVEL
      The default value for the number of clustered cleanup jobs created per level.
      static java.lang.String DEFAULT_MAX_JOBS_FOR_CLEANUP_CATEGORY
      The default value for the maxjobs variable for the category of cleanup jobs.
      private int mCleanupJobsPerLevel
      The number of cleanup jobs per level to be created
      private int mCleanupJobsSize
      the number of cleanup jobs clustered into a clustered cleanup job
      private java.util.HashSet mDoNotClean
      HashSet of Files that should not be cleaned up
      private CleanupImplementation mImpl
      The handle to the CleanupImplementation instance that creates the jobs for us.
      private LogManager mLogger
      The handle to the logging object used for logging.
      private int mMaxDepth
      The max depth of any job in the workflow useful for a priorityQueue implementation in an array
      private PegasusProperties mProps
      The handle to the properties passed to Pegasus.
      private java.util.HashMap mResMap
      The mapping to siteHandle to all the jobs that are mapped to it mapping to siteHandle(String) to Set
      private java.util.HashMap mResMapLeaves
      The mapping of siteHandle to all subset of the jobs mapped to it that are leaves in the workflow mapping to siteHandle(String) to Set.
      private java.util.HashMap mResMapRoots
      The mapping of siteHandle to all subset of the jobs mapped to it that are roots in the workflow mapping to siteHandle(String) to Set.
      private boolean mUseSizeFactor
      A boolean indicating whether we prefer use the size factor or the num factor
    • Constructor Summary

      Constructors 
      Constructor Description
      InPlace()
      The default constructor.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      Graph addCleanupJobs​(Graph workflow)
      Adds cleanup jobs to the workflow.
      private void addCleanUpJobs​(java.lang.String site, java.util.Set leaves, Graph workflow)
      Adds cleanup jobs for the workflow scheduled to a particular site a breadth first search strategy is implemented based on the depth of the job in the workflow
      protected void applyJobPriorities​(Graph workflow)
      Adds job priorities to the jobs in the workflow on the basis of the levels in the traversal order given by the iterator.
      private java.util.List<GraphNode> clusterCleanupGraphNodes​(java.util.List<GraphNode> cleanupNodes, java.util.HashMap cleanedBy, java.lang.String site, int level)
      Takes in a list of cleanup nodes ,one per cleanupNode(compute/stageout job) whose files need to be deleted) and clusters them into a smaller set of cleanup nodes.
      private GraphNode createClusteredCleanupGraphNode​(java.util.List<GraphNode> nodes, java.util.HashMap cleanedBy, java.lang.String site, int level, int index)
      Creates a clustered cleanup graph node that aggregates multiple cleanup nodes into one node
      protected java.lang.String generateCleanupID​(Job job)
      Returns the identifier that is to be assigned to cleanup job.
      java.lang.String generateClusteredJobID​(java.lang.String site, int level, int index)
      Generated an ID for a clustered cleanup job
      private int getClusterSize​(int size)
      Returns the number of cleanup jobs clustered into one job per level.
      java.lang.String getDefaultCleanupMaxJobsPropertyKey()
      Returns the property key that can be used to set the max jobs for the default category associated with the registration jobs.
      protected java.lang.String getSiteForCleanup​(Job job)
      Returns site to be used for the cleanup algorithm.
      void initialize​(PegasusBag bag, CleanupImplementation impl)
      Intializes the class.
      protected void reduceDependency​(GraphNode node)
      Reduces the number of edges between the nodes and it's parents.
      protected void reset()
      Resets the internal data structures.
      private void setDepth_ResMap​(java.util.List roots)
      A BFS implementation to set depth value (roots have depth 1) and also to populate mResMap ,mResMapLeaves,mResMapRoots which contains all the jobs that are assigned to a particular resource
      protected boolean typeNeedsCleanUp​(int type)
      Checks to see which job types are required to be looked at for cleanup.
      protected boolean typeNeedsCleanUp​(GraphNode node)
      Checks to see which job types are required to be looked at for cleanup.
      protected boolean typeStageOut​(int type)
      Checks to see if job type is a stageout job type.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • CLEANUP_JOB_PREFIX

        public static final java.lang.String CLEANUP_JOB_PREFIX
        The prefix for CLEANUP_JOB ID i.e prefix+the parent compute_job ID becomes ID of the cleanup job.
        See Also:
        Constant Field Values
      • DEFAULT_MAX_JOBS_FOR_CLEANUP_CATEGORY

        public static final java.lang.String DEFAULT_MAX_JOBS_FOR_CLEANUP_CATEGORY
        The default value for the maxjobs variable for the category of cleanup jobs.
        See Also:
        Constant Field Values
      • DEFAULT_CLUSTERED_CLEANUP_JOBS_PER_LEVEL

        public static final int DEFAULT_CLUSTERED_CLEANUP_JOBS_PER_LEVEL
        The default value for the number of clustered cleanup jobs created per level.
        See Also:
        Constant Field Values
      • mResMap

        private java.util.HashMap mResMap
        The mapping to siteHandle to all the jobs that are mapped to it mapping to siteHandle(String) to Set
      • mResMapLeaves

        private java.util.HashMap mResMapLeaves
        The mapping of siteHandle to all subset of the jobs mapped to it that are leaves in the workflow mapping to siteHandle(String) to Set.
      • mResMapRoots

        private java.util.HashMap mResMapRoots
        The mapping of siteHandle to all subset of the jobs mapped to it that are roots in the workflow mapping to siteHandle(String) to Set.
      • mMaxDepth

        private int mMaxDepth
        The max depth of any job in the workflow useful for a priorityQueue implementation in an array
      • mDoNotClean

        private java.util.HashSet mDoNotClean
        HashSet of Files that should not be cleaned up
      • mImpl

        private CleanupImplementation mImpl
        The handle to the CleanupImplementation instance that creates the jobs for us.
      • mProps

        private PegasusProperties mProps
        The handle to the properties passed to Pegasus.
      • mLogger

        private LogManager mLogger
        The handle to the logging object used for logging.
      • mCleanupJobsPerLevel

        private int mCleanupJobsPerLevel
        The number of cleanup jobs per level to be created
      • mCleanupJobsSize

        private int mCleanupJobsSize
        the number of cleanup jobs clustered into a clustered cleanup job
      • mUseSizeFactor

        private boolean mUseSizeFactor
        A boolean indicating whether we prefer use the size factor or the num factor
    • Constructor Detail

      • InPlace

        public InPlace()
        The default constructor.
    • Method Detail

      • addCleanupJobs

        public Graph addCleanupJobs​(Graph workflow)
        Adds cleanup jobs to the workflow.
        Specified by:
        addCleanupJobs in interface CleanupStrategy
        Parameters:
        workflow - the workflow to add cleanup jobs to.
        Returns:
        the workflow with cleanup jobs added to it.
      • reset

        protected void reset()
        Resets the internal data structures.
      • setDepth_ResMap

        private void setDepth_ResMap​(java.util.List roots)
        A BFS implementation to set depth value (roots have depth 1) and also to populate mResMap ,mResMapLeaves,mResMapRoots which contains all the jobs that are assigned to a particular resource
        Parameters:
        roots - List of GraphNode objects that are roots
      • addCleanUpJobs

        private void addCleanUpJobs​(java.lang.String site,
                                    java.util.Set leaves,
                                    Graph workflow)
        Adds cleanup jobs for the workflow scheduled to a particular site a breadth first search strategy is implemented based on the depth of the job in the workflow
        Parameters:
        site - the site ID
        leaves - the leaf jobs that are scheduled to site
        workflow - the Graph into which new cleanup jobs can be added
      • reduceDependency

        protected void reduceDependency​(GraphNode node)
        Reduces the number of edges between the nodes and it's parents.
         For the node look at the parents of the Node. 
         For each parent Y see if there is a path to any other parent Z of X.
         If a path exists, then the edge from Z to node can be removed.
         
        Parameters:
        node - the nodes whose parent edges need to be reduced.
      • applyJobPriorities

        protected void applyJobPriorities​(Graph workflow)
        Adds job priorities to the jobs in the workflow on the basis of the levels in the traversal order given by the iterator. Later on this would be a separate refiner.
        Parameters:
        workflow - the workflow on which to apply job priorities.
      • generateCleanupID

        protected java.lang.String generateCleanupID​(Job job)
        Returns the identifier that is to be assigned to cleanup job.
        Parameters:
        job - the job with which the cleanup job is primarily associated.
        Returns:
        the identifier for a cleanup job.
      • generateClusteredJobID

        public java.lang.String generateClusteredJobID​(java.lang.String site,
                                                       int level,
                                                       int index)
        Generated an ID for a clustered cleanup job
        Parameters:
        site - the site associated with the cleanup jobs
        level - the level of the workflow
        index - the index of the job on that level
        Returns:
      • typeNeedsCleanUp

        protected boolean typeNeedsCleanUp​(GraphNode node)
        Checks to see which job types are required to be looked at for cleanup. COMPUTE_JOB , STAGE_OUT_JOB , INTER_POOL_JOB are the ones that need cleanup
        Parameters:
        node - the graph node
        Returns:
        boolean
      • typeNeedsCleanUp

        protected boolean typeNeedsCleanUp​(int type)
        Checks to see which job types are required to be looked at for cleanup. COMPUTE_JOB , STAGE_OUT_JOB , INTER_POOL_JOB are the ones that need cleanup
        Parameters:
        type - the type of the job.
        Returns:
        boolean
      • typeStageOut

        protected boolean typeStageOut​(int type)
        Checks to see if job type is a stageout job type.
        Parameters:
        type - the type of the job.
        Returns:
        boolean
      • getSiteForCleanup

        protected java.lang.String getSiteForCleanup​(Job job)
        Returns site to be used for the cleanup algorithm. For compute jobs the staging site is used, while for stageout jobs is used. For all other jobs the execution site is used.
        Parameters:
        job - the job
        Returns:
        the site to be used
      • getDefaultCleanupMaxJobsPropertyKey

        public java.lang.String getDefaultCleanupMaxJobsPropertyKey()
        Returns the property key that can be used to set the max jobs for the default category associated with the registration jobs.
        Returns:
        the property key
      • clusterCleanupGraphNodes

        private java.util.List<GraphNode> clusterCleanupGraphNodes​(java.util.List<GraphNode> cleanupNodes,
                                                                   java.util.HashMap cleanedBy,
                                                                   java.lang.String site,
                                                                   int level)
        Takes in a list of cleanup nodes ,one per cleanupNode(compute/stageout job) whose files need to be deleted) and clusters them into a smaller set of cleanup nodes.
        Parameters:
        cleanupNodes - List of stub cleanup nodes created corresponding to a job in the workflow that needs cleanup. the cleanup jobs have content as a CleanupJobContent
        cleanedBy - a map that tracks which file was deleted by which cleanup job
        site - the site associated with the cleanup jobs
        level - the level of the workflow
        Returns:
        a set of clustered cleanup nodes
      • createClusteredCleanupGraphNode

        private GraphNode createClusteredCleanupGraphNode​(java.util.List<GraphNode> nodes,
                                                          java.util.HashMap cleanedBy,
                                                          java.lang.String site,
                                                          int level,
                                                          int index)
        Creates a clustered cleanup graph node that aggregates multiple cleanup nodes into one node
        Parameters:
        nodes - list of cleanup nodes that are to be aggregated
        cleanedBy - a map that tracks which file was deleted by which cleanup job
        site - the site associated with the cleanup jobs
        level - the level of the workflow
        index - the index of the cleanup job for that level
        Returns:
        a clustered cleanup node with the appropriate linkages added to the workflow else, null if the clustered cleanup node has no files to delete
      • getClusterSize

        private int getClusterSize​(int size)
        Returns the number of cleanup jobs clustered into one job per level.
        Parameters:
        size - the number of cleanup jobs created by the algorithm before clustering for the level.
        Returns:
        the number of clustered cleanup jobs to be created for the level