Class SUBDAXGenerator


  • public class SUBDAXGenerator
    extends java.lang.Object
    The class that takes in a dax job specified in the DAX and renders it into a SUBDAG with pegasus-plan as the appropriate prescript.
    Version:
    $Revision$
    Author:
    Karan Vahi
    • Constructor Summary

      Constructors 
      Constructor Description
      SUBDAXGenerator()
      The default constructor.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected Job constructDAGJob​(Job subdaxJob, java.io.File directory, java.io.File subdaxDirectory, java.lang.String basenamePrefix)
      Constructs a job that plans and submits the partitioned workflow, referred to by a Partition.
      java.lang.String constructDAGManKnobs​(Job job)
      Constructs Any extra arguments that need to be passed to dagman, as determined from the properties file.
      Job constructPegasusPlanPrescript​(Job job, PlannerOptions options, java.lang.String rootUUID, java.lang.String properties, java.lang.String log)
      Constructs the pegasus plan prescript for the subdax
      protected java.io.File constructPlannerPrescriptWrapper​(Job dagJob, java.io.File directory, java.lang.String executable, java.lang.String arguments)
      Construct a pegasus plan wrapper script that changes the directory in which pegasus-plan is launched.
      private TransformationCatalogEntry constructTCEntryFromEnvironment()
      Returns a transformation catalog entry object constructed from the environment An entry is constructed if either of the following environment variables are defined 1) CONDOR_HOME 2) CONDOR_LOCATION CONDOR_HOME takes precedence over CONDOR_LOCATION
      private TransformationCatalogEntry constructTCEntryFromEnvProfiles​(ENV env)
      Returns a tranformation catalog entry object constructed from the environment An entry is constructed if either of the following environment variables are defined 1) CONDOR_HOME 2) CONDOR_LOCATION CONDOR_HOME takes precedence over CONDOR_LOCATION
      private TransformationCatalogEntry constructTCEntryFromPath()
      Returns a tranformation catalog entry object constructed from the path environment variable
      private TransformationCatalogEntry constructTransformationCatalogEntryForDAGMan​(java.lang.String path)
      Constructs TransformationCatalogEntry for DAGMan.
      protected java.lang.String createSubmitDirectory​(ADag dag, java.lang.String dir, java.lang.String user, java.lang.String vogroup, boolean timestampBased)
      Creates the submit directory for the workflow.
      protected java.lang.String createSubmitDirectory​(java.lang.String label, java.lang.String dir, java.lang.String user, java.lang.String vogroup, boolean timestampBased)
      Creates the submit directory for the workflow.
      protected boolean createSymbolicLink​(java.lang.String source, java.lang.String destination)
      This method generates a symlink between two files
      protected boolean createSymbolicLink​(java.lang.String source, java.lang.String destination, boolean logErrorToDebug)
      This method generates a symlink between two files
      boolean createSymbolicLinktoCacheFile​(PlannerOptions options, java.lang.String label, java.lang.String index)
      Creates a symbolic link to the DAX file in a dax sub directory in the submit directory
      java.lang.String createSymbolicLinktoDAX​(java.lang.String submitDirectory, java.lang.String dax)
      Creates a symbolic link to the DAX file in a dax sub directory in the submit directory
      private TransformationCatalogEntry defaultTCEntry​(java.lang.String site)
      Returns a default TC entry to be used in case entry is not found in the transformation catalog.
      Job generateCode​(Job job)
      Generates code for a job
      protected java.lang.String getBasename​(java.lang.String prefix, java.lang.String suffix)
      Returns the basename of a dagman (usually) related file for a particular partition.
      protected java.lang.String getCacheFile​(PlannerOptions options, java.lang.String label, java.lang.String index)
      Returns the path to the cache file in a workflow's submit directory
      protected java.lang.String getCacheFileName​(PlannerOptions options, java.lang.String label, java.lang.String index)
      Constructs the basename to the cache file that is to be used to log the transient files.
      java.util.Set<java.lang.String> getParentsTransientRC​(Job job)
      Returns a set containing the paths to the parent dax jobs transient replica catalogs.
      protected java.lang.String getWorkflowFileBasenamePrefix​(PlannerOptions options, java.lang.String label, java.lang.String index)  
      protected java.lang.String getWorkflowFileName​(PlannerOptions options, java.lang.String label, java.lang.String index, java.lang.String suffix)
      Constructs the basename to a workflow file that.
      void initialize​(PegasusBag bag, ADag dag, java.io.PrintWriter dagWriter)
      Initializes the class.
      protected static int parseInt​(java.lang.String s)
      Parses a string into an integer.
      protected static void sanityCheck​(java.io.File dir)
      Checks the destination location for existence, if it can be created, if it is writable etc.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • DEFAULT_SUBDAX_CATEGORY_KEY

        public static final java.lang.String DEFAULT_SUBDAX_CATEGORY_KEY
        The default category for the sub dax jobs.
        See Also:
        Constant Field Values
      • GENERATE_SUBDAG_KEYWORD

        public static final boolean GENERATE_SUBDAG_KEYWORD
        Whether to generate the SUBDAG keyword or not.
        See Also:
        Constant Field Values
      • CACHE_FILE_SUFFIX

        private static final java.lang.String CACHE_FILE_SUFFIX
        Suffix to be applied for cache file generation.
        See Also:
        Constant Field Values
      • CPLANNER_LOGICAL_NAME

        public static final java.lang.String CPLANNER_LOGICAL_NAME
        The logical name with which to query the transformation catalog for cPlanner executable.
        See Also:
        Constant Field Values
      • CONDOR_DAGMAN_NAMESPACE

        public static final java.lang.String CONDOR_DAGMAN_NAMESPACE
        The namespace to use for condor dagman.
        See Also:
        Constant Field Values
      • CONDOR_DAGMAN_LOGICAL_NAME

        public static final java.lang.String CONDOR_DAGMAN_LOGICAL_NAME
        The logical name with which to query the transformation catalog for the condor_dagman executable, that ends up running the mini dag as one job.
        See Also:
        Constant Field Values
      • NAMESPACE

        public static final java.lang.String NAMESPACE
        The namespace to which the job in the MEGA DAG being created refer to.
        See Also:
        Constant Field Values
      • RETRY_LOGICAL_NAME

        public static final java.lang.String RETRY_LOGICAL_NAME
        The planner utility that needs to be called as a prescript.
        See Also:
        Constant Field Values
      • DAGMAN_KNOBS

        public static final java.lang.String[][] DAGMAN_KNOBS
        The dagman knobs controlled through property. They map the property name to the corresponding dagman option.
      • mUser

        private java.lang.String mUser
        The username of the user running the program.
      • mNumFormatter

        private java.text.NumberFormat mNumFormatter
        The number formatter to format the run submit dir entries.
      • mPegasusPlanOptions

        private PlannerOptions mPegasusPlanOptions
        The object containing all the options passed to the Concrete Planner.
      • mLogger

        private LogManager mLogger
        Handle to the logging manager.
      • mBag

        private PegasusBag mBag
        Bag of Pegasus objects
      • mDAGWriter

        private java.io.PrintWriter mDAGWriter
        The print writer handle to DAG file being written out.
      • mCondorVersion

        private long mCondorVersion
        The long value of condor version.
      • mDAXJobIDToSubmitDirectoryCacheFile

        private java.util.Map<java.lang.String,​java.lang.String> mDAXJobIDToSubmitDirectoryCacheFile
        Maps a sub dax job id to it's submit directory. The population relies on top down traversal during Code Generation.
      • mDAG

        private ADag mDAG
      • mCurrentDAGCacheFile

        private java.lang.String mCurrentDAGCacheFile
        Cache file for the current DAG
    • Constructor Detail

      • SUBDAXGenerator

        public SUBDAXGenerator()
        The default constructor.
    • Method Detail

      • initialize

        public void initialize​(PegasusBag bag,
                               ADag dag,
                               java.io.PrintWriter dagWriter)
        Initializes the class.
        Parameters:
        bag - the bag of objects required for initialization
        dag - the dag for which code is being generated
        daxReplicaStore - the dax replica store.
        dagWriter - handle to the dag writer
      • generateCode

        public Job generateCode​(Job job)
        Generates code for a job
        Parameters:
        job - the job for which code has to be generated.
        Returns:
        a Job if a submit file needs to be generated for the job. Else return null.
      • constructPlannerPrescriptWrapper

        protected java.io.File constructPlannerPrescriptWrapper​(Job dagJob,
                                                                java.io.File directory,
                                                                java.lang.String executable,
                                                                java.lang.String arguments)
        Construct a pegasus plan wrapper script that changes the directory in which pegasus-plan is launched.
        Parameters:
        dagJob - the DAG job corresponding to which the prescript is associated.
        directory - the directory where the submit file for dagman job has to be written out to.
        executable - the path to the planner that needs to be called in the prescript
        arguments - the arguments with which the planner is called.
        Returns:
        the wrapper script that gets called in the prescript for the dag job
      • constructDAGJob

        protected Job constructDAGJob​(Job subdaxJob,
                                      java.io.File directory,
                                      java.io.File subdaxDirectory,
                                      java.lang.String basenamePrefix)
        Constructs a job that plans and submits the partitioned workflow, referred to by a Partition. The main job itself is a condor dagman job that submits the concrete workflow. The concrete workflow is generated by running the planner in the prescript for the job.
        Parameters:
        subdaxJob - the original subdax job.
        directory - the directory where the submit file for dagman job has to be written out to.
        subdaxDirectory - the submit directory where the submit files for the subdag reside.
        basenamePrefix - the basename to be assigned to the files associated with DAGMan
        Returns:
        the constructed DAG job.
      • constructDAGManKnobs

        public java.lang.String constructDAGManKnobs​(Job job)
        Constructs Any extra arguments that need to be passed to dagman, as determined from the properties file.
        Parameters:
        job - the job
        Returns:
        any arguments to be added, else empty string
      • parseInt

        protected static int parseInt​(java.lang.String s)
        Parses a string into an integer. Non valid values returned as -1
        Parameters:
        s - the String to be parsed as integer
        Returns:
        the int value if valid, else -1
      • getBasename

        protected java.lang.String getBasename​(java.lang.String prefix,
                                               java.lang.String suffix)
        Returns the basename of a dagman (usually) related file for a particular partition.
        Parameters:
        prefix - the prefix.
        suffix - the suffix for the file basename.
        Returns:
        the basename.
      • getCacheFile

        protected java.lang.String getCacheFile​(PlannerOptions options,
                                                java.lang.String label,
                                                java.lang.String index)
        Returns the path to the cache file in a workflow's submit directory
        Parameters:
        options - the options for the workflow.
        label - the label for the workflow.
        index - the index for the workflow.
        Returns:
        the path to the cache file
      • getCacheFileName

        protected java.lang.String getCacheFileName​(PlannerOptions options,
                                                    java.lang.String label,
                                                    java.lang.String index)
        Constructs the basename to the cache file that is to be used to log the transient files. The basename is dependant on whether the basename prefix has been specified at runtime or not.
        Parameters:
        options - the options for the sub workflow.
        label - the label for the workflow.
        index - the index for the workflow.
        Returns:
        the name of the cache file
      • getWorkflowFileName

        protected java.lang.String getWorkflowFileName​(PlannerOptions options,
                                                       java.lang.String label,
                                                       java.lang.String index,
                                                       java.lang.String suffix)
        Constructs the basename to a workflow file that. The basename is dependant on whether the basename prefix has been specified at runtime or not.
        Parameters:
        options - the options for the sub workflow.
        label - the label for the workflow.
        index - the index for the workflow.
        suffix - the suffix for the workfklow file.
        Returns:
        the name of the cache file
      • getWorkflowFileBasenamePrefix

        protected java.lang.String getWorkflowFileBasenamePrefix​(PlannerOptions options,
                                                                 java.lang.String label,
                                                                 java.lang.String index)
      • defaultTCEntry

        private TransformationCatalogEntry defaultTCEntry​(java.lang.String site)
        Returns a default TC entry to be used in case entry is not found in the transformation catalog.
        Parameters:
        site - the site for which the default entry is required.
        Returns:
        the default entry.
      • constructTCEntryFromEnvironment

        private TransformationCatalogEntry constructTCEntryFromEnvironment()
        Returns a transformation catalog entry object constructed from the environment An entry is constructed if either of the following environment variables are defined 1) CONDOR_HOME 2) CONDOR_LOCATION CONDOR_HOME takes precedence over CONDOR_LOCATION
        Returns:
        the constructed entry else null.
      • constructTCEntryFromPath

        private TransformationCatalogEntry constructTCEntryFromPath()
        Returns a tranformation catalog entry object constructed from the path environment variable
        Parameters:
        env - the environment profiles.
        Returns:
        the entry constructed else null if environment variables not defined.
      • constructTCEntryFromEnvProfiles

        private TransformationCatalogEntry constructTCEntryFromEnvProfiles​(ENV env)
        Returns a tranformation catalog entry object constructed from the environment An entry is constructed if either of the following environment variables are defined 1) CONDOR_HOME 2) CONDOR_LOCATION CONDOR_HOME takes precedence over CONDOR_LOCATION
        Parameters:
        env - the environment profiles.
        Returns:
        the entry constructed else null if environment variables not defined.
      • constructTransformationCatalogEntryForDAGMan

        private TransformationCatalogEntry constructTransformationCatalogEntryForDAGMan​(java.lang.String path)
        Constructs TransformationCatalogEntry for DAGMan.
        Parameters:
        path - path to dagman
        Returns:
        TransformationCatalogEntry for dagman if path is not null, else null.
      • constructPegasusPlanPrescript

        public Job constructPegasusPlanPrescript​(Job job,
                                                 PlannerOptions options,
                                                 java.lang.String rootUUID,
                                                 java.lang.String properties,
                                                 java.lang.String log)
        Constructs the pegasus plan prescript for the subdax
        Parameters:
        job - the subdax job
        options - the planner options with which subdax has to be invoked
        rootUUID - the root workflow uuid
        properties - the properties file.
        log - the log for the prescript output
        Returns:
        the prescript
      • createSymbolicLinktoCacheFile

        public boolean createSymbolicLinktoCacheFile​(PlannerOptions options,
                                                     java.lang.String label,
                                                     java.lang.String index)
        Creates a symbolic link to the DAX file in a dax sub directory in the submit directory
        Parameters:
        options - the options for the sub workflow.
        label - the label for the workflow.
        index - the index for the workflow.
        Returns:
        boolean whether symlink is created or not
      • createSymbolicLinktoDAX

        public java.lang.String createSymbolicLinktoDAX​(java.lang.String submitDirectory,
                                                        java.lang.String dax)
        Creates a symbolic link to the DAX file in a dax sub directory in the submit directory
        Parameters:
        submitDirectory - the submit directory for the sub workflow.
        dax - the dax file to which the symbolic link has to be created.
        Returns:
        the symbolic link created.
      • createSubmitDirectory

        protected java.lang.String createSubmitDirectory​(ADag dag,
                                                         java.lang.String dir,
                                                         java.lang.String user,
                                                         java.lang.String vogroup,
                                                         boolean timestampBased)
                                                  throws java.io.IOException
        Creates the submit directory for the workflow. This is not thread safe.
        Parameters:
        dag - the workflow being worked upon.
        dir - the base directory specified by the user.
        user - the username of the user.
        vogroup - the vogroup to which the user belongs to.
        timestampBased - boolean indicating whether to have a timestamp based dir or not
        Returns:
        the directory name created relative to the base directory passed as input.
        Throws:
        java.io.IOException - in case of unable to create submit directory.
      • createSubmitDirectory

        protected java.lang.String createSubmitDirectory​(java.lang.String label,
                                                         java.lang.String dir,
                                                         java.lang.String user,
                                                         java.lang.String vogroup,
                                                         boolean timestampBased)
                                                  throws java.io.IOException
        Creates the submit directory for the workflow. This is not thread safe.
        Parameters:
        label - the label of the workflow
        dir - the base directory specified by the user.
        user - the username of the user.
        vogroup - the vogroup to which the user belongs to.
        timestampBased - boolean indicating whether to have a timestamp based dir or not
        Returns:
        the directory name created relative to the base directory passed as input.
        Throws:
        java.io.IOException - in case of unable to create submit directory.
      • sanityCheck

        protected static void sanityCheck​(java.io.File dir)
                                   throws java.io.IOException
        Checks the destination location for existence, if it can be created, if it is writable etc.
        Parameters:
        dir - is the new base directory to optionally create.
        Throws:
        java.io.IOException - in case of error while writing out files.
      • createSymbolicLink

        protected boolean createSymbolicLink​(java.lang.String source,
                                             java.lang.String destination)
        This method generates a symlink between two files
        Parameters:
        source - the file that has to be symlinked
        destination - the destination of the symlink
        Returns:
        boolean indicating if creation of symlink was successful or not
      • createSymbolicLink

        protected boolean createSymbolicLink​(java.lang.String source,
                                             java.lang.String destination,
                                             boolean logErrorToDebug)
        This method generates a symlink between two files
        Parameters:
        source - the file that has to be symlinked
        destination - the destination of the symlink
        logErrorToDebug - whether to log messeage to debug or not
        Returns:
        boolean indicating if creation of symlink was successful or not
      • getParentsTransientRC

        public java.util.Set<java.lang.String> getParentsTransientRC​(Job job)
        Returns a set containing the paths to the parent dax jobs transient replica catalogs.
        Parameters:
        job - the job
        Returns:
        Set of paths