Setting Up Notebooks for Python Execute

Python Notebooks are a flexible tool. To use a Notebook within a Python Execute operator in a visual workflow, follow these best practices when preparing it.

Suggestions for Setting Up Notebooks for Python Execute

The automatically generated tag Ready For Python Execute indicates that you can attach a Notebook to a Python Execute operator. To receive this tag, a notebook must meet the following requirements.

  • At least one input or output is specified in the notebook with the argument use_input_substitution = True or use_output_substitution = True.
  • The execution_label arguments of the notebook inputs are distinct, and each one is exactly one of the following strings: "1", "2", or "3".
  • All inputs and outputs defined with substitution (use_input_substitution = True or use_output_substitution = True) come from the same type of data source (Hadoop or database).

Run the Notebook in its entirety from the toolbar by clicking Cell > Run All. Do NOT run cells out of order; doing so can cause issues with the metadata that is passed to the Python Execute operator. After running all cells, save the notebook before attempting to run it in a workflow.

You can create an input for substitution using the following steps:

  1. Associate a dataset with the workspace.
  2. In the notebook toolbar, click Data.
  3. Select the dataset, and then click Import.

    This generates a cell with functions to read the data (for example, cc.read_input_file or cc.read_input_table, depending on whether you selected a database table or a file in HDFS).

  4. Change use_input_substitution=False to use_input_substitution=True.
  5. Add a named argument called execution_label to the function. This argument must have the string value "1", "2", or "3", and it identifies the input in the Python Execute operator of the visual workflow. The call should look similar to the following:
    df_account = cc.read_input_table(table_name='account', schema_name='demo', database_name='miner_demo', use_input_substitution=True, execution_label="1")
  6. Run the generated cell. This fetches the data, creates a dataframe in the Notebook, and saves this information as a valid input for the Python Execute operator in the visual workflow editor.
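
When a notebook has more than one input, each call needs its own execution label. The following is a minimal sketch of two labeled inputs. It assumes the keyword arguments shown in the example above; the customer table is hypothetical, and the _StubCommander class is only a stand-in so that the snippet runs outside Team Studio. In a real Notebook, cc is already provided, so you would omit the stub and keep only the two read calls.

```python
class _StubCommander:
    """Hypothetical stand-in for the notebook's cc object (not the real API)."""

    def __init__(self):
        # Maps execution label -> fully qualified table name.
        self.registered_inputs = {}

    def read_input_table(self, table_name, schema_name, database_name,
                         use_input_substitution=False, execution_label=None):
        # Assumption: only inputs with substitution are registered
        # for the Python Execute operator.
        if use_input_substitution:
            self.registered_inputs[execution_label] = (
                f"{database_name}.{schema_name}.{table_name}"
            )
        return []  # the real function returns a dataframe


cc = _StubCommander()  # omit this line in a real Notebook

# Each input with substitution gets a distinct label: "1", "2", or "3".
df_account = cc.read_input_table(
    table_name='account', schema_name='demo', database_name='miner_demo',
    use_input_substitution=True, execution_label="1")
df_customer = cc.read_input_table(
    table_name='customer', schema_name='demo', database_name='miner_demo',
    use_input_substitution=True, execution_label="2")

print(cc.registered_inputs)
```

Both inputs here are database tables, which keeps the notebook within the rule that inputs with substitution must share one type of data source.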

You can create an output for substitution using the following steps:

  1. Use the cc.write_output_file or cc.write_output_table function, depending on whether you want to write a table or a file. You can see the function arguments by executing help(cc.write_output_table) in your Notebook.
  2. Run the cell. This writes the dataset and saves this information as a valid output for the Python Execute operator in the visual workflow editor.
  3. Before using a Python Execute operator, ensure that you have cleaned your Notebook to remove any interactive code (for example, the help() function).
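
One way to find leftover interactive code is to scan the notebook file for calls such as help(). The following is a rough sketch that assumes the Notebook can be exported in the standard Jupyter .ipynb JSON format; the function name find_interactive_cells and the marker list are illustrative and are not part of Team Studio.

```python
import json


def find_interactive_cells(notebook_json, markers=("help(",)):
    """Return the indexes of code cells whose source contains a marker.

    notebook_json is the text of a Jupyter .ipynb file. The substring
    match is deliberately naive: it also flags names that merely end
    in "help(".
    """
    notebook = json.loads(notebook_json)
    flagged = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are never executed
        source = cell.get("source", "")
        if isinstance(source, list):
            # .ipynb files may store the source as a list of lines.
            source = "".join(source)
        if any(marker in source for marker in markers):
            flagged.append(index)
    return flagged


example = json.dumps({
    "cells": [
        {"cell_type": "code", "source": ["help(cc.write_output_table)\n"]},
        {"cell_type": "markdown", "source": ["Use help() to see the arguments."]},
        {"cell_type": "code", "source": "x = 1"},
    ]
})
print(find_interactive_cells(example))  # → [0]
```

The sketch only reports cell indexes; removing or editing the flagged cells is left to you.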

A notebook is invalid if any of the following conditions exist.

  • It has duplicate execution labels.
  • It has execution labels that are NOT "1", "2", or "3".
    Note: These values are strings enclosed in double quotation marks.
  • It mixes HDFS and DB inputs with substitution.
    Note: Mixing inputs without substitution is allowed.
  • It has more than one output with substitution.
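
The conditions above can be expressed as a small check. The following is a sketch only: it is not how Team Studio validates a notebook, and the dictionary layout (execution_label, source, and the substitution flags as keys) and the function name find_problems are assumptions made for illustration. It also assumes that the label rules apply to every input that declares an execution_label.

```python
VALID_LABELS = {"1", "2", "3"}


def find_problems(inputs, outputs):
    """Return a list of reasons why a notebook would be invalid.

    inputs and outputs are lists of dicts that describe each read or
    write call in the notebook (a hypothetical layout).
    """
    problems = []

    labels = [spec["execution_label"] for spec in inputs
              if "execution_label" in spec]
    if len(labels) != len(set(labels)):
        problems.append("duplicate execution labels")
    # Labels must be the strings "1", "2", or "3"; the integer 1 fails.
    if any(label not in VALID_LABELS for label in labels):
        problems.append('execution label other than "1", "2", or "3"')

    # Only inputs with substitution must share a data source type.
    substituted_sources = {spec["source"] for spec in inputs
                           if spec.get("use_input_substitution")}
    if len(substituted_sources) > 1:
        problems.append("HDFS and DB inputs mixed with substitution")

    substituted_outputs = [spec for spec in outputs
                           if spec.get("use_output_substitution")]
    if len(substituted_outputs) > 1:
        problems.append("more than one output with substitution")

    return problems


inputs = [
    {"execution_label": "1", "source": "db", "use_input_substitution": True},
    {"execution_label": "2", "source": "hdfs", "use_input_substitution": False},
]
outputs = [{"use_output_substitution": True}]
print(find_problems(inputs, outputs))  # → []
```

In the usage example, the HDFS input does not trigger a problem because it is read without substitution.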

Team Studio Commander Functions

Team Studio Commander is a set of functions, available in a Notebook, that connect the Python environment to the Team Studio environment. These functions cover tasks such as adding data associations, importing files, and outputting results. To access detailed documentation, run help(cc) from a Notebook.