Setting Up Notebooks for Python Execute
Python Notebooks are a very flexible tool. To incorporate your work into a Python Execute operator in a visual workflow, follow these best practices when preparing your Notebooks.
Suggestions for Setting Up Notebooks for Python Execute
The automatically generated tag Ready For Python Execute indicates that you can attach a Notebook to a Python Execute operator. A Notebook receives this tag when it meets all of the following requirements:
- At least one input or output is specified in the Notebook with the argument use_input_substitution=True or use_output_substitution=True.
- The execution_label arguments of the Notebook inputs are distinct, and each uses exactly one of the following strings: "1", "2", or "3".
- All inputs and outputs defined with substitution (use_input_substitution=True or use_output_substitution=True) come from the same type of data source (Hadoop or Database).
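The three requirements above can be expressed as a small rule check. The following is a minimal sketch, not part of the product: SubstitutionCall and is_ready_for_python_execute are hypothetical names, and the sketch assumes you have already summarized each cc.read_* or cc.write_* call in the Notebook as a record. It also assumes the label rule applies to inputs that use substitution.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

VALID_LABELS = {"1", "2", "3"}


@dataclass(frozen=True)
class SubstitutionCall:
    """Hypothetical summary of one cc.read_* / cc.write_* call in a Notebook."""
    kind: str                               # "input" or "output"
    source: str                             # "hadoop" or "database"
    substituted: bool                       # use_input/output_substitution=True?
    execution_label: Optional[str] = None   # only checked for inputs


def is_ready_for_python_execute(calls: Iterable[SubstitutionCall]) -> bool:
    """Return True if the calls satisfy the three listed requirements."""
    substituted = [c for c in calls if c.substituted]
    # Requirement 1: at least one input or output uses substitution.
    if not substituted:
        return False
    # Requirement 2: input labels are distinct and each is "1", "2", or "3".
    labels = [c.execution_label for c in substituted if c.kind == "input"]
    if len(labels) != len(set(labels)):
        return False
    if any(label not in VALID_LABELS for label in labels):
        return False
    # Requirement 3: every substituted input/output shares one source type.
    return len({c.source for c in substituted}) == 1
```

For example, two database inputs labeled "1" and "2" pass, while a Hadoop input and a database input that both use substitution fail. Keeping the rules separate from how the records are collected makes each requirement easy to test on its own.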
Run the entire Notebook from the toolbar. Do NOT run cells out of order; doing so can cause problems with the metadata that is passed to the Python Execute operator. After running all cells, save the Notebook before you try to run it in a workflow.
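A saved Notebook is an .ipynb file in the Jupyter JSON format, where every code cell records an execution_count. As a rough self-check that the saved file reflects one top-to-bottom run, the following sketch verifies that every non-empty code cell has a count and that the counts increase in document order. The function names are hypothetical, and this is a heuristic only: it catches cells that were skipped or run out of order, but it cannot prove the run history was clean.

```python
import json
from typing import Any, Dict


def _source(cell: Dict[str, Any]) -> str:
    # The .ipynb format stores "source" either as one string or a list of lines.
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def ran_top_to_bottom(notebook: Dict[str, Any]) -> bool:
    """Heuristic: True if every non-empty code cell ran, in document order."""
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code" and _source(cell).strip()
    ]
    if any(count is None for count in counts):
        return False  # at least one code cell was never run (or was cleared)
    # Counts must strictly increase from the first code cell to the last.
    return all(earlier < later for earlier, later in zip(counts, counts[1:]))


def load_notebook(path: str) -> Dict[str, Any]:
    """Read a saved .ipynb file, which is plain JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Because the check reads the file on disk, it only sees the state you last saved, which is another reason to save after running all cells.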
You can create an input for substitution using the following steps:
Procedure
- Associate a dataset with the workspace.
- In the notebook toolbar, click Data.
- Select the dataset, and then click Import.
This generates a cell with a function call that reads the data: cc.read_input_table if you selected a database table, or cc.read_input_file if you selected a file in HDFS.
- Change use_input_substitution=False to use_input_substitution=True.
- Add a named argument called execution_label to the function call. Its value must be the string "1", "2", or "3"; the Python Execute operator in the visual workflow uses it to identify each input. The result should look something like the following:
    df_account = cc.read_input_table(table_name='account', schema_name='demo', database_name='miner_demo', use_input_substitution=True, execution_label="1")
- Run the generated cell. This fetches the data, creates a dataframe in the Notebook, and saves this information as a valid input for the Python Execute operator in the visual workflow editor.
You can create an output for substitution using the following steps:
Procedure
- Use the cc.write_output_table function to write a table, or the cc.write_output_file function to write a file, and set use_output_substitution=True. To see the function arguments, run help(cc.write_output_table) in your Notebook.
- Run the cell. This writes the dataset and saves this information as a valid output for the Python Execute operator in the visual workflow editor.
- Before using the Notebook in a Python Execute operator, clean it to remove any interactive code (for example, calls to the help() function).
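Finding leftover interactive calls by hand is error-prone in a long Notebook. The following sketch scans the code cells of a loaded .ipynb dictionary and reports each line that calls help(). The names find_interactive_calls and INTERACTIVE_PATTERNS are hypothetical; only help() comes from this page, so treat any other pattern you add as your own assumption about what counts as interactive.

```python
import re
from typing import Any, Dict, List, Tuple

# Patterns treated as interactive code. Only help() comes from the
# documentation; extend this tuple with anything else you consider interactive.
INTERACTIVE_PATTERNS = (re.compile(r"\bhelp\s*\("),)


def _cell_source(cell: Dict[str, Any]) -> str:
    # The .ipynb format stores "source" either as one string or a list of lines.
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def find_interactive_calls(notebook: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Return (cell index, offending line) pairs for code cells to clean up."""
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # Markdown and raw cells are never executed
        for line in _cell_source(cell).splitlines():
            if any(pattern.search(line) for pattern in INTERACTIVE_PATTERNS):
                hits.append((index, line.strip()))
    return hits
```

A line such as help(cc.write_output_table) in the first code cell is reported as (0, "help(cc.write_output_table)"), while a name like helper(1) is not matched.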
A notebook is invalid if any of the following conditions exist.
- It has duplicate execution labels.
- It has execution labels other than "1", "2", or "3". Note: These values are strings enclosed in double quotation marks.
- It mixes HDFS and DB inputs with substitution.
  Note: Mixing HDFS and DB inputs without substitution is allowed.
- It has more than one output with substitution.
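The invalid conditions can also be checked directly against a saved Notebook. The following sketch parses each code cell with Python's ast module, collects the cc.read_input_* and cc.write_output_* calls that use substitution, and reports which conditions apply. It is a hypothetical helper, not a product feature, and it assumes that cc.read_input_table and cc.write_output_table target a database while the *_file variants target HDFS, that flags and labels are written as literal keyword arguments, and that cells which do not parse as plain Python (for example, cells using IPython magics) can be skipped.

```python
import ast
from typing import Any, Dict, List

READERS = {"read_input_table": "database", "read_input_file": "hdfs"}
WRITERS = {"write_output_table": "database", "write_output_file": "hdfs"}
VALID_LABELS = {"1", "2", "3"}


def _keyword(call: ast.Call, name: str):
    """Return the literal value of keyword `name`, or None if absent/non-literal."""
    for kw in call.keywords:
        if kw.arg == name and isinstance(kw.value, ast.Constant):
            return kw.value.value
    return None


def notebook_problems(notebook: Dict[str, Any]) -> List[str]:
    """List which of the documented invalid conditions the Notebook meets."""
    labels, sources, outputs = [], set(), 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        src = "".join(src) if isinstance(src, list) else src
        try:
            tree = ast.parse(src)
        except SyntaxError:
            continue  # e.g. IPython magics; this sketch skips such cells
        for node in ast.walk(tree):
            # Only consider calls of the form cc.<name>(...).
            if not (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and isinstance(node.func.value, ast.Name)
                    and node.func.value.id == "cc"):
                continue
            name = node.func.attr
            if name in READERS and _keyword(node, "use_input_substitution") is True:
                labels.append(_keyword(node, "execution_label"))
                sources.add(READERS[name])
            elif name in WRITERS and _keyword(node, "use_output_substitution") is True:
                outputs += 1
    problems = []
    if len(labels) != len(set(labels)):
        problems.append("duplicate execution labels")
    if any(label not in VALID_LABELS for label in labels):
        problems.append('execution labels other than "1", "2", or "3"')
    if len(sources) > 1:
        problems.append("mixes HDFS and DB inputs with substitution")
    if outputs > 1:
        problems.append("more than one output with substitution")
    return problems
```

An empty list means none of the listed conditions were detected. Inputs without substitution are ignored, which matches the note that mixing HDFS and DB inputs without substitution is allowed.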