TIBCO Data Science – Team Studio Configuration Properties

Use the chorus.properties file in the chorus container to configure TIBCO Data Science – Team Studio.

The configuration file and its companion example file are located in the directory <installation directory>/shared/. The example file, chorus.properties.example, contains examples of all the properties that you can set in chorus.properties. You can copy any property from the example file into the configuration file.

Note: You must restart TIBCO Data Science – Team Studio for the changes you make in chorus.properties to take effect.
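As a rough aid for comparing the two files, the following sketch lists the properties that appear in the example file but are not set in the configuration file. It assumes the files use plain `key = value` lines with `#` comments, which this page does not confirm; the function names and the sample contents are hypothetical.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blanks and '#' comments (assumed syntax)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values such as java_options keep their own '='.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def unset_properties(example_text, config_text):
    """Return the example keys that are absent from the configuration, sorted."""
    return sorted(set(parse_properties(example_text)) - set(parse_properties(config_text)))


# Hypothetical sample contents, not the real files.
example_text = """
# sample only
workflow.enabled = true
smtp.address = localhost
smtp.port = 587
"""
config_text = """
workflow.enabled = true
"""

missing = unset_properties(example_text, config_text)
print(missing)
```

In practice, the two strings would be read from the files under <installation directory>/shared/.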
chorus.properties options
Property Default setting Description
java_options -Djava.library.path=$CHORUS_HOME/vendor/hadoop/lib/ -server -Xmx4096m -XX:MaxPermSize=128 Options to include on the Java command line when running the TIBCO Data Science – Team Studio application.
workflow.enabled true Enables the workflow feature in TIBCO Data Science – Team Studio. The TIBCO Data Science – Team Studio server port is set during installation.
workflow.url http://localhost:8070 The URL of the TIBCO Data Science – Team Studio server, which is installed with TIBCO Data Science – Team Studio. The URL can be an IP address or fully-qualified machine name. Whichever is used, it should be reachable from a browser.

If you must change the port number after installing TIBCO Data Science – Team Studio, be sure to also change the port number in the TIBCO Data Science – Team Studio Tomcat server configuration file, $CHORUS_HOME/alpine/apache-tomcat-7.x.x/conf/server.xml. Look for the <Connector> element with attribute protocol="HTTP/1.1" under the <Service name="Catalina"> element.
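To confirm that the two port numbers agree after such a change, a check along the following lines could be used. It is a sketch only: the sample server.xml is a hypothetical, heavily trimmed stand-in, and the helper names are not part of the product.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def http_connector_port(server_xml):
    """Return the port of the HTTP/1.1 <Connector> under <Service name="Catalina">."""
    root = ET.fromstring(server_xml)
    for service in root.iter("Service"):
        if service.get("name") != "Catalina":
            continue
        for connector in service.findall("Connector"):
            if connector.get("protocol") == "HTTP/1.1":
                return int(connector.get("port"))
    return None


def ports_match(workflow_url, server_xml):
    """True when the port in workflow.url equals the Tomcat HTTP connector port."""
    return urlparse(workflow_url).port == http_connector_port(server_xml)


# Hypothetical, trimmed server.xml used only for illustration.
SAMPLE_SERVER_XML = """
<Server>
  <Service name="Catalina">
    <Connector port="8070" protocol="HTTP/1.1" />
    <Connector port="8009" protocol="AJP/1.3" />
  </Service>
</Server>
"""

print(ports_match("http://localhost:8070", SAMPLE_SERVER_XML))
print(ports_match("http://localhost:8080", SAMPLE_SERVER_XML))
```

With the sample above, the first call reports a match and the second does not.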

smtp.address localhost Configures the SMTP connection that TIBCO Data Science – Team Studio uses to deliver email notifications to users. Sets the network address of the SMTP service.
smtp.port 587 Configures the SMTP connection that TIBCO Data Science – Team Studio uses to deliver email notifications to users. Sets the port for the SMTP service.
smtp.user_name USER_NAME Configures the SMTP connection that TIBCO Data Science – Team Studio uses to deliver email notifications to users. Sets the user name used to authenticate with the SMTP service.
smtp.password PASSWORD Configures the SMTP connection that TIBCO Data Science – Team Studio uses to deliver email notifications to users. Sets the password used to authenticate with the SMTP service.
smtp.authentication login Configures the SMTP connection that TIBCO Data Science – Team Studio uses to deliver email notifications to users. Sets the authentication method for the SMTP service.
smtp.enable_starttls_auto false Configures the SMTP connection that TIBCO Data Science – Team Studio uses to deliver email notifications to users. Sets whether STARTTLS is used automatically when the SMTP service supports it.
mail.enabled false If true, TIBCO Data Science – Team Studio delivers job completion and failure notifications to users by email.
mail.from FROMNAME <noreply@chorus.com> Sets the from header in the email message.
mail.reply_to REPLY NAME <noreply@chorus.com> Sets the reply_to header in the email message.
sandbox_recommended_size_in_gb 5 The recommended size of a workspace's sandbox, in GB.

Note: This value drives a visual indicator that shows when a workspace's sandbox exceeds the recommended size.

worker_threads 200 Configures the thread pool size of the worker processes. The number of worker threads determines the maximum number of asynchronous jobs, such as table copying or importing, that can run simultaneously.
webserver_threads 800 Configures the thread pool size of the web server. The number of web server threads determines the maximum number of simultaneous web requests.
database_threads 1200  

Each web or worker thread can use its own connection to the local PostgreSQL database. Therefore, the sum of worker_threads and webserver_threads must be less than the max_connections configured in postgresql.conf. The max_connections parameter can be based on the operating system's kernel shared memory size. For example, on OS X this parameter defaults to 20.
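The constraint above is simple arithmetic, and a minimal sketch of the check follows. The function name is hypothetical, and the max_connections values in the usage lines are illustrative rather than recommended settings.

```python
def check_thread_budget(worker_threads, webserver_threads, max_connections):
    """Return (ok, total): ok is True when worker + web server threads < max_connections."""
    total = worker_threads + webserver_threads
    return total < max_connections, total


# Defaults from the table: worker_threads = 200, webserver_threads = 800.
# The max_connections values below are illustrative, not recommendations.
ok, total = check_thread_budget(200, 800, max_connections=1100)
print(ok, total)

# With a small max_connections such as 20, the same defaults do not fit.
too_small, _ = check_thread_budget(200, 800, max_connections=20)
print(too_small)
```

Because the sum must be strictly less than max_connections, a value equal to the sum also fails the check.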

session_timeout_minutes 480 The session timeout: the number of minutes a user can be inactive before being logged out.
clean_expired_api_tokens_interval_hours 24 Renamed in version 6.2 from clean_expired_sessions_interval_hours.
delete_unimported_csv_files_interval_hours 1  
delete_unimported_csv_files_after_hours 12  
instance_poll_interval_minutes 5  
reindex_search_data_interval_hours 24  
reindex_datasets_interval_hours 24 Sets the frequency for data set reindexing.
reset_counter_cache_interval_hours 24  
file_download.name_prefix n/a This optional string is prefixed on all generated file names. For example, if a user downloads a dataset, the name of the file downloaded is the specified prefix, followed by the dataset name and then the .csv extension. Only the first 20 characters of the prefix are used.
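The naming rule described for file_download.name_prefix can be sketched as follows. This assumes plain concatenation with no separator between the prefix and the dataset name, which the description implies but does not state outright; the function name and sample values are hypothetical.

```python
MAX_PREFIX_CHARS = 20  # only the first 20 characters of the prefix are used


def download_file_name(prefix, dataset_name):
    """Build the name: truncated prefix, then the dataset name, then '.csv'."""
    used_prefix = (prefix or "")[:MAX_PREFIX_CHARS]
    return f"{used_prefix}{dataset_name}.csv"


print(download_file_name("CONFIDENTIAL_", "sales"))
print(download_file_name("x" * 25, "sales"))
print(download_file_name(None, "sales"))
```

The second call shows a 25-character prefix being cut to 20 characters, and the third shows the result when no prefix is configured.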
file_sizes_mb.workfiles 10 Maximum size, in MB, for uploaded work files.
file_sizes_mb.csv_imports 100 Maximum size, in MB, for imported files.
file_sizes_mb.user_icon 5 Maximum size, in MB, for the user icon.
file_sizes_mb.workspace_icon 5 Maximum size, in MB, for the workspace icon.
file_sizes_mb.attachment 10 Maximum size, in MB, for file attachments.
logging.syslog.enabled false If true, logs are written to syslog rather than to files.
logging.loglevel info The minimum severity of messages to log. Can be debug, info, warn, error, or fatal.
oracle.enabled true Enables use of Oracle databases.
gpfdist.ssl.enabled false To enable data movement between databases, gpfdist must be installed and running on the TIBCO Data Science – Team Studio host. Two gpfdist processes must be started with different ports pointing to the same directory. An SSL certificate must be installed on all segment servers.
gpfdist.url sample-gpfdist-server  
gpfdist.write_port 8000  
gpfdist.read_port 8001  
gpfdist.data_dir /tmp  
tableau.enabled true If false, Tableau is disabled even if other Tableau parameters are specified.
tableau.url <ip address> The URL of the Tableau server. The URL can be an IP address or a fully qualified computer name. Whichever is used, it should be reachable from a browser.
tableau.port 8000 The Tableau server port.

Note: This port must be opened on your Tableau server.

tableau.sites marketing,sales The list of Tableau sites. TIBCO Data Science – Team Studio supports this parameter starting in version 5.3. If this option is not present, TIBCO Data Science – Team Studio publishes to the default Tableau site.
newrelic.enabled false Enables New Relic application performance monitoring. See http://newrelic.com for more information.
newrelic.license_key NEWRELIC_LICENSE_KEY  
default_preview_row_limit 500 The maximum number of preview rows.
execution_timeout_in_minutes 300 The workfile execution timeout in minutes.
visualization.overlay_string n/a This optional string is displayed on all visualizations, both when displaying and when saving.

Only the first 40 characters of the string are used.

database_login_timeout 10 Database connection timeout, in seconds. If you are using Google BigQuery as a data source and you are copying large amounts of data between databases, consider increasing this value so the operation does not fail unexpectedly.
jdbc_schema_blacklist.postgresql [information_schema, pg_catalog] Specifies a list of PostgreSQL schemas that are excluded from display, index, and search. (That is, they are effectively excluded from TIBCO Data Science – Team Studio).
jdbc_schema_blacklist.sqlserver [db_accessadmin, db_backupoperator, db_datareader, db_datawriter, db_ddladmin, db_denydatareader, db_denydatawriter, db_owner, db_securityadmin, dbo, INFORMATION_SCHEMA, sys] Specifies a list of SQL Server schemas that are excluded from display, index, and search. (That is, they are effectively excluded from TIBCO Data Science – Team Studio).
jdbc_schema_blacklist.teradata [All, Crashdumps, DBC, dbcmngr, Default, EXTUSER, LockLogShredder, PUBLIC, SQLJ, SysAdmin, SYSBAR, SYSLIB, SYSSPATIAL, SystemFe, SYSUDTLIB, Sys_Calendar, TDPUSER, TDQCD, TDStats, tdwm, TD_SYSFNLIB, TD_SYSXML] Specifies a list of Teradata schemas that are excluded from display, index, and search. (That is, they are effectively excluded from TIBCO Data Science – Team Studio).
job_timeout_in_minutes 60 The timeout, in minutes, for scheduled jobs.

When the value is set to 0, the feature is bypassed. Otherwise, the worker process resets the job status to scheduled after the set time has elapsed.

Note:
  1. If the set value is greater than the scheduled interval (for example, a job runs every hour but job_timeout_in_minutes is set to 85), the stuck job is not reset.

  2. The job_timeout_in_minutes value should be greater than the run time of the longest-running scheduled job (for example, if an enabled job takes 1.5 hours to run, job_timeout_in_minutes should be greater than 90).

  3. When a job status is reset to scheduled, the process spawned by the job is not terminated. The job continues to run until it completes or an error occurs, even though the user interface shows that the job is not running.
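The first two notes can be read as a pair of bounds on job_timeout_in_minutes, and the sketch below checks a candidate value against them. The function name, the parameter names, and the wording of the messages are hypothetical; the schedule interval and run time would come from your own job definitions.

```python
def check_job_timeout(timeout_minutes, schedule_interval_minutes, longest_run_minutes):
    """Return a list of warnings for a job_timeout_in_minutes value; empty means no issue found."""
    if timeout_minutes == 0:
        return []  # a value of 0 bypasses the feature
    warnings = []
    if timeout_minutes > schedule_interval_minutes:
        warnings.append("timeout exceeds the schedule interval; a stuck job is not reset")
    if timeout_minutes <= longest_run_minutes:
        warnings.append("timeout is not greater than the longest job run time")
    return warnings


# Note 1: an hourly job with the timeout set to 85 minutes.
print(check_job_timeout(85, schedule_interval_minutes=60, longest_run_minutes=30))

# Note 2: a job that runs for 1.5 hours with the timeout set to 60 minutes.
print(check_job_timeout(60, schedule_interval_minutes=120, longest_run_minutes=90))

# A value that satisfies both notes.
print(check_job_timeout(100, schedule_interval_minutes=120, longest_run_minutes=90))
```

When a job's run time exceeds its schedule interval, no value can satisfy both notes, and the check reports at least one warning for any nonzero timeout.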