The TIBCO StreamBase® Web Reader adapter reads web pages via HTTP GET or POST requests and emits the page contents in a string field of its Data output port.
The adapter can be configured to read a web page on demand when receiving a tuple on its control input port, or to periodically poll the web page configured in its HTTP URL property.
The adapter has multiple samples, described in Web Reader Input Adapter Samples. These samples demonstrate how to perform REST and SOAP requests by using the control port to send header information and SOAP or REST payloads.
This section describes the properties you can set for this adapter, using the various tabs of the Properties view in StreamBase Studio.
Name: Use this required field to specify or change the name of this instance of this component, which must be unique in the current EventFlow module. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.
Adapter: A read-only field that shows the formal name of the adapter.
Class name: Shows the fully qualified class name that implements the functionality of this adapter. If you need to reference this class name elsewhere in your application, you can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.
Start options: This field provides a link to the Cluster Aware tab, where you configure the conditions under which this adapter starts.
Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.
Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow Editor canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.
Property | Description |
---|---|
HTTP URL | The URL of the web page to read. When the control port is enabled, this contains the default value used when the input tuple's URL field is null. When the control port is disabled, the URL is polled periodically based on the value of the Poll Frequency property. |
HTTP request method | The type of request to send to the HTTP server. The available options are GET and POST. When the control port is enabled, this contains the default value used when the input tuple's RequestType field is null. |
Charset | The character set used for the connection and for encoding the POST data sent to the server. |
Connect timeout | The timeout, in milliseconds, to use when opening a connection to the server. A timeout of zero is interpreted as an unlimited timeout. |
Read timeout | The timeout, in milliseconds, to use when reading from the server. A timeout of zero is interpreted as an unlimited timeout. |
Use Default Charset | If selected, the Java platform default character set is used. If cleared, a valid character set name must be specified for the Character Set property. |
Character Set | The name of the character set encoding that the adapter is to use to read input or write output. |
Output a tuple for each line received | Outputs a tuple for each line of data received from the server. This option is used mainly for streaming applications. |
Output blank lines | Sends a blank tuple when a blank line is received. Note: This option is only available when Output a tuple for each line received is selected. |
Output null tuple on completion | Sends a tuple with all fields set to null when reading is complete. Note: This option is only available when Output a tuple for each line received is selected. |
Maintain Line Separator | If enabled, the newline and carriage return characters sent by the server are preserved in the output. |
Use basic auth | Enables HTTP basic authentication. |
Username | When basic authentication is enabled, the username sent to the server. |
Password | When basic authentication is enabled, the password sent to the server. |
Enable Control Port | Enables a control input port used to request web pages on demand. Selecting this check box disables the Poll Frequency control. |
Poll Frequency | The time, in milliseconds, to wait between HTTP GET requests. Ignored if the control port is enabled, in which case web requests are made on demand on receipt of an input tuple. |
Enable Pass-Through Fields | Copies all fields of the incoming control tuple to the outgoing data and status tuples. When enabled, the outgoing data tuple contains a new field called 'PassThroughFields' and the status tuple contains a new field called 'InputTuple'. Each of these fields holds the entire contents of the incoming control tuple. |
Ignore certificate errors | If enabled, errors caused by invalid SSL certificates are ignored and the website is processed as normal. Warning: This can expose the connection to man-in-the-middle attacks. |
Process As File Download | If enabled, the web page is processed as a binary file download and the Data output field changes to a blob field. |
Log Level | Controls the level of verbosity the adapter uses to send notifications to the console. This setting can be higher than the containing application's log level. If set lower, the system log level is used. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE. |
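The four line-oriented options in the table above interact, so a small sketch may help. The following Python function is only an illustration of one plausible reading of those options, not the adapter's implementation. The function name and parameter names are hypothetical, and `None` stands in for a tuple with all fields set to null.

```python
from typing import List, Optional


def lines_to_tuples(body: str,
                    output_blank_lines: bool = False,
                    null_on_completion: bool = False,
                    maintain_separator: bool = False) -> List[Optional[str]]:
    """Split a response body into per-line values (hypothetical helper)."""
    out: List[Optional[str]] = []
    for raw in body.splitlines(keepends=True):
        text = raw.rstrip("\r\n")
        if text == "" and not output_blank_lines:
            continue  # assumption: blank lines are skipped unless requested
        # Keep the server's \r and \n characters only when asked to.
        out.append(raw if maintain_separator else text)
    if null_on_completion:
        out.append(None)  # stands in for the all-null completion tuple
    return out
```

For example, under these assumptions a body of `"a\r\n\r\nb\n"` yields two values by default, and three when blank lines are output.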
Property | Description |
---|---|
Use Proxy | Use a proxy server when processing the HTTP request. |
Proxy Host | The proxy server host name or IP address. |
Proxy Port | The proxy server TCP port number. |
Proxy User | The username to use with the proxy if required. |
Proxy Pass | The password to use with the proxy if required. |
Property | Description |
---|---|
Default HTTP Headers | These HTTP headers are always sent with the web request. If the HTTPHeaders field is supplied on the control port, any header whose key matches a default replaces that default. Defaults with no matching key are appended to the control port's list. |
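The replace-or-append rule for default headers can be sketched as follows. This is a minimal Python illustration, not the adapter's code. The function name is hypothetical, key matching is assumed to be an exact string comparison, and a control-port header with a null value is assumed to be dropped without suppressing the default.

```python
from typing import Dict, List, Optional, Tuple

Header = Tuple[str, Optional[str]]


def merge_headers(defaults: Dict[str, str], control: List[Header]) -> List[Header]:
    """Combine control-port headers with the configured defaults (hypothetical helper)."""
    # Null values are ignored; empty strings are allowed.
    merged: List[Header] = [(k, v) for k, v in control if v is not None]
    seen = {k for k, _ in merged}
    for key, value in defaults.items():
        if key not in seen:
            # No matching key on the control port: append the default.
            merged.append((key, value))
    return merged
```

With defaults `Accept: text/html` and `X-App: demo`, a control tuple that supplies `Accept: application/json` would send the control port's `Accept` value followed by the default `X-App` header.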
Property | Description |
---|---|
Decode HTML results | If enabled, the adapter decodes strings before output. |
URL Encode | If set to true, the Value portion of each URLParams entry is URL encoded. If false, no encoding is performed. The control port's URLEncode value overrides this value if present. |
URL Encode Post Data | If set to true, the entire PostData value is URL encoded. If false, no encoding is performed. The control port's URLEncodePostData value overrides this value if present. |
Default URL Params | A list of key-value parameter pairs to send to the server along with the request. If the request type is GET, this list is added to the end of the URL, with a "?" between the URL and the parameters. If the URLParams field is supplied on the control port, any parameter whose key matches a default replaces that default. Defaults with no matching key are appended to the control port's list. |
Default Post Data | If this value is set, it is sent to the server directly and the URLParams value is ignored. Control port values override these values. If the control port contains URLParams, they are used and this value is ignored. |
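To make the GET behavior concrete, here is a short Python sketch of how a parameter list could be appended to a URL. It is an illustration under stated assumptions: the function name is hypothetical, pairs are assumed to be joined with "&", and `urllib.parse.quote` performs percent-encoding, which may differ from the encoder the adapter uses (for example, in how spaces are written).

```python
from typing import List, Optional, Tuple
from urllib.parse import quote


def build_get_url(url: str,
                  params: List[Tuple[str, Optional[str]]],
                  url_encode: bool = False) -> str:
    """Append key-value parameters to a URL (hypothetical helper)."""
    pairs = []
    for key, value in params:
        if value is None:
            continue  # null values are ignored; empty values are allowed
        # Only the Value portion is encoded, as the URL Encode property describes.
        encoded = quote(value, safe="") if url_encode else value
        pairs.append(f"{key}={encoded}")
    if not pairs:
        return url
    return url + "?" + "&".join(pairs)
```

Leaving `url_encode` false passes values through unchanged, which is only safe when the values contain no reserved characters.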
Use the settings in this tab to allow this operator or adapter to start and stop based on conditions that occur at runtime in a cluster with more than one node. During initial development of the fragment that contains this operator or adapter, and for maximum compatibility with TIBCO Streaming releases before 10.5.0, leave the Cluster start policy control in its default setting, Start with module.
Cluster awareness is an advanced topic that requires an understanding of StreamBase Runtime architecture features, including clusters, quorums, availability zones, and partitions. See Cluster Awareness Tab Settings on the Using Cluster Awareness page for instructions on configuring this tab.
Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.
Caution
Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.
The Web Reader adapter's ports are used as follows:
- Control (input): Tuples enqueued on this port cause the adapter to fetch web pages. The schema for this port has the following fields:
  - URL, string, The HTTP URL to read. If null, the URL is taken from the adapter's HTTP URL property.
  - (Optional) URLParams, list of tuples, A list of key-value pairs to send to the server along with this request. If the request type is GET, this list is added to the end of the URL field, with a "?" between the URL and the parameters.
    - Key, string, The key of the HTTP parameter.
    - Value, string, The value to send to the server for the given key. Null values are ignored, but empty values are allowed.
  - (Optional) PostData, string, If this value is set, it is sent to the server directly and the URLParams value is ignored.
  - (Optional) URLEncode, boolean, If set to true, the Value portion of the URLParams is URL encoded. If null, false is assumed and no encoding is performed.
  - (Optional) URLEncodePostData, boolean, If set to true, the entire PostData value is URL encoded. If null, false is assumed and no encoding is performed.
  - (Optional) HTTPHeaders, list of tuples, A list of key-value pairs to send to the server as HTTP headers.
    - Key, string, The key of the HTTP header.
    - Value, string, The header value to send to the server for the given key. Null values are ignored, but empty values are allowed.
  - (Optional) RequestType, string, Sets the outgoing HTTP request type. Valid values are "POST" and "GET". Any other value is ignored and the default is used.
  - (Optional) ReadTimeout, int, If this field is present and not null, it overrides the default read timeout value for this single HTTP request. The value is in milliseconds. A timeout of zero is interpreted as an unlimited timeout.
  - (Optional) ConnectTimeout, int, If this field is present and not null, it overrides the default connect timeout value for this single HTTP request. The value is in milliseconds. A timeout of zero is interpreted as an unlimited timeout.
- Status (output): The adapter emits tuples from this port when significant events occur, such as when an attempt to read a web page fails. The schema for this port has the following fields:
  - Type, string: Returns one of the following values to convey the type of event:
    - Read
    - UserInput
  - Action, string: Returns an action associated with the event Type:
    - Failed
    - Rejected
  - Object, string: Returns an event type-specific value, such as the HTTP URL for which a read failed or the control input tuple that was rejected.
  - Message, string: Returns a human-readable description of the event.
  - InputTuple, tuple: If Enable Pass-Through Fields is checked, this field contains the input tuple that caused this status message.
- Data (output): Tuples are emitted on this port when web pages are successfully read. The schema for this port has the following fields:
  - Data, string, The contents of the web page.
  - Headers, List<Tuple>, The web page response headers. Each tuple contains a Header (String) value and the Values (List<String>) for that header.
  - PassThroughFields, Tuple, When the 'Enable Pass-Through Fields' option is checked, this field appears and contains the entire contents of the incoming control port request.
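The way control tuple fields fall back to the adapter's properties can be summarized in a short sketch. The Python below is illustrative only. The function name and the `props` dictionary keys are hypothetical, the control tuple is modeled as a plain dictionary, and the RequestType comparison is assumed to be case-sensitive.

```python
from typing import Any, Dict, Optional


def resolve_request(control: Dict[str, Any],
                    props: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Apply property defaults to one control tuple (hypothetical helper)."""
    url = control.get("URL")
    if url is None:
        url = props.get("url")  # fall back to the HTTP URL property
    if url is None:
        return None  # no URL anywhere: the adapter warns at run time

    method = control.get("RequestType")
    if method not in ("GET", "POST"):
        method = props["method"]  # invalid or null values use the default

    def pick(field: str, prop: str) -> Any:
        value = control.get(field)
        return props[prop] if value is None else value

    return {
        "url": url,
        "method": method,
        "read_timeout_ms": pick("ReadTimeout", "read_timeout_ms"),
        "connect_timeout_ms": pick("ConnectTimeout", "connect_timeout_ms"),
    }
```

A per-request timeout of zero is passed through unchanged here, since zero means an unlimited timeout rather than "not set".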
The Web Reader adapter uses typecheck messages to help you configure the adapter within your StreamBase application. In particular, the adapter generates typecheck messages for the following reasons:
- The Control Input Port is disabled and no HTTP URL value is provided.
- The Control Input Port is disabled and an invalid (unspecified or negative) Poll Frequency is specified.
- The Control Input Port is enabled but is not presented with the required schema.
- The Use Proxy property is enabled but no Proxy Host or Proxy Port is specified.
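The typecheck conditions listed above amount to a small set of configuration rules, which can be expressed as a validation function. This Python sketch is an illustration, not the adapter's typecheck logic. The function name, parameter names, and message wording are hypothetical, and the schema check is reduced to a boolean flag.

```python
from typing import List, Optional


def typecheck(control_port_enabled: bool,
              http_url: Optional[str],
              poll_frequency_ms: Optional[int],
              use_proxy: bool,
              proxy_host: Optional[str],
              proxy_port: Optional[int],
              control_schema_ok: bool = True) -> List[str]:
    """Return one message per violated configuration rule (hypothetical helper)."""
    errors: List[str] = []
    if not control_port_enabled:
        if not http_url:
            errors.append("HTTP URL is required when the control port is disabled")
        if poll_frequency_ms is None or poll_frequency_ms < 0:
            errors.append("Poll Frequency must be specified and non-negative")
    elif not control_schema_ok:
        errors.append("Control port schema does not match the required schema")
    if use_proxy and (not proxy_host or proxy_port is None):
        errors.append("Proxy Host and Proxy Port are required when Use Proxy is enabled")
    return errors
```

An empty result means none of the four conditions applies. The URL and Poll Frequency checks are skipped when the control port is enabled, because requests are then made on demand.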
The adapter generates warning messages during runtime under various conditions, including:
- A control tuple is received with a null value in its URL field and a value for the adapter's HTTP URL property has not been specified.
- An error occurs attempting to read a web page.
When suspended, the adapter stops processing web pages.
When resumed, the adapter once again starts processing web pages.