The Web Graph Database


Probe Framework: Probe implementation guidelines


When creating any MeshObject in a Probe, care needs to be taken to assign it a "good" identifier. This is important because the functioning of the Probe Framework depends on any Probe assigning the same MeshObjectIdentifier to the same accessed data element every time the Probe runs.

Automatically generated identifiers, as often used in other parts of InfoGrid applications, are unsuitable here because they prevent the Probe Framework's Differencer from determining which information provided by the external data source changed from one Probe run to the next. The Differencer determines this by looking for MeshObjects with identical MeshObjectIdentifiers in the old and new MeshObjectGraph. If MeshObjectIdentifiers are not kept consistent across Probe runs, the Differencer detects a very large number of spurious changes. Worse, because information obtained through a Probe may be related to information from other data sources in the application's main NetMeshBase, an incorrect assignment of MeshObjectIdentifiers causes those relationships to be deleted on subsequent runs of the Probe.
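
The matching-by-identifier behavior described above can be illustrated with a minimal, self-contained sketch. This is not the InfoGrid Differencer: the class DifferencerSketch and its methods are hypothetical, and each MeshObjectGraph is reduced to a map from an identifier string to a single value. It only shows why stable identifiers keep the detected change set small.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not the InfoGrid Differencer.
final class DifferencerSketch {

    // Identifiers present only in the new graph are reported as created.
    static Set<String> created(Map<String, String> oldGraph, Map<String, String> newGraph) {
        Set<String> result = new TreeSet<>(newGraph.keySet());
        result.removeAll(oldGraph.keySet());
        return result;
    }

    // Identifiers present only in the old graph are reported as deleted.
    static Set<String> deleted(Map<String, String> oldGraph, Map<String, String> newGraph) {
        Set<String> result = new TreeSet<>(oldGraph.keySet());
        result.removeAll(newGraph.keySet());
        return result;
    }

    // Identifiers present in both graphs with a different value are reported as modified.
    static Set<String> modified(Map<String, String> oldGraph, Map<String, String> newGraph) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : oldGraph.entrySet()) {
            String newValue = newGraph.get(e.getKey());
            if (newValue != null && !newValue.equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> run1 = Map.of("#alice", "Alice", "#bob", "Bob");

        // Stable identifiers: only the one real change is detected.
        Map<String, String> run2 = Map.of("#alice", "Alice", "#bob", "Robert");
        System.out.println(created(run1, run2));   // []
        System.out.println(deleted(run1, run2));   // []
        System.out.println(modified(run1, run2));  // [#bob]

        // Regenerated identifiers for unchanged data: everything looks deleted and re-created.
        Map<String, String> run2Unstable = Map.of("#gen-17", "Alice", "#gen-18", "Bob");
        System.out.println(created(run1, run2Unstable));  // [#gen-17, #gen-18]
        System.out.println(deleted(run1, run2Unstable));  // [#alice, #bob]
    }
}
```

In the second case the data has not changed at all, yet every object appears deleted and re-created, which is the situation in which relationships to objects from other data sources would be lost.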

All MeshObjects created in a Probe are NetMeshObjects. This means that any MeshObjectIdentifier created in a Probe will automatically be a NetMeshObjectIdentifier, which consists of two parts:

For example, when accessing a data source at URL, a MeshObject may be created as follows:

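// Obtain the lifecycle manager and the identifier factory from freshMeshBase
MeshBaseLifecycleManager    life   = freshMeshBase.getMeshBaseLifecycleManager();
MeshObjectIdentifierFactory idFact = freshMeshBase.getMeshObjectIdentifierFactory();

// Create the MeshObject with an explicit identifier instead of a generated one
MeshObject abc = life.createMeshObject( idFact.fromExternalForm( "#abc" ));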

The MeshObjectIdentifierFactory here determines that the string #abc is a relative path, and thus creates a compound NetMeshObjectIdentifier that can be written as

Probes and Exceptions

Probes are allowed and encouraged to throw a variety of Exceptions and RuntimeExceptions in their respective work methods.

When a Probe runs for the very first time, any thrown exception will be thrown back immediately to the caller of the NetMeshBase's accessLocally method. This prevents the Probe Framework from setting up the Probe and corresponding ShadowMeshBase at all, on the grounds that the data source could not be accessed.

Subsequent Probe runs occur on a background thread, so exceptions cannot be thrown back to a caller. Instead, when a Probe throws an exception, the exception is intercepted by the InfoGrid Probe Framework. Regardless of exception type, this indicates to the Probe Framework that the most recent Probe run was unsuccessful.

If no exception was thrown, the Differencer runs immediately after the Probe has run, constructs the difference between the "old" and the "new" MeshObjectGraph found at this data source, and updates (i.e. creates / modifies / deletes) the information held in the ShadowMeshBase accordingly.

If an exception was thrown by the Probe during a run, the Differencer will not run and the information in the ShadowMeshBase will not be updated.

Depending on the ProbeUpdateSpecification in use, the time of the next scheduled Probe run may differ based on whether an exception was thrown or not in the most recent run.
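
The control flow described in this section can be summarized in a minimal, self-contained sketch. It does not use InfoGrid classes: ProbeRunSketch, ProbeWork and the two delay constants are hypothetical, the runs execute synchronously rather than on a background thread, and the choice of a longer delay after a failed run is only an assumption about what a ProbeUpdateSpecification might do.

```java
// Hypothetical illustration only; names do not correspond to InfoGrid classes.
final class ProbeRunSketch {

    // Stand-in for a Probe's work method; it may throw any exception.
    interface ProbeWork {
        void run() throws Exception;
    }

    // Assumed delays, standing in for decisions made by a ProbeUpdateSpecification.
    static final long DELAY_AFTER_SUCCESS_MS = 60_000L;
    static final long DELAY_AFTER_FAILURE_MS = 300_000L;

    private boolean setUp = false;
    private boolean lastRunSucceeded = false;
    private int differencerRuns = 0;
    private long nextDelayMs = 0L;

    // First run: any exception propagates to the caller and nothing is set up.
    void firstRun(ProbeWork work) throws Exception {
        work.run(); // if this throws, the statements below never execute
        setUp = true;
        lastRunSucceeded = true;
        nextDelayMs = DELAY_AFTER_SUCCESS_MS;
    }

    // Subsequent run: exceptions are intercepted; differencing happens only on success.
    void subsequentRun(ProbeWork work) {
        if (!setUp) {
            throw new IllegalStateException("the first run has not succeeded yet");
        }
        try {
            work.run();
            differencerRuns++; // stands in for differencing and updating the shadow
            lastRunSucceeded = true;
            nextDelayMs = DELAY_AFTER_SUCCESS_MS;
        } catch (Exception e) {
            // Regardless of exception type: mark the run as failed, skip differencing.
            lastRunSucceeded = false;
            nextDelayMs = DELAY_AFTER_FAILURE_MS;
        }
    }

    boolean isSetUp()          { return setUp; }
    boolean lastRunSucceeded() { return lastRunSucceeded; }
    int differencerRuns()      { return differencerRuns; }
    long nextDelayMs()         { return nextDelayMs; }
}
```

A caller of this sketch sees an exception only from firstRun; after a failed subsequentRun it has to inspect the recorded state, which mirrors the need to observe the ShadowMeshBase to learn about failures in later runs.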

To determine whether any Probe is running successfully or not during subsequent runs, you can subscribe to events raised by the ShadowMeshBase related to the Probe.

Configuring the ProbeDirectory

The ProbeDirectory contains the information needed to map data sources to Probe classes. When accessing a data source, the Probe Framework consults the ProbeDirectory to determine which Probe class to instantiate and run for this data source.

Each ShadowMeshBase has an associated ProbeDirectory, which may be shared across ShadowMeshBases. As data sources can change their data content dramatically over time, the ProbeDirectory is consulted for the correct Probe class not just prior to the first run, but also prior to every subsequent Probe run.

Generally, the Probe Framework determines which Probe to run in a two-stage process:

  1. If a specific match between a specific URL and a certain Probe class has been configured in the ProbeDirectory, this match will be used.
  2. If no specific match is found, the type of the data obtained from a stream is used to look up which Probe to use. The type that drives the selection is one of the following: the MIME type of a non-XML data stream, the Document Data Type of an XML data stream, the name of the XML root element, or, for API Probes, the protocol specified in the URL.
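
The two-stage selection above can be sketched as follows. This is not the InfoGrid ProbeDirectory API: the class ProbeLookupSketch, its methods, and the Probe class names used in the usage example are hypothetical. The caller is assumed to have already derived a single type key (a MIME type, an XML document type, a root element name, or a URL protocol) for the data source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not the InfoGrid ProbeDirectory API.
final class ProbeLookupSketch {

    // Stage 1 table: exact URL to Probe class name.
    private final Map<String, String> exactUrlMatches = new HashMap<>();

    // Stage 2 table: type key (MIME type, document type, root element, protocol) to Probe class name.
    private final Map<String, String> typeMatches = new HashMap<>();

    void addExactUrlMatch(String url, String probeClassName) {
        exactUrlMatches.put(url, probeClassName);
    }

    void addTypeMatch(String typeKey, String probeClassName) {
        typeMatches.put(typeKey, probeClassName);
    }

    // Returns the configured Probe class name, preferring an exact URL match
    // over a match by type; empty if neither stage finds anything.
    Optional<String> findProbeClass(String url, String typeKey) {
        String exact = exactUrlMatches.get(url);
        if (exact != null) {
            return Optional.of(exact);
        }
        return Optional.ofNullable(typeMatches.get(typeKey));
    }

    public static void main(String[] args) {
        ProbeLookupSketch directory = new ProbeLookupSketch();
        directory.addExactUrlMatch("http://example.com/feed", "com.example.probes.SpecialFeedProbe");
        directory.addTypeMatch("text/x-vcard", "com.example.probes.VCardProbe");

        // Stage 1 wins even though a type match also exists.
        System.out.println(directory.findProbeClass("http://example.com/feed", "text/x-vcard"));
        // Stage 2 is used when no URL-specific entry is configured.
        System.out.println(directory.findProbeClass("http://example.com/other", "text/x-vcard"));
    }
}
```

Because the lookup is cheap and stateless, repeating it before every run, as the text describes, lets a changed configuration or a changed data type take effect on the next run.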

Last modified on 07/17/09 05:43:43