Sunday, January 22, 2012

Metadata import from external sources in Autonomy connectors

My first entry will be about importing metadata from external sources when using Autonomy connectors for IDOL Server.

Sometimes you need to import additional metadata for indexed documents. In this article I describe the import process using the Autonomy Lotus Notes Connector, but the procedure applies to other Autonomy connectors as well.

The easiest and most efficient way (though not very well documented) to import data from basically any external source is to use the ImportURL feature. It lets you import metadata through a web page. I used an ASP.NET web page to import data from a SQL database, but the same approach can be used to connect to any backend system.

Your exported metadata must be placed in meta tags in the head section of the page, where the name attribute contains the name of your field and the content attribute contains the exported data. When designing the import, I distinguished two types of fields:

  • Single value - fields that do not require further processing on the Notes Connector side
  • Multi value - fields that are split on the Notes Connector side and indexed by IDOL Server as multi-value fields

The following output is expected by the Notes Connector ImportURL operation:

<HTML>
<HEAD>
<TITLE></TITLE>
<META name=SingelValueField1 content=sampleContent1>
<META name=SingelValueField2 content=sampleContent2>
<META name=MultiValueField content=Value1,Value2,Value3>
</HEAD>
<BODY>
</BODY>
</HTML>
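
The page can be generated by any backend. As a minimal sketch (in Python rather than the ASP.NET I used), the function below builds a page in this shape from plain dictionaries. The function name `render_meta_page` and its parameters are hypothetical, and quoting the attribute values is my own assumption for safety; the sample above uses unquoted values.

```python
from html import escape


def render_meta_page(single_fields, multi_fields, separator=","):
    """Build an HTML page whose <META> tags carry the exported metadata.

    single_fields: dict of field name -> string value
    multi_fields:  dict of field name -> list of string values,
                   joined with `separator` so the connector can split them later
    """
    lines = ["<HTML>", "<HEAD>", "<TITLE></TITLE>"]
    for name, value in single_fields.items():
        lines.append('<META name="%s" content="%s">'
                     % (escape(name, quote=True), escape(value, quote=True)))
    for name, values in multi_fields.items():
        joined = separator.join(values)
        lines.append('<META name="%s" content="%s">'
                     % (escape(name, quote=True), escape(joined, quote=True)))
    lines += ["</HEAD>", "<BODY>", "</BODY>", "</HTML>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_meta_page(
        {"SingelValueField1": "sampleContent1"},
        {"MultiValueField": ["Value1", "Value2", "Value3"]},
    ))
```

Values that themselves contain the separator would break the later split, so pick a separator that cannot occur in your data.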

Let's assume our metadata import page will be queried like this:

http://some_address/someweb_app/MetaDataImport.aspx?ParameterOne=some_value&ParameterTwo=some_Value
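
On the web page side, the two query parameters have to be read and decoded before they can be used to look up the metadata. A rough Python sketch of that step follows; the helper name `read_import_parameters` is hypothetical, and the actual lookup (for example in a SQL database) is left out.

```python
from urllib.parse import urlsplit, parse_qs


def read_import_parameters(url):
    """Extract ParameterOne and ParameterTwo from the request URL.

    parse_qs percent-decodes the values and returns a list per key,
    so we take the first value, or None when the parameter is missing.
    """
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {
        "ParameterOne": query.get("ParameterOne", [None])[0],
        "ParameterTwo": query.get("ParameterTwo", [None])[0],
    }


if __name__ == "__main__":
    print(read_import_parameters(
        "http://some_address/someweb_app/MetaDataImport.aspx"
        "?ParameterOne=some_value&ParameterTwo=some_Value"
    ))
```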

Now, here is how this is done on the Notes Connector side.

As a first step, I recommend defining a field with the URL prefix, which can be used later to form the full URL. This is useful in case you need to change the address of your import page.

FixedFieldName0=tmp_import_url
FixedFieldValue0=http://some_address/someweb_app/MetaDataImport.aspx?ParameterOne=

Let's assume we are importing the following fields from the Lotus Notes document, which will later be used to request the metadata (for example, a document ID in a SQL database):

DreField0=tmp_param1_field
NotesField0=param1_field

DreField1=tmp_param2_field
NotesField1=param2_field

Next, we should define fields for the escaped parameters (in case we want to use the imported fields in their original state for other purposes). It is a good idea to set a default value in case the Lotus field is not present; this way we avoid a crash of the Notes Connector.

FixedFieldName1=tmp_param1_escape_field
FixedFieldValue1="null_value"

FixedFieldName2=tmp_param2_escape_field
FixedFieldValue2="null_value"

and copy our parameters there:

ImportFieldOp1=FieldGlue
ImportFieldOpApplyTo1=tmp_param1_escape_field
ImportFieldOpParam1=Fnameparam1_field

ImportFieldOp2=FieldGlue
ImportFieldOpApplyTo2=tmp_param2_escape_field
ImportFieldOpParam2=Fnameparam2_field

Then we should escape our input parameters and form the output URL:

ImportFieldOp3=Escape
ImportFieldOpApplyTo3=tmp_param1_escape_field

ImportFieldOp4=Escape
ImportFieldOpApplyTo4=tmp_param2_escape_field

ImportFieldOp5=FieldGlue
ImportFieldOpApplyTo5=tmp_import_url
ImportFieldOpParam5=Fnametmp_import_url,Fnametmp_param1_escape_field,&ParameterTwo=,Fnametmp_param2_escape_field
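
To make the result of the escape and glue steps easier to picture, here is a small Python sketch of the same string building. It assumes that the Escape operation performs ordinary URL percent-encoding, which I have not verified character by character; the function name `build_import_url` is hypothetical.

```python
from urllib.parse import quote

# Same prefix as in FixedFieldValue0
URL_PREFIX = "http://some_address/someweb_app/MetaDataImport.aspx?ParameterOne="


def build_import_url(param1="null_value", param2="null_value"):
    """Mimic the Escape + FieldGlue steps: escape both parameters,
    then glue prefix, first parameter, literal text and second parameter."""
    escaped1 = quote(param1, safe="")  # assumption: Escape is percent-encoding
    escaped2 = quote(param2, safe="")
    return URL_PREFIX + escaped1 + "&ParameterTwo=" + escaped2


if __name__ == "__main__":
    print(build_import_url("doc 42", "a&b"))
```

Without escaping, a value such as `a&b` would be read by the web page as the start of another query parameter.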

Now it is time to import the data from our web page:

ImportFieldOp6=ImportURL
ImportFieldOpApplyTo6=tmp_import_url
ImportFieldOpParam6=1;

After this operation, our output IDX file will contain the following fields:

...
#DREFIELD SingelValueField1="sampleContent1"
#DREFIELD SingelValueField2="sampleContent2"
#DREFIELD MultiValueField="Value1,Value2,Value3"
...

The last thing we need to do is process our raw MultiValueField into a real multi-value field:

ImportFieldOp7=Expand
ImportFieldOpApplyTo7=MultiValueField
ImportFieldOpParam7=,;multi_value_field

As output we will get the following IDX file. Please note that your IDX file will still contain the original MultiValueField field. If you don't need it, you can filter it out at the IDOL Server level.

...
#DREFIELD SingelValueField1="sampleContent1"
#DREFIELD SingelValueField2="sampleContent2"
#DREFIELD MultiValueField="Value1,Value2,Value3"
#DREFIELD multi_value_field="Value1"
#DREFIELD multi_value_field="Value2"
#DREFIELD multi_value_field="Value3"
...
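
The effect of the Expand step can be illustrated with a few lines of Python. This is only a sketch of the observable result shown above, not the connector's implementation; the function name `expand_field` is hypothetical, and I do not know whether the real operation trims whitespace or skips empty values.

```python
def expand_field(raw_value, separator, target_name):
    """Split a separator-delimited value and emit one IDX field line per part."""
    return ['#DREFIELD %s="%s"' % (target_name, part)
            for part in raw_value.split(separator)]


if __name__ == "__main__":
    for line in expand_field("Value1,Value2,Value3", ",", "multi_value_field"):
        print(line)
```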

In the log file, you should see something similar to this:

...
12/01/2012 11:48:40 [0] IMPORTURL retrieving URL [http://some_address/someweb_app/MetaDataImport.aspx?ParameterOne=SomeValue&ParameterTwo=NextValue]
12/01/2012 11:48:40 [0] IMPORTURL returned 200 for http://some_address/someweb_app/MetaDataImport.aspx?ParameterOne=SomeValue&ParameterTwo=NextValue
12/01/2012 11:48:40 [0] IMPORTURL created temp file [c:\PathToNotesConnector\NotesFetch\Temp\IMPORTURL132636532092008988.tmp.AutnImportedURL.html] of 7481 bytes
...

And that is it.

Using ImportURL, you can integrate with any data source and aggregate your data before putting it into IDOL Server. It has its limitations, but it is very simple and flexible.

I hope this will save you a couple of hours and/or a couple of additional grey hairs :)

Please leave a comment if you have any ideas for improvements and good luck!
