Sunday, January 29, 2012

Efficient date processing in Autonomy Connectors

Data to be indexed often arrives in poor shape. Most of the problems come from the same piece of information appearing in different formats. This time I will write briefly about date processing and how to keep dates consistent. I'm going to show the ImportExtractDate command and its parameters:

Input formats

ImportExtractDateFormatCSVs0=

This is useful when we have a number of date/time formats that we would like to convert into one common format. Each input date format should be specified according to the Autonomy documentation. You should list the formats starting from the longest date strings to make sure they are resolved properly.
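
To see why the order matters, here is a minimal Python sketch. It assumes that the first format matching the beginning of the string wins; that is my reading of the advice above, not documented connector behaviour. The token table and the helper names to_regex and extract are hypothetical and exist only for this illustration.

```python
import re
from datetime import datetime

# Hypothetical mapping of the format tokens used in this post to regex groups.
TOKENS = {
    "YYYY": r"(?P<year>\d{4})",
    "MM": r"(?P<month>\d{2})",
    "DD": r"(?P<day>\d{2})",
    "HH": r"(?P<hour>\d{2})",
    "NN": r"(?P<minute>\d{2})",
    "SS": r"(?P<second>\d{2})",
    "ZZZ": r"(?P<zone>[A-Z]{3})",
}
# Longer tokens first, so "YYYY" is never split into smaller tokens.
TOKEN_RE = re.compile("|".join(sorted(TOKENS, key=len, reverse=True)))


def to_regex(fmt):
    """Translate a format such as "MM-DD-YYYY" into a compiled regex."""
    out, pos = [], 0
    for m in TOKEN_RE.finditer(fmt):
        out.append(re.escape(fmt[pos:m.start()]))  # literal separators
        out.append(TOKENS[m.group()])
        pos = m.end()
    out.append(re.escape(fmt[pos:]))
    return re.compile("".join(out))


def extract(value, formats):
    """Return (format, datetime) for the first format matching a prefix of value."""
    for fmt in formats:
        m = to_regex(fmt).match(value)  # prefix match (assumed behaviour)
        if m:
            parts = {k: int(v) for k, v in m.groupdict().items() if k != "zone"}
            return fmt, datetime(**parts)
    return None, None


longest_first = ["MM-DD-YYYY HH:NN:SS ZZZ", "MM-DD-YYYY HH:NN:SS", "MM-DD-YYYY"]
shortest_first = list(reversed(longest_first))

# Longest first: "MM-DD-YYYY HH:NN:SS" is chosen and the time is kept.
print(extract("01-29-2012 14:30:00", longest_first))
# Shortest first: "MM-DD-YYYY" matches the prefix and the time is lost.
print(extract("01-29-2012 14:30:00", shortest_first))
```

With the shortest format listed first, the date part alone already satisfies the match, so the time of day never gets parsed.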

Source of the date

ImportExtractDateFrom0=

This parameter specifies where the date is taken from:

  • 0 No date is extracted.
  • 1 The current time is extracted.
  • 2 The date that the document was last accessed is extracted.
  • 4 The time that the document was created is extracted.
  • 8 The date that the document was last modified is extracted.
  • 16 The date is extracted from the field named in ImportExtractDateFromFieldN.
  • 32 The date is extracted from the document's content.
  • 64 The date is extracted from the document's file name.
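
For the file-system based values (1, 2, 4 and 8), the following Python sketch shows roughly which timestamp each one corresponds to. The DateSource names and the file_date helper are hypothetical labels of mine; only the numeric values come from the list above, and the connector's real implementation may differ.

```python
import os
from datetime import datetime
from enum import IntEnum


class DateSource(IntEnum):
    """Numeric values from the list above; the names are my own labels."""
    NONE = 0
    CURRENT_TIME = 1
    LAST_ACCESSED = 2
    CREATED = 4
    LAST_MODIFIED = 8
    FROM_FIELD = 16
    FROM_CONTENT = 32
    FROM_FILE_NAME = 64


def file_date(path, source):
    """Return the timestamp a file-system based source would pick, else None."""
    if source == DateSource.CURRENT_TIME:
        return datetime.now()
    st = os.stat(path)
    if source == DateSource.LAST_ACCESSED:
        return datetime.fromtimestamp(st.st_atime)
    if source == DateSource.CREATED:
        # st_ctime is the creation time on Windows, but the time of the
        # last metadata change on most Unix systems.
        return datetime.fromtimestamp(st.st_ctime)
    if source == DateSource.LAST_MODIFIED:
        return datetime.fromtimestamp(st.st_mtime)
    # 0, 16, 32 and 64 do not depend on file-system timestamps.
    return None


# Usage: the last-modified time of this script's working directory.
print(file_date(".", DateSource.LAST_MODIFIED))
```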

As you can see, there are plenty of options. In my final example, I will use 16 (I will get my date from a field).

Source field

ImportExtractDateFromField0=

If you specified 16 in the previous option, give here the name of the field that will be the source of the date for parsing.

Output field

ImportExtractDateToField0=

Here you give the name of the field to which the converted date will be written.

Destination format

ImportExtractDateToFormat0=

Here you can specify the output format.

In my final example I will parse the date from the field tmp_date_of_creation, convert it to the common format YYYY/MM/DD HH:NN:SS, and write it to the field date_of_creation for possible further processing:


ImportExtractDateFormatCSVs2="MM-DD-YYYY HH:NN:SS ZZZ","MM-DD-YYYY HH:NN:SS","MM-DD-YYYY"
ImportExtractDateFrom2=16
ImportExtractDateFromField2=tmp_date_of_creation
ImportExtractDateToField2=date_of_creation
ImportExtractDateToFormat2="YYYY/MM/DD HH:NN:SS"
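
If you want to check what this configuration should produce before running the connector, the same transformation can be approximated in Python. This is only a rough sketch: the strptime patterns are my own translation of the three input formats, normalize is a hypothetical helper, and details such as time zone handling or the default time for date-only input may differ in the connector.

```python
from datetime import datetime

# My own strptime translation of the three input formats, longest first.
INPUT_FORMATS = [
    "%m-%d-%Y %H:%M:%S %Z",
    "%m-%d-%Y %H:%M:%S",
    "%m-%d-%Y",
]
# Equivalent of YYYY/MM/DD HH:NN:SS.
OUTPUT_FORMAT = "%Y/%m/%d %H:%M:%S"


def normalize(document, src="tmp_date_of_creation", dst="date_of_creation"):
    """Copy the date from src to dst in the common format, if it can be parsed."""
    raw = document.get(src)
    if raw is None:
        return document
    for fmt in INPUT_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # try the next, shorter format
        document[dst] = parsed.strftime(OUTPUT_FORMAT)
        break
    return document


doc = normalize({"tmp_date_of_creation": "01-29-2012 14:30:00"})
print(doc["date_of_creation"])  # 2012/01/29 14:30:00
```

In this sketch a value that matches none of the formats leaves the document without a date_of_creation field, which makes bad input easy to spot.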


Enjoy!
