ATTENTION!
The RecogniContact series of products has moved to

http://address-parser.com
Please change your bookmarks.

RecogniContact – International Address Parser


Contact data and address parsing for your applications

RecogniContact is a software component for parsing address data and contact information. Software producers can integrate RecogniContact into their applications.

  • RecogniContact splits up text into fields:
    First name, last name, street address, ZIP/postal code, place name, etc.

  • RecogniContact standardizes field values:
    phone number formats, names of countries, etc.

  • RecogniContact adds implicit information:
    Gender of a person's first name, country of a postal address, etc.

Wherever software users copy or transfer address and contact information manually from various sources, RecogniContact can help reduce the effort to an absolute minimum.

  • When your clients enter contact information or addresses into input masks or form fields

  • When adding contact information to tables or databases

  • When standardizing and normalizing addresses and contact data


Sample applications
Here are some examples of applications in which RecogniContact is currently used:

  • An application that allows copying the address data from the signature at the end of an email to a personal contact database with just a mouse click.

  • A tool that automatically transfers contact data you receive by email to the address field of an ERP solution.

  • A tool that automatically collects sales leads on web sites and transfers them to a CRM database.

  • Form-printing software that allows users to fill in name and address fields of parcel labels and bank transfer forms. Instead of filling in form values field by field, users simply copy the entire address block to the application.

Windows COM object
RecogniContact is available as a Windows COM object. It can be integrated into software projects with minimum effort. Every modern Windows development environment allows integrating COM objects.
See the RecogniContact online help for detailed technical information.

Free 30-day trial license
A free RecogniContact trial version is available on request. Please add a short description of the project into which you plan to integrate RecogniContact.


Features

Countries and Languages

RecogniContact splits up contact information without postal addresses for all countries of the world.

The following preconditions apply:

  • The data is written in Latin letters (as opposed to, for example, Greek or Cyrillic letters)
  • Language-dependent elements are specified in one of the 13 languages currently supported

For the following countries, RecogniContact also recognizes contact information including postal addresses. For these countries, RecogniContact includes a comprehensive database with place names, such that it can identify the country of a postal address even if the country is not explicitly specified as a part of it.

Countries
RecogniContact splits up contact information including postal address data for the following countries:

  • AT - Austria
  • BE - Belgium
  • CH - Switzerland (plus Liechtenstein)
  • DE - Germany
  • DK - Denmark
  • ES - Spain (plus Andorra)
  • FI - Finland
  • FR - France (plus Monaco)
  • GB - United Kingdom
  • IE - Ireland
  • IS - Iceland
  • IT - Italy (plus San Marino and the Vatican)
  • LU - Luxembourg
  • NL - The Netherlands
  • NO - Norway
  • PT - Portugal
  • SE - Sweden
  • US - United States

Languages
RecogniContact recognizes all commonly used strings for structuring contact information (e.g. name:, address:, phone:, email:) in the following languages:

  • Catalan
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Icelandic
  • Italian
  • Norwegian
  • Portuguese
  • Spanish
  • Swedish

Contact information fields

RecogniContact extracts the following fields from text containing contact information:

  • Person-related fields
    • Name prefixes (Mr., Dr., etc.)

    • First name or initial letter
    • Second name or initial letter
    • Last name
    • Suffix - titles such as Ph.D., MBA or name suffixes such as Junior, Jr
    • Position (technical director, marketing manager, etc.)

  • Company/organization-related fields
    • Company/organization name
    • Department

  • Address
    • Street address
    • ZIP/postal code of the street address
    • Post office box address
    • ZIP/postal code of the post office box address
    • Place name
    • Country
    • Region information: state (USA), county (Ireland), province (Italy), canton (Switzerland), Bundesland (Germany), …

  • Telephone numbers
    • Fixed line
    • Mobile phone
    • Fax number

  • Internet
    • Email address
    • Website address

Structuring elements
RecogniContact recognizes structuring elements that are embedded in contact information (Name: Address: Tel: Fax:) and uses them to help interpret the data. RecogniContact understands structuring elements in 13 different languages (see above).
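
The idea of using structuring elements can be sketched in a few lines of Python. This is only an illustration, not RecogniContact's implementation: the label table `LABELS` and the function `split_labeled_lines` are hypothetical, and the table holds just a handful of labels rather than a full set for 13 languages.

```python
import re

# Hypothetical label table (assumption): a few labels in a few languages.
LABELS = {
    "name": "name", "nom": "name", "naam": "name", "namn": "name",
    "address": "address", "adresse": "address", "adres": "address",
    "tel": "phone", "phone": "phone", "telefon": "phone",
    "fax": "fax",
    "email": "email", "e-mail": "email",
}

# A label is a word at the start of a line, followed by a colon and a value.
LABEL_RE = re.compile(r"^\s*([A-Za-z-]+)\s*:\s*(.+)$")


def split_labeled_lines(text):
    """Return (fields, unlabeled): labeled values by field, plus all other lines."""
    fields, unlabeled = {}, []
    for line in text.splitlines():
        match = LABEL_RE.match(line)
        field = LABELS.get(match.group(1).lower()) if match else None
        if field:
            fields[field] = match.group(2).strip()
        elif line.strip():
            unlabeled.append(line.strip())
    return fields, unlabeled
```

Lines without a known label are returned separately, so a later step can still interpret them by other means.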

Country identification
RecogniContact automatically identifies the country that contact information comes from. It uses the following information for this purpose:

  • ZIP/postal code format and place name (the integrated database comprises more than 200,000 place names)
  • Country codes in telephone numbers
  • Country domains in email and web addresses

This information is used to standardize phone numbers to a unified format or to add country information if it is missing in a postal address.
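
As a rough sketch of how two of these hints could be combined, the following Python code looks at phone country codes and country domains and picks the country that most hints agree on. All names and both lookup tables are assumptions made for this example. They contain only a few entries and say nothing about how RecogniContact works internally.

```python
import re
from collections import Counter

# Illustrative subsets only (assumption): a real table would cover every country.
PHONE_COUNTRY_CODES = {"43": "AT", "49": "DE", "41": "CH", "33": "FR", "44": "GB"}
COUNTRY_DOMAINS = {"at": "AT", "de": "DE", "ch": "CH", "fr": "FR", "uk": "GB"}


def country_from_phone(number):
    """Return a country for numbers written as +43... or 0043..., else None."""
    digits = re.sub(r"[^\d+]", "", number)
    if digits.startswith("00"):          # 00 is a common international prefix
        digits = "+" + digits[2:]
    if not digits.startswith("+"):
        return None                      # national format: no country code present
    for length in (3, 2, 1):             # country codes have one to three digits
        country = PHONE_COUNTRY_CODES.get(digits[1:1 + length])
        if country:
            return country
    return None


def country_from_domain(address):
    """Return a country for an email or web address with a country domain."""
    host = address.lower().rsplit("@", 1)[-1]
    host = re.sub(r"^[a-z]+://", "", host).split("/")[0]
    return COUNTRY_DOMAINS.get(host.rsplit(".", 1)[-1])


def guess_country(phones=(), addresses=()):
    """Pick the country suggested by most of the available hints."""
    hints = [country_from_phone(p) for p in phones]
    hints += [country_from_domain(a) for a in addresses]
    votes = Counter(hint for hint in hints if hint)
    return votes.most_common(1)[0][0] if votes else None
```

Generic domains such as .com carry no country hint, so `country_from_domain("www.loquisoft.com")` returns `None` in this sketch.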

Persons' gender
If a contact record contains a person's name, RecogniContact automatically adds the person's gender from the first name.

RecogniContact also takes into account first names that do not allow a conclusion about the person's gender, such as Alex, Cameron, Chris, or Sasha.
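
A table lookup of this kind might look like the following Python sketch. The table `FIRST_NAME_GENDER`, its entries, and the return values are assumptions chosen for illustration. They do not describe the product's actual data or field values.

```python
# Hypothetical mini table (assumption): a real one would hold thousands of names.
FIRST_NAME_GENDER = {
    "anna": "female",
    "peter": "male",
    # Names that do not allow a conclusion are stored explicitly.
    "alex": "ambiguous",
    "cameron": "ambiguous",
    "chris": "ambiguous",
    "sasha": "ambiguous",
}


def gender_from_first_name(first_name):
    """Return 'female', 'male', 'ambiguous', or 'unknown' for unlisted names."""
    return FIRST_NAME_GENDER.get(first_name.strip().lower(), "unknown")
```

Keeping "ambiguous" apart from "unknown" lets an application distinguish a name that is known to be used for any gender from a name that is simply missing from the table.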

Mobile phone numbers
If a phone number starts with the prefix of a mobile phone network, RecogniContact automatically assigns it to the mobile phone number field.
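
The prefix check can be illustrated with a short Python sketch. The function `classify_phone` and the table `MOBILE_PREFIXES` are hypothetical, and the table lists only a sample of Austrian and German mobile prefixes.

```python
import re

# Sample prefixes (assumption), written without the leading trunk zero.
MOBILE_PREFIXES = {
    "43": ("650", "660", "664", "676", "699"),   # Austria
    "49": ("151", "160", "170", "171", "172"),   # Germany
}


def classify_phone(country_code, national_number):
    """Return 'mobile', 'fixed', or 'unknown' if the country is not in the table."""
    prefixes = MOBILE_PREFIXES.get(country_code)
    if prefixes is None:
        return "unknown"
    # Keep digits only and drop the trunk zero before comparing prefixes.
    digits = re.sub(r"\D", "", national_number).lstrip("0")
    return "mobile" if digits.startswith(prefixes) else "fixed"
```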

Format independence
In addition to standardized formats for address and phone numbers, RecogniContact recognizes all other commonly used conventions for each country. In particular, RecogniContact does not require contact information elements to be separated by any specific separators, or separators to be used consistently throughout the text.

This is particularly helpful

  • if contact data comes from sources (email messages, websites) where the items don't have any predefined structure or
  • if addresses are copied from tabular sources like spreadsheets or tables on web sites.
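
To make the problem concrete, here is a small Python sketch that splits text into items even when tabs, semicolons, commas, and line breaks are mixed. It is a simple preprocessing step written for this example, with a hypothetical function name, and it is much cruder than what a full parser has to do.

```python
import re

# Treat tabs, line breaks, semicolons, pipes, "comma + space", and runs of
# two or more spaces as item boundaries (assumption made for this sketch).
SEPARATORS = re.compile(r"[\t\r\n;|]+|,\s+|\s{2,}")


def split_items(text):
    """Split text on mixed separators and drop empty items."""
    return [item.strip() for item in SEPARATORS.split(text) if item.strip()]
```

A slash inside "7a/8" or a single space inside "1090 Vienna" is left alone, so such items stay intact.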

Technical highlights

Resource footprint
RecogniContact was optimized for minimal resource requirements.
Three redistributable files with a total size of less than 3 megabytes are installed on a customer's PC. These files already contain the database with place names and strings that are required to recognize country- and language-dependent contact information.

Performance
The time required to parse a contact information record is in the range of a few milliseconds.

Stand-alone solution - no Internet connection required
RecogniContact is a fully self-contained solution that performs the address parsing solely on the end user's computer. No connection to a server of any kind is required. Sensitive contact data need not be transferred to a web service provider via the Internet.

Integration with minimum effort
As a COM class, RecogniContact can be integrated into your Windows application with minimum effort. In a Visual Basic project, a basic integration of RecogniContact can be achieved as follows:

Dim RC As Object
Dim ParsedContact As Object
Dim TextToParse As String

' Create the parser and initialize it with your license data
Set RC = CreateObject("RecogniContact.Parser")
Call RC.Initialize("<Name>", "<LicenseKey>")

TextToParse = "LoquiSoft, Porzellangasse 7a/8, 1090 Vienna, www.loquisoft.com"

' Parse the text into a contact object
Set ParsedContact = RC.Parse(TextToParse)

' Now use ParsedContact.GetValue(<FieldID>)
' to access the parsed values

For complete documentation and code samples in other programming languages, see the RecogniContact online help.

 

Integrated database

RecogniContact contains a comprehensive database that includes, among other things, the following information:

  • More than 200,000 place names in Europe and the USA. They allow identifying the country for a postal address, even if the country is not specified explicitly.

  • 12,000 first names with gender information.

  • Country codes
    The international country prefixes of all countries in the world, from +1 (USA & Canada) to +998 (Uzbekistan)

  • Multilingual place names
    Vienna, Vienne, Wien, Wenen, …

  • Country specific regional information
    State (USA), province (Italy), canton (Switzerland), Bundesland (Germany), county (Ireland), …

  • Strings in 13 languages for the following elements:

    • Country names:
      Germany, Deutschland, Allemagne, Duitsland

    • Job titles:
      Director, Direktor, Directeur

    • Common street identifiers:
      -street, -straße, rue, -straat

    • Post office box identifiers:
      P.O. Box, Postfach, Boîte postale, Postbus

    • Salutations:
      Mrs., Fr., Mme, Mevr

    • Company types:
      Ltd, GmbH, Sarl, BV

    • Strings used to structure contact information:
      Name: Nom: Naam: Namn: 
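
How such multilingual strings can be put to use is shown in the following Python sketch. It maps the name variants from the examples above to one canonical value. The dictionaries, the function names, the ISO codes, and the choice of "Wien" as the canonical spelling are assumptions for this illustration.

```python
# Variants taken from the examples above; codes and canonical spellings
# are assumptions made for this sketch.
COUNTRY_NAMES = {
    "germany": "DE", "deutschland": "DE", "allemagne": "DE", "duitsland": "DE",
}
PLACE_NAMES = {
    "vienna": ("Wien", "AT"), "vienne": ("Wien", "AT"),
    "wien": ("Wien", "AT"), "wenen": ("Wien", "AT"),
}


def normalize_country(name):
    """Return a country code for a country name in any listed language."""
    return COUNTRY_NAMES.get(name.strip().casefold())


def normalize_place(name):
    """Return (canonical place name, country code) for a listed variant."""
    return PLACE_NAMES.get(name.strip().casefold())
```

A plain dictionary cannot resolve names that are ambiguous between places, so a real lookup would need further context such as the postal code.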

Recognition algorithm

Recognition algorithms that try to reduce the problem of contact information parsing to a few standard recognition patterns quickly reach their limits. They are unable to meet the challenges of robust and reliable contact and address parsing in some very common situations:

  • If an unexpected address supplement is used
  • If unexpected punctuation characters are used in the source text or white spaces are misplaced
  • If items are not separated by newline characters or by consistent separator characters
  • If elements are copied from spreadsheets in which the data is arranged in a tabular format

Input data as diverse and complex as contact and address information cannot be captured in a limited number of recognition patterns or regular expressions. This is particularly true of international address data. In practice, only a small fraction of all textual contact information complies with standards. It is impossible to create a comprehensive list of all address formats and conventions ever used in reality. And even if such a list were available, purely pattern-based recognition would become very inefficient beyond a certain level of complexity.

To meet these challenges, LoquiSoft has created a set of algorithms tailored specifically to the problem of address and contact information parsing.

As far as possible, RecogniContact splits up contact data without relying on standard address formats, specific separator characters, or a consistent structure. With this strategy, RecogniContact achieves unmatched recognition results.


Quality measures

As with all software that processes semantic and language-dependent information (automatic spell-checking is one example), address parsing has a certain residual error rate that cannot be avoided, because some input is unknown or ambiguous.

To ensure the quality of RecogniContact's recognition results, LoquiSoft uses the following methods:

  • Test database
    During every software update, a test database with thousands of manually tagged contact data records is used as a basis to verify and improve RecogniContact's parsing algorithms.

  • Flexible recognition rules
    The set of recognition rules within RecogniContact is highly flexible. New rules, exceptions, and exceptions to exceptions can be added without the risk of making the recognition algorithm slow, inefficient, or overly complex to manage.

  • User feedback
    Whenever users of our product ContactCopy find recognition problems, they can report them to us with just a mouse click. This valuable feedback has helped us eliminate problems and improve our recognition algorithms since ContactCopy's original publication in 2007.
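
A regression check against manually tagged records, as described under "Test database", could be sketched in Python as follows. The function `field_accuracy`, the record layout, and the `parse` callable are assumptions for this example. LoquiSoft's actual test setup is not documented here.

```python
def field_accuracy(parse, tagged_records):
    """Compare parser output with manually tagged fields.

    parse          -- callable that takes text and returns a dict of fields
    tagged_records -- iterable of (text, expected_fields) pairs
    Returns (share of correct fields, list of mismatches).
    """
    correct = total = 0
    mismatches = []
    for text, expected in tagged_records:
        actual = parse(text)
        for field, value in expected.items():
            total += 1
            if actual.get(field) == value:
                correct += 1
            else:
                mismatches.append((text, field, value, actual.get(field)))
    accuracy = correct / total if total else 0.0
    return accuracy, mismatches
```

Running such a check before every release shows whether a new rule has broken records that were parsed correctly before.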

LoquiSoft – specialist for semantic software

Werner Noska, the founder of LoquiSoft, has done scientific work in the field of artificial intelligence and has more than 10 years of experience with semantic software, language data processing, and parsing technologies.

LoquiSoft has created solutions for the following customers: