RecogniContact – International Address Parser
Contact data and address parsing for your applications
RecogniContact is a software component for parsing address data and contact information.
Software producers can integrate RecogniContact into their applications.
- RecogniContact splits up text into fields:
First name, last name, street address, ZIP/postal code, place name, etc.
- RecogniContact standardizes field values:
Phone number formats, names of countries, etc.
- RecogniContact adds implicit information:
Gender of a person's first name, country of a postal address, etc.
Wherever software users copy or transfer address and contact information manually from various sources, RecogniContact can help
reduce the effort to an absolute minimum.
- When your clients enter contact information or addresses into input masks or form fields
- When adding contact information to tables or databases
- When standardizing and normalizing addresses and contact data
Here are some examples of applications in which RecogniContact is currently used:
- An application that copies the address data from the signature at the end of an email to a personal contact database with just a mouse click
- A tool that automatically transfers contact data you receive by email to the address field of an ERP solution
- A tool that automatically collects sales leads on web sites and transfers them to a CRM database
- Form-printing software that allows users to fill in the name and address fields of parcel labels and bank transfer forms. Instead of filling in form values field by field, users simply copy the entire address block to the application.
Windows COM object
RecogniContact is available as a Windows COM object. It can be integrated into software projects with minimum effort.
Every modern Windows development environment allows integrating COM objects.
See the RecogniContact online help for detailed technical information.
Free 30-day trial license
A free RecogniContact trial version is available on request.
Please include in your request a short description of the project into which you plan to integrate RecogniContact.
Countries and Languages
RecogniContact splits up contact information without postal addresses for all countries of the world.
The following preconditions apply:
- The data is noted in Latin letters (as opposed to, for example, Greek or Cyrillic letters)
- Language-dependent elements are specified in one of the 13 languages currently supported
For some countries, RecogniContact also recognizes contact information that includes
postal addresses. For these countries, it includes a comprehensive database
of place names, so it can identify the country of a postal address even if the country is not explicitly
specified as part of it.
The following countries are supported with postal address data:
- AT - Austria
- BE - Belgium
- CH - Switzerland (plus Liechtenstein)
- DE - Germany
- DK - Denmark
- ES - Spain (plus Andorra)
- FI - Finland
- FR - France (plus Monaco)
- GB - United Kingdom
- IE - Ireland
- IS - Iceland
- IT - Italy (plus San Marino and the Vatican)
- LU - Luxembourg
- NL - The Netherlands
- NO - Norway
- PT - Portugal
- SE - Sweden
- US - United States
RecogniContact recognizes all commonly used strings for structuring contact information (e.g. name:, address:, phone:, email:) in each of the 13 supported languages.
Contact information fields
RecogniContact extracts the following fields from text containing contact information:
- Person-related fields
  - Name prefixes (Mr., Dr., etc.)
  - First name or initial letter
  - Second name or initial letter
  - Last name
  - Suffix - titles such as Ph.D., MBA or name suffixes such as Junior, Jr
  - Position (technical director, marketing manager, etc.)
- Company/organization-related fields
  - Company/organization name
- Street address
- ZIP/postal code of the street address
- Post office box address
- ZIP/postal code of the post office box address
- Place name
- Region information: state (USA), county (Ireland), province (Italy), canton (Switzerland), Bundesland (Germany), …
- Telephone numbers
  - Fixed line
  - Mobile phone
  - Fax number
- Email address
- Website address
RecogniContact recognizes structuring elements that are embedded in contact information (Name: Address:
Tel: Fax:) and uses them to help interpret the data.
RecogniContact understands structuring elements in 13 different languages (see above).
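To illustrate how structuring elements can guide interpretation, here is a minimal Python sketch that maps labels in a few languages to field names. The label table and the function name `split_labeled_text` are assumptions made for this example; they do not reflect RecogniContact's internal data or its API.

```python
import re

# Hypothetical label table: a handful of labels in a few languages, lower-cased.
LABELS = {
    "name": "name", "nom": "name", "naam": "name", "namn": "name",
    "address": "address", "adresse": "address", "adres": "address",
    "tel": "phone", "phone": "phone", "telefon": "phone",
    "fax": "fax",
    "email": "email", "e-mail": "email",
}

# A candidate label is a word (letters and hyphens) followed by a colon.
LABEL_RE = re.compile(r"([A-Za-z][A-Za-z-]*)\s*:\s*")


def split_labeled_text(text):
    """Return {field: value} for every known label found in the text."""
    # Keep only candidates that are known labels (this skips e.g. "http:").
    matches = [m for m in LABEL_RE.finditer(text) if m.group(1).lower() in LABELS]
    fields = {}
    for i, m in enumerate(matches):
        # A value runs from the end of its label to the start of the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[LABELS[m.group(1).lower()]] = text[m.end():end].strip(" ,;\n\t")
    return fields


print(split_labeled_text("Naam: Jan de Vries Tel: +31 20 1234567 Fax: +31 20 7654321"))
```

The labels only delimit the values here; interpreting unlabeled text requires the kind of analysis described in the rest of this page.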
RecogniContact automatically identifies the country that contact information comes from.
It uses the following information for this purpose:
- ZIP/postal code format and place name (the integrated database comprises more than 200,000 place names)
- Country codes in telephone numbers
- Country domains in email and web addresses
This information is used to standardize phone numbers to a unified format and to add the country if it is missing from a postal address.
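The idea of combining several country hints can be sketched in Python as follows. This is only an illustration under stated assumptions: the two lookup tables are tiny hypothetical samples, the function names are invented for this example, and the ZIP/place-name evidence mentioned above is left out. It is not RecogniContact's actual algorithm.

```python
from collections import Counter

# Hypothetical sample tables; a real database would cover far more entries.
PHONE_PREFIXES = {"+43": "AT", "+49": "DE", "+41": "CH", "+44": "GB", "+353": "IE"}
DOMAIN_SUFFIXES = {".at": "AT", ".de": "DE", ".ch": "CH", ".uk": "GB", ".ie": "IE"}


def country_from_phone(phone):
    """Return a country code for an internationally formatted number, or None."""
    compact = "".join(ch for ch in phone if ch.isdigit() or ch == "+")
    if compact.startswith("00"):  # "0043..." is another way to write "+43..."
        compact = "+" + compact[2:]
    # Try longer prefixes first so "+353" is checked before any 2-digit code.
    for prefix in sorted(PHONE_PREFIXES, key=len, reverse=True):
        if compact.startswith(prefix):
            return PHONE_PREFIXES[prefix]
    return None


def country_from_domain(address):
    """Return a country code for an email address or URL, or None."""
    host = address.lower().split("@")[-1]       # drop the local part of an email
    host = host.split("://")[-1].split("/")[0]  # drop scheme and path of a URL
    for suffix, country in DOMAIN_SUFFIXES.items():
        if host.endswith(suffix):
            return country
    return None


def infer_country(phones=(), addresses=()):
    """Let every hint vote and return the most frequent country, or None."""
    hints = [country_from_phone(p) for p in phones]
    hints += [country_from_domain(a) for a in addresses]
    votes = Counter(h for h in hints if h)
    return votes.most_common(1)[0][0] if votes else None


print(infer_country(["+43 1 234 56 78"], ["office@example.at"]))
```

A generic domain such as `.com` contributes no vote, so the result then depends on the remaining hints.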
If a contact record contains a person's name, RecogniContact automatically derives the person's gender from the first name.
First names that do not allow a conclusion about the person's gender are taken into account, for example
Alex, Cameron, Chris, Sasha.
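A gender lookup that handles ambiguous first names could look like the following Python sketch. The table contents, the `"?"` marker, and the function name `gender_for` are assumptions made for illustration only.

```python
# Hypothetical mini table; "?" marks names used for more than one gender.
FIRST_NAME_GENDER = {
    "werner": "M", "john": "M",
    "anna": "F", "mary": "F",
    "alex": "?", "cameron": "?", "chris": "?", "sasha": "?",
}


def gender_for(first_name):
    """Return "M", "F", "?" (ambiguous), or None if the name is unknown."""
    return FIRST_NAME_GENDER.get(first_name.strip().lower())


for name in ("Werner", "Anna", "Chris", "Xyz"):
    print(name, gender_for(name))
```

Keeping "ambiguous" distinct from "unknown" lets a caller decide whether to leave the gender field empty or ask the user.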
Mobile phone numbers
If a phone number starts with the prefix of a mobile phone network, RecogniContact automatically assigns it to the mobile phone number field.
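As a rough Python illustration of prefix-based classification: the prefix list below is a small sample of Austrian mobile prefixes chosen for this example, the function name is hypothetical, and the sketch assumes the number is already written in international format.

```python
# Illustrative sample of Austrian mobile prefixes (international format).
MOBILE_PREFIXES = ("+43650", "+43660", "+43664", "+43676", "+43699")


def classify_phone(number):
    """Return "mobile" if the number starts with a known mobile prefix, else "fixed"."""
    compact = "".join(ch for ch in number if ch.isdigit() or ch == "+")
    # str.startswith accepts a tuple and checks every prefix in it.
    return "mobile" if compact.startswith(MOBILE_PREFIXES) else "fixed"


print(classify_phone("+43 664 1234567"))
print(classify_phone("+43 1 5353222"))
```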
In addition to standardized formats for address and phone numbers, RecogniContact recognizes
all other commonly used conventions for each country. In particular, RecogniContact does not
require contact information elements to be separated by any specific separators, or separators
to be used consistently throughout the text.
This is particularly helpful
- if contact data comes from sources (email messages, websites) where the items don't have any predefined structure, or
- if addresses are copied from tabular sources like spreadsheets or tables on web sites.
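To make the point about inconsistent separators concrete, here is a small Python sketch of a pre-processing step that treats newlines, tabs, semicolons, pipes, and commas interchangeably. The separator set and the function name `split_items` are assumptions for this example; such a step only breaks text into candidate items and is not a substitute for the parsing described here.

```python
import re

# Any of newline, tab, semicolon, pipe, or comma, with optional surrounding whitespace.
SEPARATORS = re.compile(r"\s*[\n\t;|,]\s*")


def split_items(text):
    """Split a contact block into items, whatever mix of separators it uses."""
    return [item for item in SEPARATORS.split(text.strip()) if item]


# The same address with three different separators (comma, tab, semicolon).
print(split_items("LoquiSoft, Porzellangasse 7a/8\t1090 Vienna;www.loquisoft.com"))
```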
RecogniContact was optimized for minimal resource requirements.
Only three redistributable files, with a total size of less than 3 megabytes,
are installed on a customer's PC.
These files already contain the database with place names and strings that are required to recognize country- and language-dependent contact information.
The time required to parse a contact information record is in the range of a few milliseconds.
Stand-alone solution - no Internet connection required
RecogniContact is a fully self-contained solution that performs the address parsing
solely on the end user's computer. No connection to a server of any kind is required.
Sensitive contact data need not be transferred to a web service provider via the Internet.
Integration with minimum effort
As a COM class, RecogniContact can be integrated into your Windows application with minimum effort.
In a Visual Basic project, a basic integration of RecogniContact can be achieved as follows:
Dim RC As Object
Dim ParsedContact As Object
Dim TextToParse As String
' Create the parser through its COM ProgID
Set RC = CreateObject("RecogniContact.Parser")
TextToParse = "LoquiSoft, Porzellangasse 7a/8, 1090 Vienna, www.loquisoft.com"
' Parse the text into a contact object
Set ParsedContact = RC.Parse(TextToParse)
' Now use ParsedContact.GetValue(<FieldID>)
' to access the parsed values
For complete documentation and code samples in other programming languages, see the RecogniContact online help.
RecogniContact contains a comprehensive database that includes, among other things, the following information:
- More than 200,000 place names in Europe and the USA. They allow identifying the country of a postal address, even if the country is not specified explicitly.
- 12,000 first names with gender information
- Country codes
The international country prefixes of all countries in the world, from +1 (USA & Canada) to +998 (Uzbekistan)
- Multilingual place names
Vienna, Vienne, Wien, Wenen, …
- Country-specific regional information
State (USA), province (Italy), canton (Switzerland), Bundesland (Germany), county (Ireland), …
- Strings in 13 languages for the following elements:
- Country names:
Germany, Deutschland, Allemagne, Duitsland
- Job titles:
Director, Direktor, Directeur
- Common street identifiers:
-street, -straße, rue, -straat
- Post office box identifiers:
P.O. Box, Postfach, Boîte postale, Postbus
- Name prefixes:
Mrs., Fr., Mme, Mevr
- Company types:
Ltd, GmbH, Sarl, BV
- Strings used to structure contact information:
Name: Nom: Naam: Namn:
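Multilingual tables like these make it possible to map different spellings to one standard value. The following Python sketch shows the principle with a few entries taken from the examples above; the choice of ISO country codes and local place names as the standard form, and the function names, are assumptions made for illustration.

```python
# Hypothetical sample tables keyed by lower-cased name.
COUNTRY_NAMES = {
    "germany": "DE", "deutschland": "DE", "allemagne": "DE", "duitsland": "DE",
}
PLACE_NAMES = {
    "vienna": "Wien", "vienne": "Wien", "wien": "Wien", "wenen": "Wien",
}


def normalize_country(name):
    """Return the two-letter country code for a country name, or None."""
    return COUNTRY_NAMES.get(name.strip().casefold())


def normalize_place(name):
    """Return the standard place name for a known variant, else the input unchanged."""
    return PLACE_NAMES.get(name.strip().casefold(), name)


print(normalize_country("Allemagne"), normalize_place("Wenen"))
```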
Recognition algorithms that try to reduce the problem of contact information parsing
to a few standard recognition patterns quickly reach their limits. They are unable to
meet the challenges of robust and reliable contact and address parsing in some very common situations:
- If an unexpected address supplement is used
- If unexpected punctuation characters are used in the source text or white spaces are misplaced
- If items are not separated by newline characters or by consistent separator characters
- If elements are copied from spreadsheets in which the data is arranged in a tabular format
Input data as diverse and complex as contact and address information cannot be captured in a limited
number of recognition patterns or regular expressions.
This is particularly true with international address data. In practice, only a small fraction of all textual
contact information complies with standards.
It is impossible to create a comprehensive list of all address formats and conventions ever used in reality.
And even if such a list were available, beyond a certain level of complexity purely pattern-based recognition would be very inefficient.
To meet these challenges, LoquiSoft has created a set of algorithms tailored specifically
to the problem of address and contact information parsing.
As far as possible, RecogniContact splits up contact data independently of standard address
formats, specific separator characters, and consistent structure. With this strategy, RecogniContact
achieves unmatched recognition results.
As with all software that processes semantic and language-dependent information
(automatic spell-checking, for example), a certain residual error rate cannot be avoided
in address parsing, due to unknown or ambiguous information.
To ensure the quality of RecogniContact's recognition results, LoquiSoft uses the following methods:
- Test database
During every software update, a test database with thousands of manually tagged contact data records
is used as a basis to verify and improve RecogniContact's parsing algorithms.
- Flexible recognition rules
The set of recognition rules within RecogniContact is highly flexible.
New rules, exceptions and exceptions to exceptions can be added without the risk of making the
recognition algorithm slow, inefficient or overly complex to manage.
- User feedback
Whenever users of our product ContactCopy find recognition problems,
they can report them to us with just a mouse click. This valuable feedback has helped us eliminate problems and
improve our recognition algorithms since ContactCopy's original publication in 2007.
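Verifying a parser against manually tagged records can be sketched in Python as follows. The record format, the `field_accuracy` function, and the deliberately naive `toy_parse` stand-in are all assumptions for this example; LoquiSoft's actual test setup is not described in this document.

```python
def field_accuracy(parse, tagged_records):
    """Return the fraction of tagged fields that `parse` reproduces exactly."""
    total = correct = 0
    for text, expected in tagged_records:
        actual = parse(text)
        for field, value in expected.items():
            total += 1
            correct += actual.get(field) == value
    return correct / total if total else 1.0


def toy_parse(text):
    # Naive stand-in parser: first comma-separated item is the company,
    # last item is the website. It knows nothing about ZIP codes.
    parts = [p.strip() for p in text.split(",")]
    return {"company": parts[0], "website": parts[-1]}


# One manually tagged record: (source text, expected field values).
RECORDS = [
    ("LoquiSoft, Porzellangasse 7a/8, 1090 Vienna, www.loquisoft.com",
     {"company": "LoquiSoft", "website": "www.loquisoft.com", "zip": "1090"}),
]

print(field_accuracy(toy_parse, RECORDS))
```

Running such a check on every update shows whether a rule change improved or degraded recognition across the whole set of tagged records.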
LoquiSoft – specialist for semantic software
Werner Noska, the founder of LoquiSoft, has done scientific work in the field of artificial intelligence and has more than
10 years of experience with semantic software, language data processing, and parsing technologies.
LoquiSoft has created solutions for the following customers: