Data sources

One of the most powerful features of the Mailkit platform is support for external data sources. Data sources can help automate many different activities, from updating the recipient list and retrieving dynamic content or product information to managing mailings.

To start automatically updating your recipient lists and the content of your campaigns, you must first set up your data sources. Mailkit currently supports data sources in XML, RSS, JSON and CSV formats.

Some formats have limited uses: RSS can only be used to retrieve content into a template, and CSV can only be used to update the recipient list. XML data sources are the most versatile and can be used to update the recipient list, pass content to templates, fill the SQL database (product feeds), or manage mailings.

Management of data sources can be found in the main menu Profile –> Data sources, where you will find individual data sources divided into groups Active (actively used), Unused (sources that have not been used for a long time) and All.

Data source for updating the recipient list
How to prepare a data source
Data import from data source
Using XML & RSS data sources in templates
Product data sources
Delivery feed

Data source for updating the recipient list

You can easily create a new recipient list (or update an existing list) using an XML, JSON, or CSV data source. We strongly recommend preferring XML and JSON formats to CSV, as the first two are so-called structured formats and are less prone to errors, while a small change in the CSV file can lead to inconsistent data and data mixing.

Start by clicking the Add data source button and a dialog box will appear. The content of this box will be continuously updated as you select individual settings.

Name – the name of the data source. If the data source is used for a new recipient list, the recipient list will be given the same name. If the data source will be used to retrieve content, then this name will be used to address the source in the templates.
Description – description of the data source.
Source – the URL of the file that serves as the data source. We strongly recommend that you consider the security of the data source: make sure it is not accessible to unauthorized persons, and use HTTPS encryption whenever the source may contain sensitive data.
Type – type of data source. Options are CSV, RSS, XML, SQL and JSON. Based on this option, you will then see additional setting options.
Target (available only for XML data sources) – allows you to select how the data will be used. Options are Template, Mailing list, and Delivery.
Mailing list (available only for CSV, JSON, XML recipient list sources) – allows you to select a list of recipients to be updated, or to create a new mailing list – it will be named the same as the data source. If you select “*Unsubscribe emails”, recipients will be added to the list of unsubscribed recipients and no messages other than transactional messages can be sent to them.

Authorization – check if access to the data source requires authorization with a name and password. Name and password can have a maximum of 64 characters!
Scheduled update – lets you set a schedule for updating the data source. Remember that it is not enough for the system to run the update; the data itself must also be refreshed at the same (or higher) frequency.
Auto update – choose if you want the update to occur before the campaign is sent. Keep in mind, however, that the update will delay the actual distribution of the campaign by the time it takes to process the data source. (NOTE: The "Auto update" function will be turned off after 1/11/2022)
Last update – information about the date of the last update of the data source.

Your data source settings will be saved by clicking the Save button. The data source will then be ready for the next step – in the case of a data source for the list of recipients, it is necessary to set the assignment of individual fields from the data source to the recipient's records.

How to prepare a data source

The data source does not have a fixed structure, but it must follow some basic rules. Recipient list data sources can be in the structured XML and JSON formats, or in unstructured CSV. Data sources must be in UTF-8 encoding and fully valid according to the relevant standard. Pay particular attention to the correct encoding of characters such as &, < and >, diacritics and special characters. The CSV format requires special care, as it is an unstructured format in which columns can easily be added, swapped or shifted, and the system will not be able to recognize this change.
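The validity rules above can be checked before a feed is published. The following is a minimal sketch using only the Python standard library; the function name check_feed and its return strings are illustrative assumptions, not part of Mailkit.

```python
import json
import xml.etree.ElementTree as ET


def check_feed(raw: bytes, fmt: str) -> str:
    """Return "ok" or a short error description for an XML or JSON feed."""
    try:
        # The feed must decode cleanly as UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"not UTF-8: {exc.reason}"
    try:
        if fmt == "xml":
            # Fails on unescaped characters such as a bare "&" or "<".
            ET.fromstring(raw)
        elif fmt == "json":
            json.loads(text)
        else:
            return f"unsupported format: {fmt}"
    except (ET.ParseError, json.JSONDecodeError) as exc:
        return f"invalid {fmt}: {exc}"
    return "ok"
```

A bare "&" inside an element, for example, is reported as invalid XML, while the escaped form "&amp;" passes.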

The data source file must be located on a URL accessible from Mailkit's servers and secured against third-party access because it is sensitive data. Security can be implemented either by restricting access only from Mailkit IP addresses (network 185.136.200.0/22) or by HTTP authentication using a name and password.
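If you restrict access by IP address in your own application code, the check can look roughly like the sketch below. It uses the network quoted above; the function name is_mailkit_address is a hypothetical helper, and how you obtain the remote address depends on your web server or framework.

```python
import ipaddress

# Network quoted in the text above as the source of Mailkit requests.
MAILKIT_NETWORK = ipaddress.ip_network("185.136.200.0/22")


def is_mailkit_address(remote_addr: str) -> bool:
    """Return True when remote_addr lies inside the Mailkit network."""
    try:
        return ipaddress.ip_address(remote_addr) in MAILKIT_NETWORK
    except ValueError:
        # Not a parseable IP address, so refuse it.
        return False
```

A /22 network spans 185.136.200.0 to 185.136.203.255, so requests from any address in that range would be accepted and all others refused.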

Updating recipient lists using data sources is incremental, i.e. new recipients are added and recipients who have changed are updated. Recipients in the data source who are already in the recipient list and have not changed (i.e. the data in the list matches the data in the data source) are skipped during the update. To make automatic updates more efficient, we therefore recommend using incremental data sources, i.e. those that contain only new recipients and recipients whose data has changed.
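One way to produce such an incremental data source is to compare the current export with the previous one and keep only the differences. This is a sketch under the assumption that your system can provide both snapshots as dictionaries keyed by email address; the function name incremental_records is illustrative.

```python
def incremental_records(previous, current):
    """Return only records that are new or changed since the previous export.

    Both arguments map an email address to a dict of recipient fields.
    """
    return [
        {"email": email, **fields}
        for email, fields in current.items()
        # Unknown emails (new) and differing fields (changed) are kept.
        if previous.get(email) != fields
    ]
```

Recipients that are identical in both snapshots are left out, which mirrors the records Mailkit would skip anyway.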

Because each client uses a different information system with different options, the data source system is built as universally as possible and does not prescribe a specific structure of the required data. We therefore leave it up to the clients to name the individual branches of the structure and then pair them with the recipient's fields according to their own needs. However, data sources are subject to certain technical limitations:

Fully valid XML, JSON or CSV
Attributes are not supported in XML format (e.g. first_name="Jane" gender="F" country="USA")
UTF-8 character encoding recommended
Do not use the "," character as a delimiter for multiple values, but use the "|" character
The email address is the unique identifier of a record and the only mandatory field. If there are multiple records with the same email address, they will be overwritten
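The limitations above can be respected when generating the XML programmatically. The following sketch uses Python's standard xml.etree.ElementTree; the function name build_contacts_xml and the element names contacts and contact are taken from the indicative example in this article or chosen for illustration, not prescribed by Mailkit. Keeping the last record for a repeated email address is a choice made in this sketch.

```python
import xml.etree.ElementTree as ET


def build_contacts_xml(records):
    """Serialize recipient dicts to element-only XML (no attributes)."""
    # Keep one record per email address; in this sketch the last one wins.
    unique = {}
    for record in records:
        email = str(record.get("email", "")).strip()
        if email:
            unique[email] = {**record, "email": email}

    root = ET.Element("contacts")
    for record in unique.values():
        contact = ET.SubElement(root, "contact")
        for name, value in record.items():
            if isinstance(value, (list, tuple)):
                # Multiple values are joined with "|" rather than ",".
                value = "|" + "|".join(str(v) for v in value) + "|"
            # Values become child elements, never attributes.
            ET.SubElement(contact, name).text = str(value)
    # ElementTree escapes &, < and > and writes UTF-8 bytes.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```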

To make it easier to prepare your data source, we have prepared indicative examples of data sources and basic data common in the field of e-commerce.

XML

<?xml version="1.0" encoding="utf-8"?>
<contacts>
  <contact>
    <email>email@sample.com</email>
    <client_id>ID</client_id>
    <first_name>John</first_name>
    <last_name>Doe</last_name>
    <gender>m</gender>
    <mobile>+1xxxyyyzzzz</mobile>
    <street>One Mailkit way</street>
    <city>Utopia</city>
    <zip>12345</zip>
    <state>California</state>
    <country>USA</country>
    <birthdate>12/31/2000</birthdate>
    <reg_date>01/31/2018</reg_date>
    <first_sale>02/14/2018</first_sale>
    <last_sale>03/18/2018</last_sale>
    <last_active>06/21/2018</last_active>
    <top_category>|ID|ID|ID|</top_category>
    <top_brands>|ID|ID|ID|</top_brands>
    <top_products>|ID|ID|ID|</top_products>
    <bonus_points>123</bonus_points>
  </contact>
</contacts>

JSON

[
	{
		"email":"email@sample.com",
		"client_id":"ID",
		"first_name":"John",
		"last_name":"Doe",
		"gender":"m",
		"mobile":"+1xxxyyyzzzz",
		"street":"One Mailkit way",
		"city":"Utopia",
		"zip":"12345",
		"state":"California",
		"country":"USA",
		"birthdate":"12/31/2000",
		"reg_date":"01/31/2018",
		"first_sale":"02/04/2018",
		"last_sale":"03/18/2018",
		"last_active":"06/21/2018",
		"top_category":"|ID|ID|ID|",
		"top_brands":"|ID|ID|ID|",
		"top_products":"|ID|ID|ID|",
		"bonus_points":"123"
	}
]

As mentioned above, the only mandatory field is the email address; all other data is optional, but important. The general rule is "the more, the better", though without overdoing it. The data source should therefore carry as much of the available recipient data as can be used for your current and future email campaigns. This example contains the following data and their roles:

email – without an email address there is nowhere to send messages, so this field is required. However, if your system generates a data source containing records for all clients, including those for whom you have no email contact, nothing breaks: Mailkit silently ignores these records, as well as records with addresses in an invalid format.
client_id – internal identifier of the customer, e.g. client number or customer card number
first_name and last_name – the customer's first name and surname, which matter not only for addressing the recipient in the email but also for delivery, because they are used in the recipient's address. A message addressed with a name and surname rather than an email address alone performs better, and some antispam filters treat it as an indication of a relationship with the recipient. If your system cannot generate the first and last name separately, you can use the Fullname field, which is split into the First Name and Last Name fields (the first name must come first).
gender – the gender of the recipient can be used not only for addressing, but also, for example, for segmentation or for simple distribution of dual variants of your campaigns, with one design for women and another for men. The gender value must be m for men and f for women. The value of knowing the recipient's gender is often greatly underestimated, yet it can easily be obtained by automated methods during registration, e.g. using the genderize.io service.
mobile – the recipient's mobile phone number in international format can be used for SMS campaigns, extending your reach to recipients who do not respond to your email campaigns.
street, city, zip, state, country – address data that can be used to segment campaigns as well as to personalize them. For example, emails can offer recipients pickup at the nearest store or post office.
birthdate – date of birth allows you to implement campaigns with birthday bonuses, birthday cards, etc. The date format should match the format you will then use. To calculate an anniversary, it must contain not only the day and month, but also the year.
reg_date and first_sale – the date of registration and the first purchase of your customer in your e-shop will allow you to implement annual campaigns or segment recipients according to these dates. The date format should match the formatting you'll then use in your campaigns.
last_sale and last_active – knowing the date of the last purchase and recent activity is important to your reactivation campaigns, satisfaction campaigns, and other after-sales campaigns. The date format should match the formatting you'll then use in your campaigns.
top_category, top_brands and top_products – the most popular categories, brands and products will help you segment as well as automate the content of your campaigns. Combined with the product data source, they enable fully automated insertion of personalized products that match the recipient's interests into your campaigns.
bonus_points – if you have a bonus program for your clients, it is good to show them their points balance in emails and thus remind them of their benefits and how they can redeem them.
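The field rules above (a mandatory email, gender as m or f) can be applied when exporting records from your own system. The sketch below is one possible approach; the function names to_feed_record and to_feed_json and the spellings in _GENDER_MAP are assumptions about a hypothetical source system. Dropping records without an email is optional, since Mailkit ignores them anyway.

```python
import json

# Hypothetical spellings found in a source system; extend as needed.
_GENDER_MAP = {"m": "m", "male": "m", "f": "f", "female": "f"}


def to_feed_record(customer):
    """Map one customer dict to a feed record, or None when it has no email."""
    email = (customer.get("email") or "").strip()
    if not email:
        return None
    record = {"email": email}
    # Normalize gender to the m / f values described above.
    gender = _GENDER_MAP.get(str(customer.get("gender", "")).strip().lower())
    if gender:
        record["gender"] = gender
    for key, value in customer.items():
        if key in ("email", "gender") or value in (None, ""):
            continue  # already handled, or nothing to export
        record[key] = str(value)
    return record


def to_feed_json(customers):
    """Serialize all customers that have an email as a JSON array."""
    records = [r for r in map(to_feed_record, customers) if r is not None]
    return json.dumps(records, ensure_ascii=False, indent=2)
```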

This is only part of the possible data. Each company has different data and each business is specific, which is why Mailkit works with data sources as universally as possible, and extending a data source will not affect its functionality. On the contrary, if you start with basic data and enrich the data source later, all you have to do is display the structure of the data source again and pair the new branches of the structure.

Data import from data source

Once you have the data source set up, you must assign the individual branches of the source to the contact fields. Click the Display structure button to assign values or view the current assignment. At this point, the data source is analyzed and its structure and available fields are displayed. After assigning all the required fields, click the Save button. After assigning the fields, it is possible to manually start the data import by clicking the Import button. If this source has been set to use a new recipient list, it will be created (with the same name as the data source) and the data from the data source will be imported in the background.

If you have set up scheduled updates for your data source, it will be updated regularly according to this schedule. If you have chosen automatic updating, the update will always take place before the campaign that uses the data source is sent. Keep in mind that this update may delay the sending of your campaign by the time needed to update the data source, which can be several minutes. In general, we recommend that you prefer a scheduled update to an automatic one.

Using XML & RSS data sources in templates

Setting up XML & RSS data sources for use in templates is very similar to setting them up for recipient lists, but without the need to assign meanings to individual fields. The values are addressed by name in the template, so it is easy to set up any XML or RSS feed.

[% FOREACH data.DS_RSS_EXAMPLE -%]
<div>
<a href="[% URL -%]"><img src="[% ENCLOSURE -%]" alt="[% TITLE -%]"></a>
<a href="[% URL -%]">[% TITLE -%]</a>[% DESCRIPTION -%]
</div>
[% END -%]

The above is an example of template code that uses an RSS feed named EXAMPLE. The FOREACH statement creates a loop that goes through all records. Each of the standard RSS tags is resolved by name and embedded in the HTML code, which allows the data to be output in a template. For more information, see Email templates.

Product data sources

Data sources can also be used to transfer your product offer to Mailkit and then use product information in campaigns. This is where the power of data sources and programmable templates shows: they allow you to combine data from multiple sources and completely automate the personalization of content tailored to individual recipients.

For product information, you can use any of the common product feed formats for Heureka, Google Merchant Feed (XML format with RSS 2.0 specification) and other comparators, or generate your own feed with the necessary information. Because product feeds are very extensive and the speed of working with their data is important, these data sources are transferred directly to the SQL database, where you can continue to work with them.

To set up a product data source, select the SQL data source type and continue setting up the data source. Select one of the options for the data range:

Full dataset – when updating the data source, all original data in the SQL database in Mailkit are replaced by currently available data from the data source URL (the original content is deleted and replaced with a new one)
Incremental dataset – on the URL from which the data source is being updated, you can post a feed that contains only new or changed items. During the update, new items are inserted and the changed items are updated, the other items remain unchanged in the SQL database in Mailkit.
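The difference between the two options can be modelled in a few lines. This is only an illustration of the behavior described above, not Mailkit's implementation; the function name apply_update, the key field "id", and the assumption that a changed item replaces the whole stored item are all hypothetical.

```python
def apply_update(stored, feed_items, mode, key="id"):
    """Return the stored items after an update in 'full' or 'incremental' mode.

    stored maps a primary-key value to an item dict; feed_items is a list
    of item dicts read from the data source URL.
    """
    incoming = {item[key]: item for item in feed_items}
    if mode == "full":
        # Original content is discarded and replaced by the feed.
        return incoming
    if mode == "incremental":
        # New items are inserted, changed items replaced, others kept.
        return {**stored, **incoming}
    raise ValueError(f"unknown mode: {mode!r}")
```

With an incremental dataset, an item that is missing from the feed stays in the database; with a full dataset it disappears.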

If you use one of the standard product feed formats, no additional setting options are available.

If you select “Custom feed format” as the source format, you will need to make additional settings. When you click “Display structure”, the system analyzes the data source and determines the data type and the length of the longest record for each field. The determination of data types and lengths (for char and varchar types) is always based on the analysis of the first 100 items in the data source.

Data types:

tinyint – integers (-128 to 127)
smallint – integers (-32768 to 32767)
mediumint – integers (-8388608 to 8388607)
int – integers (-2147483648 to 2147483647)
bigint – integers (-9223372036854775808 to 9223372036854775807)
varchar – text type for data of different lengths (product names, labels, URLs, etc.)
char – text type suitable especially for data with a fixed length or approximately the same length (e.g. postcodes, product codes, etc.)
decimal – decimal numbers
boolean
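When reviewing the detected types, it can help to know which integer type is the smallest one that still fits your values. The sketch below uses the ranges listed above; the function name smallest_integer_type is illustrative and this is not how Mailkit itself selects types.

```python
# (name, minimum, maximum) taken from the list of data types above.
INTEGER_TYPES = [
    ("tinyint", -128, 127),
    ("smallint", -32768, 32767),
    ("mediumint", -8388608, 8388607),
    ("int", -2147483648, 2147483647),
    ("bigint", -9223372036854775808, 9223372036854775807),
]


def smallest_integer_type(values):
    """Return the name of the narrowest listed type that holds all values."""
    lowest, highest = min(values), max(values)
    for name, type_min, type_max in INTEGER_TYPES:
        if type_min <= lowest and highest <= type_max:
            return name
    raise OverflowError("values do not fit any listed integer type")
```

Run this over the whole feed rather than a sample, and leave headroom for values that may appear in future updates.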

Pay special attention to the char and varchar data types, where the length of the longest record is determined based on an analysis of the first 100 items in the data source, as described above. If there is a record in other items whose length is greater than the set value, the record will be truncated during processing!

It is therefore necessary that the set value corresponds at least to the length of the longest record in the entire SQL data source. Also consider future updates of these data sources so that the set length is always sufficient for all records. However, this length should never be unnecessarily high, as this can affect the total number of columns that can exist in the SQL data source.

It is also important that all items contain all fields in the data source (even if a field is empty for that item). If a field were missing from the first 100 items, it would not be detected and processed, and it would therefore not be possible to work with it.
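Both risks described above, truncated values and undetected fields, can be checked on your side before the structure is analyzed. The following is a rough sketch, assuming the feed has already been parsed into a list of dictionaries; the function name preflight and the report keys are illustrative.

```python
SAMPLE_SIZE = 100  # number of leading items analyzed, per the text above


def preflight(items):
    """Report fields that could be truncated or missed after analysis."""
    sample = items[:SAMPLE_SIZE]
    sample_fields = set().union(*(item.keys() for item in sample))
    all_fields = set().union(*(item.keys() for item in items))

    def longest(rows, field):
        # Length of the longest value of this field, as text.
        return max((len(str(row.get(field, ""))) for row in rows), default=0)

    return {
        # Fields absent from the whole sample would not be detected.
        "undetected": sorted(all_fields - sample_fields),
        # Fields whose longest value only appears after the sample.
        "may_truncate": sorted(
            f for f in sample_fields if longest(items, f) > longest(sample, f)
        ),
    }
```

For every field reported under may_truncate, the length set for char or varchar columns should be raised to at least the longest value in the full feed.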

For further work with the data source, it is also necessary to set the primary key (unique record identifier), and optionally define up to 7 indexed fields.

Once you have completed the necessary settings for the structure, all you have to do is save them, optionally set up authorization and/or a scheduled update, and finally import the data.

If necessary, do not hesitate to contact our customer support, who will help you with setting up the product feed and its subsequent use.

Delivery feed

Delivery feeds are special data sources used to pass structured information that controls the distribution of a campaign. Usually a campaign uses a set list of recipients and is distributed to it according to the set rules; with a delivery feed, the campaign is sent only to the emails specified in the data source. This is an alternative to the API call mailkit.sendmail_mass, i.e. a way to bring highly structured data into Mailkit, e.g. from personalization systems or a CRM, to be processed when the campaign is sent. These feeds must have a strictly defined structure in XML format, which is very similar to the structure of the mailkit.sendmail_mass API call.

<?xml version="1.0"?>
<deliveryFeed>
  <feedItem>
    <recipient>
      <email>recipient email 1 (mandatory)</email>
      <first_name>First name (optional)</first_name>
      <last_name>Last name (optional)</last_name>
      <gender>M (optional)</gender>
      ... other standard recipient fields
    </recipient>
    <subject>subject (optional)</subject>
    <message_data>static message content (optional)</message_data>
    <attachment>
      <file_url>url (optional – only for transactional messages)</file_url>
      <file_url>url (optional – only for transactional messages)</file_url>
      <file_url>url (optional – only for transactional messages)</file_url>
    </attachment>
    <content>

    </content>
  </feedItem>
  <feedItem>
    <recipient>
      <email>recipient email 2 (mandatory)</email>
      <first_name>First name (optional)</first_name>
      <last_name>Last name (optional)</last_name>
      <gender>M (optional)</gender>
      ... other standard recipient fields
    </recipient>
    <subject>subject (optional)</subject>
    <message_data>static message content (optional)</message_data>
    <attachment>
      <file_url>url (optional – only for transactional messages)</file_url>
      <file_url>url (optional – only for transactional messages)</file_url>
      <file_url>url (optional – only for transactional messages)</file_url>
    </attachment>
    <content>

    </content>
  </feedItem>
</deliveryFeed>

The recipient data in the delivery feed takes precedence over the recipient data stored in the recipient list and replaces these values during delivery. The delivery feed can not only control the distribution, but also serve as a way to update the list of recipients.
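A delivery feed with the structure shown above can be generated from your CRM or personalization system. The following is a minimal sketch using Python's standard library; the function name build_delivery_feed and the shape of the input dictionaries (including the "attachments" key) are assumptions made for illustration. The content element is left out because its inner structure is not specified in this article.

```python
import xml.etree.ElementTree as ET


def build_delivery_feed(items):
    """Build a deliveryFeed document from a list of item dicts.

    Each item needs a "recipient" dict with at least "email"; "subject",
    "message_data" and "attachments" (a list of URLs) are optional.
    """
    root = ET.Element("deliveryFeed")
    for item in items:
        recipient = item["recipient"]
        if not recipient.get("email"):
            raise ValueError("every feedItem needs a recipient email")
        feed_item = ET.SubElement(root, "feedItem")
        recipient_el = ET.SubElement(feed_item, "recipient")
        for name, value in recipient.items():
            ET.SubElement(recipient_el, name).text = str(value)
        for name in ("subject", "message_data"):
            if item.get(name):
                ET.SubElement(feed_item, name).text = str(item[name])
        urls = item.get("attachments") or []
        if urls:
            # All file_url elements share one attachment element.
            attachment = ET.SubElement(feed_item, "attachment")
            for url in urls:
                ET.SubElement(attachment, "file_url").text = url
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Because the recipient fields in the feed replace the stored values during delivery, only include fields whose values you actually want to use for that send.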