
Set up a Content Collector

Updated this week

To create personalized content recommendations in BlueConic dialogues, you first build a content store: a pool of content that can be recommended. The BlueConic Content Collector connection collects data about your content and stores it in that content store, which feeds personalization in BlueConic.

Making smarter, more personalized content recommendations

You can use the content items collected through the connection in BlueConic dialogues to make smarter personalized content recommendations. BlueConic recommendations are powered by algorithms and filters you set in the content editor toolbar.


Configuring the Content Collector to collect items for recommendations

To create a Content Collector connection, select Connections from the main BlueConic menu and click the Add connection button. Search for 'content collector' and create the connection.

To set up the connection, select "Collect data from your channels" in the left-hand panel.


1. Select the channel(s) to collect content data from.
Optionally, define URL rules to specify which areas of the sites to collect content data from.
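Conceptually, URL rules act as an include/exclude filter on page addresses. The Python sketch below illustrates that idea under stated assumptions: the rule lists, the glob-style pattern syntax, and the `in_scope` helper are all hypothetical, and BlueConic's own URL rule syntax may differ.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

# Hypothetical rules: collect article paths, skip tag and author index pages.
INCLUDE = ["/news/*", "/blog/*"]
EXCLUDE = ["/news/tag/*", "/blog/author/*"]


def in_scope(url: str) -> bool:
    """True if the URL path matches an include rule and no exclude rule."""
    path = urlsplit(url).path
    if not any(fnmatch(path, pattern) for pattern in INCLUDE):
        return False
    return not any(fnmatch(path, pattern) for pattern in EXCLUDE)


print(in_scope("https://www.example.com/news/2024/story-123"))  # True
print(in_scope("https://www.example.com/news/tag/politics"))    # False
print(in_scope("https://www.example.com/about"))                # False
```

In this sketch `*` matches across `/`, so `/news/*` also covers nested paths; that is a property of Python's `fnmatch`, not necessarily of BlueConic's rules.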

2. Manage which data is collected.
Paste an article URL into the Test URL field and click 'Test' to review the metadata that would be collected for your content.


Note: BlueConic supports multiple date formats. Because some browsers interpret date formats differently, using the ISO 8601 date format is recommended. See this documentation for details.
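A short Python illustration of why ISO 8601 is the safer choice: an ISO 8601 timestamp parses to exactly one date, while a slash-separated date can be read two ways. This uses only Python's standard library and says nothing about how BlueConic parses dates internally.

```python
from datetime import datetime

# ISO 8601 fixes the field order (year-month-day, then time and UTC offset).
iso_value = "2024-03-04T09:30:00+00:00"
parsed = datetime.fromisoformat(iso_value)
print(parsed.year, parsed.month, parsed.day)  # 2024 3 4

# A slash-separated date has no such guarantee: "03/04/2024" is
# March 4 under a US reading and April 3 under a European reading.
ambiguous = "03/04/2024"
us_reading = datetime.strptime(ambiguous, "%m/%d/%Y")
eu_reading = datetime.strptime(ambiguous, "%d/%m/%Y")
print(us_reading.date(), eu_reading.date())  # 2024-03-04 2024-04-03
```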

BlueConic requires a number of metadata fields out of the box. You can select the checkbox next to other metadata fields to mark them as also required.

Default required fields:

  • ID, Name, URL, Publication Date, and Type are all required by default.

  • If any of these are not scraped via the configured selectors, the item will not be added to the content store.

  • For the Content Collector, the page type must equal "article".
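The default rules above can be summarized as a single check. This Python sketch mirrors them under assumptions: the lowercase field keys and the `should_collect` helper are hypothetical stand-ins, not BlueConic's internal names, and `extra_required` stands in for fields you mark as required with the checkbox.

```python
from typing import Tuple

# Hypothetical keys for the five default required fields.
REQUIRED_FIELDS = ("id", "name", "url", "publication_date", "type")


def should_collect(item: dict, extra_required: tuple = ()) -> Tuple[bool, str]:
    """Return (collect?, reason) following the default required-field rules."""
    missing = [f for f in REQUIRED_FIELDS + tuple(extra_required) if not item.get(f)]
    if missing:
        return False, "missing required fields: " + ", ".join(missing)
    # Content Collector rule: the page type must equal "article".
    if item["type"] != "article":
        return False, f"incorrect type: {item['type']!r} != 'article'"
    return True, "ok"


page = {
    "id": "story-123",
    "name": "How we built it",
    "url": "https://www.example.com/story-123",
    "publication_date": "2024-03-04T09:30:00Z",
    "type": "website",
}
print(should_collect(page))  # (False, "incorrect type: 'website' != 'article'")
print(should_collect({**page, "type": "article"}))  # (True, 'ok')
print(should_collect({**page, "type": "article"}, extra_required=("author",)))
# (False, 'missing required fields: author')
```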

If required metadata fields are not populated, the webpage will not be scraped by the content collector. Make sure that the information on the page is in a supported format. The default BlueConic selector will automatically detect:

Providing metadata in one of these standard ways helps more than the content collector: it also helps Google, Facebook, Twitter, and many other platforms better understand your content.
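As an illustration of metadata that lives in the page source, the sketch below reads Open Graph `<meta>` tags with Python's built-in `html.parser`. Open Graph is one widely used standard; whether the default selector reads exactly these properties, and which one maps to each required field, is an assumption here.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect <meta property|name=... content=...> pairs from raw page source."""

    def __init__(self) -> None:
        super().__init__()
        self.meta = {}  # property or name -> content (first occurrence wins)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if key and content is not None:
            self.meta.setdefault(key, content)


page_source = """
<html><head>
  <meta property="og:title" content="How we built it" />
  <meta property="og:url" content="https://www.example.com/story-123" />
  <meta property="og:type" content="article" />
  <meta property="article:published_time" content="2024-03-04T09:30:00Z" />
</head><body>...</body></html>
"""

reader = MetaTagReader()
reader.feed(page_source)
print(reader.meta["og:type"])  # article
```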

Click Default to select an alternative method of retrieving metadata. You can retrieve it from metadata in the HTML, from HTML content on the page, or with a JavaScript selector.


Click Add data field to add custom metadata fields. For example, if your content has a sponsorship association, or is tagged based on overarching stories or topics, influencers, or sources, you can use this data in recommendation placement filtering.

3. Set the algorithm time frame.
Some recommendation algorithms are based on a look-back time frame (for example, Viral articles or recent high CTR). You can configure that time frame here, in hours or days.


For details on how each recommendation algorithm operates, see BlueConic recommendation algorithms.
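The idea of a look-back time frame can be sketched as a time-window filter over engagement events. This is a hypothetical illustration only: the click data and the `counts_in_window` helper are invented for the example, and BlueConic's algorithms are not claimed to work this way internally.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical click events: (article ID, time of click).
now = datetime(2024, 3, 10, 12, 0, tzinfo=timezone.utc)
clicks = [
    ("story-1", now - timedelta(hours=2)),
    ("story-1", now - timedelta(hours=5)),
    ("story-2", now - timedelta(days=3)),
    ("story-3", now - timedelta(hours=20)),
]


def counts_in_window(events, window: timedelta, as_of: datetime) -> dict:
    """Count events per article that fall inside the look-back window."""
    cutoff = as_of - window
    counts: dict = {}
    for article_id, clicked_at in events:
        if clicked_at >= cutoff:
            counts[article_id] = counts.get(article_id, 0) + 1
    return counts


print(counts_in_window(clicks, timedelta(hours=24), now))  # {'story-1': 2, 'story-3': 1}
print(counts_in_window(clicks, timedelta(days=7), now))    # {'story-1': 2, 'story-2': 1, 'story-3': 1}
```

A shorter window favors what is popular right now; a longer one smooths out short spikes.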

4. Set request headers (optional).
To collect content from webpages that are under development or behind a login or paywall, you can add HTTP request headers here. Click "Add request header," choose a channel, and add the custom header name and value used to access the content.
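At the HTTP level, a request header is just a name/value pair sent with the page request. Before entering one in the connection, it can help to confirm with your web team that a request carrying it gets past the login or paywall. This Python sketch builds such a request; the header name `X-Preview-Token` and its value are hypothetical placeholders.

```python
import urllib.request

# Hypothetical header name and value; use whatever your staging or
# paywall setup actually expects.
HEADER_NAME = "X-Preview-Token"
HEADER_VALUE = "replace-with-shared-secret"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries the custom access header."""
    return urllib.request.Request(url, headers={HEADER_NAME: HEADER_VALUE})


request = build_request("https://staging.example.com/story-123")
# urllib normalizes stored header names with str.capitalize().
print(request.get_header("X-preview-token"))  # replace-with-shared-secret

# To fetch the raw page source with the header attached:
# with urllib.request.urlopen(request, timeout=10) as response:
#     html = response.read().decode("utf-8", errors="replace")
```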


Content personalization tips

  1. Make sure images are being scraped properly.

  2. Review article names for appended suffixes. For example, " | Site.com" may be appended to every title, depending on where the title is scraped from.

  3. Collectors can only retrieve data that is available via the page source. You can review the page source by right-clicking on the web page and selecting 'View Page Source.'
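Tip 2 comes down to checking whether titles end with a site-name suffix. The hypothetical `clean_title` helper below, written in Python, shows the check and how the suffix would be removed, for example when reviewing titles returned by the Test URL step; it is not a BlueConic feature.

```python
def clean_title(raw_title: str, site_suffixes=(" | Site.com",)) -> str:
    """Remove a known site-name suffix from a scraped article title."""
    title = raw_title.strip()
    for suffix in site_suffixes:
        if title.endswith(suffix):
            # Slice off the suffix, then trim any whitespace left behind.
            title = title[: -len(suffix)].rstrip()
    return title


print(clean_title("How we built it | Site.com"))  # How we built it
print(clean_title("No suffix here"))              # No suffix here
```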

Learn more about BlueConic recommendation algorithms for personalization

For details on the recommendation algorithms that power personalized content recommendations in BlueConic dialogues, see BlueConic recommendation algorithms.


FAQs

What happens if I change my content collector settings?

  • Changing or updating the ID selector field for your content collector can result in unwanted data duplication. When collecting articles, the content collector checks the content store for an item with a matching ID. If it finds one, it updates that existing item with the new metadata; if it does not, it adds a new item. After an ID selector change, articles that were already collected can produce a different ID, find no match, and be added a second time. For this reason, we do not recommend changing the ID selector on your content collector.
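The match-by-ID behavior described above is an upsert, and a small Python model shows why changing the ID selector creates duplicates. The `upsert` function and the in-memory `store` dict are simplified, hypothetical stand-ins for the content store.

```python
def upsert(store: dict, item: dict) -> None:
    """Update the item with a matching ID, or add it as a new item."""
    store[item["id"]] = {**store.get(item["id"], {}), **item}


store: dict = {}
url = "https://www.example.com/story-123"

# Original ID selector yields a numeric article ID.
upsert(store, {"id": "123", "name": "How we built it", "url": url})
# Re-collecting with the same selector updates the existing item in place.
upsert(store, {"id": "123", "name": "How we built it (updated)", "url": url})
print(len(store))  # 1

# After the ID selector is changed (here, to the URL), the same article
# yields a different ID, finds no match, and is stored a second time.
upsert(store, {"id": url, "name": "How we built it (updated)", "url": url})
print(len(store))  # 2
```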

How can I test which content items are in the collector?

  • In Step 2 of the Collector, you can test to see which values are being returned. Paste a content item's URL into the Test URL field, click 'Test' to review the metadata that would be collected for its content, and then confirm whether all 'Required' metadata fields are returning values. If values are not being returned for required fields, then items are not being collected successfully.

Why is my Content Collector not scraping my data successfully?

Content Collectors can only retrieve data that is present in the original page source (the raw HTML returned by the server). If data is visible when using Inspect (i.e., it appears after JavaScript modifies the DOM) but is not included in the page source, it cannot be collected by the Content Collector.

To check what’s actually available to the collector, right-click the page and choose View Page Source. Only content you see there is eligible for collection.

Important limitations

  • Anything inside <style> or <script> tags is stripped by the collector.

  • We cannot scrape metadata from inline JavaScript objects (e.g., variables embedded in <script>), unless the data is provided in JSON-LD format.

  • Data that renders client-side (after the initial HTML is loaded) is not available to the collector unless it is also present in the original page source.
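The limitations above can be illustrated with Python's built-in `html.parser`: the sketch below reads raw page source without running any JavaScript, discards `<script>` and `<style>` bodies, and parses only scripts marked `type="application/ld+json"`. The `SourceReader` class is a hypothetical model of the documented behavior, not the collector's implementation.

```python
import json
from html.parser import HTMLParser


class SourceReader(HTMLParser):
    """Keep visible text, drop <script>/<style> bodies, but parse JSON-LD scripts."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts = []
        self.json_ld = []
        self._skip = False        # inside a <script> or <style> we ignore
        self._in_json_ld = False  # inside <script type="application/ld+json">
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            if tag == "script" and dict(attrs).get("type") == "application/ld+json":
                self._in_json_ld = True
                self._buffer = []
            else:
                self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._in_json_ld:
                self.json_ld.append(json.loads("".join(self._buffer)))
            self._skip = False
            self._in_json_ld = False

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)
        elif not self._skip and data.strip():
            self.text_parts.append(data.strip())


page_source = """
<html><head>
  <script>var article = {"headline": "Not readable"};</script>
  <script type="application/ld+json">
    {"@type": "NewsArticle", "headline": "How we built it"}
  </script>
  <style>h1 { color: red; }</style>
</head><body><h1>How we built it</h1></body></html>
"""

reader = SourceReader()
reader.feed(page_source)
print(reader.json_ld[0]["headline"])  # How we built it
print(reader.text_parts)              # ['How we built it']
```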

Why is a specific page not getting collected?

If a specific page isn’t being collected by the Content Collector, the cause is usually a missing or incorrect required field (for example, the page “type” is missing or does not match the collector’s expected value).

How to diagnose

  1. Install and enable the BlueConic Chrome extension.

  2. Open the page you expect the collector to scrape.

  3. Open the browser console.

  4. Look for BlueConic messages. If the collector is skipping the page because of a required field, you’ll see a message similar to:

    [BC] Item will not be scraped due to incorrect type:: !== article

    This indicates the page’s type does not match the collector’s requirement (in this example, the collector expects article).

How can I collect content from a page that requires authentication using the Content Collector?

  • If the page you want to collect from is behind an authentication process (such as a login or redirect flow), BlueConic Support recommends working with your web team to identify what headers are being set or sent during authentication. Once identified, you can add these headers in Step 4 of the Content Collector settings under "Set request headers". This allows the Content Collector to access and scrape content from staging sites, gated pages, or pages requiring authentication.

Why aren’t my images rendering in recommendations?

BlueConic currently supports PNG and JPG/JPEG images for the image attribute.
If an image is served in another format (for example, WebP, AVIF, SVG, or GIF), content recommendations cannot render it.
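Because a URL ending in `.jpg` can still return another format, checking the first bytes of the downloaded file is a more reliable test. PNG files begin with the 8-byte signature `89 50 4E 47 0D 0A 1A 0A` and JPEG files with `FF D8 FF`. This Python sketch (the helper names are hypothetical) classifies leading bytes accordingly.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def image_format(first_bytes: bytes) -> str:
    """Identify PNG or JPEG from a file's leading bytes; anything else is 'other'."""
    if first_bytes.startswith(PNG_SIGNATURE):
        return "png"
    if first_bytes.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return "other"


def is_supported(first_bytes: bytes) -> bool:
    """True only for the PNG and JPG/JPEG formats listed above."""
    return image_format(first_bytes) in ("png", "jpeg")


# To get real leading bytes:
# with urllib.request.urlopen(image_url) as r: first_bytes = r.read(16)
webp_header = b"RIFF\x00\x00\x00\x00WEBPVP8 "
print(image_format(PNG_SIGNATURE + b"..."))  # png
print(is_supported(webp_header))             # False
```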

Avoid dynamic “auto-format” transformations

Many CDNs and image services change the file type on the fly based on the user agent or query parameters. Examples include parameters like:

  • auto=format, f=auto, fm=auto, format=auto, tr=f-auto

These can cause the image URL to return WebP/AVIF to BlueConic, resulting in file-type mismatches and missing images.
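If you control the image URLs, removing the auto-format parameters listed above is one way to keep the format stable. This Python sketch, with a hypothetical `strip_auto_format` helper, drops exactly those parameter/value pairs; whether the CDN then serves PNG or JPEG depends on its configuration, so confirm the result with a request.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The parameter/value pairs listed above.
AUTO_FORMAT_PARAMS = {
    ("auto", "format"), ("f", "auto"), ("fm", "auto"),
    ("format", "auto"), ("tr", "f-auto"),
}


def strip_auto_format(image_url: str) -> str:
    """Return the URL without query parameters that ask the CDN to pick the format."""
    parts = urlsplit(image_url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if (key, value) not in AUTO_FORMAT_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_auto_format("https://cdn.example.com/hero.jpg?w=800&auto=format"))
# https://cdn.example.com/hero.jpg?w=800
```

Combined values such as `auto=format,compress` are not matched by this exact-pair check and would need extra handling.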
