When developing notebooks in AI Workbench, you may run into issues like long runtimes or memory limitations. This article outlines strategies to improve processing speed and manage memory effectively, so that your notebook scales smoothly.
Before you begin
Make sure you have Notebook editor permissions.
Set up at least one AI Workbench notebook.
Reference the BlueConic Python API documentation.
Avoid retrieving profile properties you don't need
Use the `properties` parameter of the `get_profiles` method to retrieve only the profile properties you are using in your model.
```python
segment_id = bc.get_blueconic_parameter_value("Segment", "segment")
profile_property_id = bc.get_blueconic_parameter_value("Profile property", "profile_property")

for profile in bc.get_profiles(segment_id=segment_id,
                               properties=[profile_property_id],
                               progress_bar=False):
    # do something with the profile property values
    value = profile.get_value(profile_property_id)
```
Remove profiles that are no longer relevant
Apply filters on top of your existing segment configuration to exclude profiles that no longer need to be processed.
Example: If an RFM notebook only considers orders from the past year and a profile’s RFM scores are all 1 with no recent orders, the scores remain unchanged, eliminating the need for an update.
```python
from datetime import datetime, timedelta
from dateutil import relativedelta

# store datetime.now() in a global variable
# so that the value is the same across the execution
NOW = datetime.now()

segment_id = bc.get_blueconic_parameter_value("Segment", "segment")
last_order_date_property = bc.get_blueconic_parameter_value("Last order date property", "profile_property")
rfm_recency_property = bc.get_blueconic_parameter_value("RFM Recency property", "profile_property")

# the last order date has to be in the last year
ONE_YEAR_AGO = NOW - timedelta(days=365)
last_order_date_filter = blueconic.get_filter(last_order_date_property).in_range(min=ONE_YEAR_AGO)

# ... or the RFM recency has to be higher than 1
rfm_recency_filter = blueconic.get_filter(rfm_recency_property).in_range(min=2)

# retrieve all profiles that are part of the configured segment
# and match at least one of the filters
for profile in bc.get_profiles(segment_id=segment_id,
                               properties=[last_order_date_property],
                               required_properties=[last_order_date_property],
                               filters=[last_order_date_filter, rfm_recency_filter],
                               progress_bar=False):
    last_order_date = profile.get_value(last_order_date_property)
    # note the argument order: relativedelta(later, earlier) yields a positive delta
    time_since_last_order = relativedelta.relativedelta(NOW, last_order_date)
    # recency is the max of "10 - the number of months since the customer last purchased" and 1
    recency = max(10 - time_since_last_order.months, 1)
```
Remove profiles that have not changed since the last successful execution
In a lead scoring model based on actions (e.g., subscribing, requesting a demo), scores only change if related profile properties update. The notebook can skip unchanged profiles since the last run.
Use the `get_executions` method to retrieve the last few executions of the current notebook.
Retrieve all profiles that have changed since the last successful execution by using the `start_date` of the last successful execution as a filter on the `lastmodifieddate` profile property. If the profile properties you are using in your model are all filled by web behavior, you can use the `lastvisitdate` profile property instead of the `lastmodifieddate` profile property.
```python
# Returns the start date of the last successful execution of this notebook
def get_last_successful_execution_start_date():
    for execution in bc.get_executions(count=10):
        if execution.state == "FINISHED":
            return execution.start_date
    return None

segment_id = bc.get_blueconic_parameter_value("Segment", "segment")
profile_property_id = bc.get_blueconic_parameter_value("Profile property", "profile_property")

# use the last successful execution of this notebook
# to add a filter based on the "lastmodifieddate" profile property
filters = []
last_successful_execution_start_date = get_last_successful_execution_start_date()
if last_successful_execution_start_date is not None:
    lastmodifieddate_filter = blueconic.get_filter("lastmodifieddate").in_range(
        min=last_successful_execution_start_date
    )
    filters = [lastmodifieddate_filter]

# retrieve all profiles that are part of the configured segment
# and match the filters
for profile in bc.get_profiles(segment_id=segment_id,
                               properties=[profile_property_id],
                               filters=filters,
                               progress_bar=False):
    # do something with the profile property values
    value = profile.get_value(profile_property_id)
```
Avoid unnecessary profile update calls
If your notebook updates a profile score (e.g., engagement or propensity), compare it to the existing score to decide if an update is needed.
```python
segment_id = bc.get_blueconic_parameter_value("Segment", "segment")
engagement_score_property = bc.get_blueconic_parameter_value("Engagement score property", "profile_property")

with bc.get_profile_bulkhandler() as bulk_handler:
    for profile in bc.get_profiles(segment_id=segment_id,
                                   properties=["visits", "clickcount", engagement_score_property],
                                   progress_bar=False):
        # calculate a custom engagement score
        visits = profile.get_value("visits")
        pageviews = profile.get_value("clickcount")
        previous_engagement_score = profile.get_value(engagement_score_property)
        new_engagement_score = pageviews / visits

        # check if the new engagement score is different from the previous
        # engagement score, and if so, update the profile
        if new_engagement_score != previous_engagement_score:
            profile.set_value(engagement_score_property, new_engagement_score)
            bulk_handler.write(profile)
```
Avoid retrieving the same profile twice
For scenarios requiring aggregate calculations across segments or profiles, an initial approach might involve multiple `get_profiles` calls (e.g., one per segment, or one for model training and one for applying the model). A more efficient method is to retrieve all profiles once and store them in memory (e.g., a Pandas DataFrame) or on disk (e.g., CSV or SQLite) for faster processing.
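As a minimal sketch of this pattern using the standard-library `sqlite3` module: the `fetch_profiles_once` generator below is a hypothetical stand-in for a single `bc.get_profiles` call, and the segment and property names are invented for illustration.

```python
import sqlite3

# hypothetical stand-in for one bc.get_profiles(...) call; in a real
# notebook each record would be built from a BlueConic profile instead
def fetch_profiles_once():
    yield {"id": "p1", "segment": "a", "order_value": 120.0}
    yield {"id": "p2", "segment": "a", "order_value": 80.0}
    yield {"id": "p3", "segment": "b", "order_value": 50.0}

# one pass over the profiles: store everything locally
# (an in-memory SQLite database here; a file on disk works the same way)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE profiles (id TEXT, segment TEXT, order_value REAL)")
con.executemany(
    "INSERT INTO profiles VALUES (:id, :segment, :order_value)",
    fetch_profiles_once(),
)

# any number of aggregates can now be computed without re-retrieving profiles
per_segment = dict(con.execute(
    "SELECT segment, AVG(order_value) FROM profiles GROUP BY segment"
))
overall_total = con.execute("SELECT SUM(order_value) FROM profiles").fetchone()[0]
```

Both aggregates here reuse the same locally stored rows, so the profiles are fetched from BlueConic only once.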
Use online algorithms
Processing large profile datasets in Python can lead to memory issues. Online or out-of-core algorithms process data in small batches to prevent this.
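As a minimal, BlueConic-independent illustration of the idea, Welford's online algorithm keeps a running mean and variance in constant memory, no matter how many values stream through; one value per profile could be fed in as profiles are retrieved in batches:

```python
class OnlineStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared distances from the current mean

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # population variance of the values seen so far
        return self.m2 / self.n if self.n else 0.0

stats = OnlineStats()
for value in [4, 8, 6, 2]:  # imagine one value per profile, streamed in
    stats.update(value)
print(stats.mean, stats.variance)  # 5.0 5.0
```

The same shape, an `update` per observation plus a small fixed-size state, is what libraries like `tdigest` (used below) provide for percentile estimation.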
Example: Out-of-core percentile estimation for an RFM calculation
Instead of using months since the last order for "RFM recency," a more advanced approach uses percentiles to balance bucket sizes. However, calculating exact percentiles over all profiles is memory-intensive, so we use the tdigest library for estimation, which requires two passes over the data:
1. Retrieve all profiles, update the T-Digest data structure, and store the profile data in a CSV file.
2. Use the T-Digest data structure to update the "RFM recency" profile property values.
```python
# install the tdigest library
!pip install --quiet tdigest

import csv
import math
from datetime import datetime, timedelta
from tdigest import TDigest

SECONDS_IN_DAY = 24 * 60 * 60

# store datetime.now() in a global variable
# so that the value is the same across the execution
NOW = datetime.now()

segment_id = bc.get_blueconic_parameter_value("Segment", "segment")
last_order_date_property = bc.get_blueconic_parameter_value("Last order date property", "profile_property")

csv_filename = bc.get_cwd() + "profiles.csv"
columns = ["profile_id", "number_of_days_since_last_order"]

# percentile estimation
number_of_days_since_last_order_digest = TDigest()

with open(csv_filename, "w") as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(columns)
    for profile in bc.get_profiles(segment_id=segment_id,
                                   properties=[last_order_date_property],
                                   required_properties=[last_order_date_property],
                                   progress_bar=False):
        last_order_date = profile.get_value(last_order_date_property)
        number_of_days_since_last_order = round((NOW - last_order_date).total_seconds() / SECONDS_IN_DAY)

        # write the profile ID and number of days since the last order to a file
        # for later processing
        csvwriter.writerow([profile.id, number_of_days_since_last_order])

        # update the T-Digest data structure to estimate the percentiles
        number_of_days_since_last_order_digest.update(number_of_days_since_last_order)

number_of_days_since_last_order_digest.compress()

rfm_recency_property = bc.get_blueconic_parameter_value("RFM Recency property", "profile_property")

# read the CSV file and use the T-Digest data structure to update the RFM recency
with bc.get_profile_bulkhandler() as bulk_handler:
    with open(csv_filename) as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            profile = blueconic.Profile(row["profile_id"])
            number_of_days_since_last_order = int(row["number_of_days_since_last_order"])

            # the RFM recency is based on the cumulative distribution of the recency values
            recency = math.ceil(number_of_days_since_last_order_digest.cdf(number_of_days_since_last_order) * 10)

            # update the profile
            profile.set_value(rfm_recency_property, recency)
            bulk_handler.write(profile)
```
Update notebook code to retrieve large Timeline events
To further improve the performance of notebooks that retrieve a high volume of large Timeline events, make a small change to your notebook code: add the `event_properties` parameter to your `TimelineEventsFilter` so that only the event properties you need are returned:
```python
bc.get_profiles(
    segment_id=SEGMENT_ID,
    # only retrieve the profile properties you are interested in
    properties=["email_open_time"],
    timeline_events_filter=blueconic.TimelineEventsFilter(
        # filter on the specific timeline event types you are interested in
        event_type_ids=["email_opened"],
        # only retrieve the timeline event properties you are interested in
        event_properties=["subject"]
    ),
    count=1
)