When developing notebooks in AI Workbench, you may run into issues like long runtimes or memory limitations. This article outlines strategies to improve processing speed and manage memory effectively, so that your notebook scales smoothly.
Before you begin
Make sure you have Notebook editor permissions.
Set up at least one AI Workbench notebook.
Reference the BlueConic Python API documentation.
Avoid retrieving profile properties you don't need
Use the `properties` parameter of the `get_profiles` method to retrieve only the profile properties you are using in your model.
```python
segment_id = bc.get_blueconic_parameter_value("Segment", "segment")
profile_property_id = bc.get_blueconic_parameter_value("Profile property", "profile_property")

for profile in bc.get_profiles(segment_id=segment_id,
                               properties=[profile_property_id],
                               progress_bar=False):
    # do something with the profile property values
    value = profile.get_value(profile_property_id)
```
Remove profiles that are no longer relevant
Apply filters on top of your existing segment configuration to exclude profiles that no longer need to be processed.
Example: If an RFM notebook only considers orders from the past year and a profile’s RFM scores are all 1 with no recent orders, the scores remain unchanged, eliminating the need for an update.
```python
from datetime import datetime, timedelta
from dateutil import relativedelta

# store datetime.now() in a global variable
# so that the value is the same across the execution
NOW = datetime.now()

segment_id = bc.get_blueconic_parameter_value("Segment", "segment")
last_order_date_property = bc.get_blueconic_parameter_value("Last order date property", "profile_property")
rfm_recency_property = bc.get_blueconic_parameter_value("RFM Recency property", "profile_property")

# the last order date has to be in the last year
ONE_YEAR_AGO = NOW - timedelta(days=365)
last_order_date_filter = blueconic.get_filter(last_order_date_property).in_range(min=ONE_YEAR_AGO)

# ... or the RFM recency has to be higher than 1
rfm_recency_filter = blueconic.get_filter(rfm_recency_property).in_range(min=2)

# retrieve all profiles that are part of the configured segment
# and match at least one of the filters
for profile in bc.get_profiles(segment_id=segment_id,
                               properties=[last_order_date_property],
                               required_properties=[last_order_date_property],
                               filters=[last_order_date_filter, rfm_recency_filter],
                               progress_bar=False):
    last_order_date = profile.get_value(last_order_date_property)
    # note the argument order: relativedelta(later, earlier) yields a positive delta
    time_since_last_order = relativedelta.relativedelta(NOW, last_order_date)
    # recency is the max of "10 - the number of months since the customer last purchased" and 1
    recency = max(10 - time_since_last_order.months, 1)
```
Remove profiles that have not changed since the last successful execution
In a lead scoring model based on actions (e.g., subscribing, requesting a demo), scores only change if related profile properties update. The notebook can skip unchanged profiles since the last run.
Use the `get_executions` method to retrieve the last few executions of the current notebook.
Retrieve all profiles that have changed since the last successful execution by using the `start_date` of the last successful execution as a filter on the `lastmodifieddate` profile property. If the profile properties you are using in your model are all filled by web behavior, you can use the `lastvisitdate` profile property instead of the `lastmodifieddate` profile property.
```python
# Returns the start date of the last successful execution of this notebook
def get_last_successful_execution_start_date():
    for execution in bc.get_executions(count=10):
        if execution.state == "FINISHED":
            return execution.start_date
    return None

segment_id = bc.get_blueconic_parameter_value("Segment", "segment")
profile_property_id = bc.get_blueconic_parameter_value("Profile property", "profile_property")

# use the last successful execution of this notebook
# to add a filter based on the "lastmodifieddate" profile property
filters = []
last_successful_execution_start_date = get_last_successful_execution_start_date()
if last_successful_execution_start_date is not None:
    lastmodifieddate_filter = blueconic.get_filter("lastmodifieddate").in_range(
        min=last_successful_execution_start_date
    )
    filters = [lastmodifieddate_filter]

# retrieve all profiles that are part of the configured segment
# and match the filters
for profile in bc.get_profiles(segment_id=segment_id,
                               properties=[profile_property_id],
                               filters=filters,
                               progress_bar=False):
    # do something with the profile property values
    value = profile.get_value(profile_property_id)
```
Avoid unnecessary profile update calls
If your notebook updates a profile score (e.g., engagement or propensity), compare it to the existing score to decide if an update is needed.
```python
segment_id = bc.get_blueconic_parameter_value("Segment", "segment")
engagement_score_property = bc.get_blueconic_parameter_value("Engagement score property", "profile_property")

with bc.get_profile_bulkhandler() as bulk_handler:
    for profile in bc.get_profiles(segment_id=segment_id,
                                   properties=["visits", "clickcount", engagement_score_property],
                                   progress_bar=False):
        # calculate a custom engagement score
        visits = profile.get_value("visits")
        pageviews = profile.get_value("clickcount")
        previous_engagement_score = profile.get_value(engagement_score_property)
        new_engagement_score = pageviews / visits

        # check if the new engagement score is different from the previous
        # engagement score, and if so, update the profile
        if new_engagement_score != previous_engagement_score:
            profile.set_value(engagement_score_property, new_engagement_score)
            bulk_handler.write(profile)
```
Avoid retrieving the same profile twice
For scenarios requiring aggregate calculations across segments or profiles, an initial approach might involve multiple `get_profiles` calls (e.g., one per segment, or one for model training and one for applying the model). A more efficient method is to retrieve all profiles once and store them in memory (e.g., a Pandas DataFrame) or on disk (e.g., CSV or SQLite) for faster processing.
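As a minimal sketch of this pattern using the standard-library `sqlite3` module: the `fetch_profiles_once` generator below is a hypothetical stand-in for a single `bc.get_profiles` call, and the segment and property names are invented for illustration.

```python
import sqlite3

# hypothetical stand-in for one bc.get_profiles(...) call; in a real
# notebook each record would be built from a BlueConic profile instead
def fetch_profiles_once():
    yield {"id": "p1", "segment": "a", "order_value": 120.0}
    yield {"id": "p2", "segment": "a", "order_value": 80.0}
    yield {"id": "p3", "segment": "b", "order_value": 50.0}

# one pass over the profiles: store everything locally
# (an in-memory SQLite database here; a file on disk works the same way)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE profiles (id TEXT, segment TEXT, order_value REAL)")
con.executemany(
    "INSERT INTO profiles VALUES (:id, :segment, :order_value)",
    fetch_profiles_once(),
)

# any number of aggregates can now be computed without re-retrieving profiles
per_segment = dict(con.execute(
    "SELECT segment, AVG(order_value) FROM profiles GROUP BY segment"
))
overall_total = con.execute("SELECT SUM(order_value) FROM profiles").fetchone()[0]
```

Both aggregates here reuse the same locally stored rows, so the profiles are fetched from BlueConic only once.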
Use online algorithms
Processing large profile datasets in Python can lead to memory issues. Online or out-of-core algorithms process data in small batches to prevent this.
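As a minimal, BlueConic-independent illustration of the idea, Welford's online algorithm keeps a running mean and variance in constant memory, no matter how many values stream through; one value per profile could be fed in as profiles are retrieved in batches:

```python
class OnlineStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared distances from the current mean

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # population variance of the values seen so far
        return self.m2 / self.n if self.n else 0.0

stats = OnlineStats()
for value in [4, 8, 6, 2]:  # imagine one value per profile, streamed in
    stats.update(value)
print(stats.mean, stats.variance)  # 5.0 5.0
```

The same shape, an `update` per observation plus a small fixed-size state, is what libraries like `tdigest` (used below) provide for percentile estimation.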
Example: Out-of-core percentile estimation for an RFM calculation
Instead of using months since the last order for "RFM recency," a more advanced approach uses percentiles to balance bucket sizes. However, calculating exact percentiles over all profiles is memory-intensive, so we use the tdigest library for estimation, which requires two passes over the data:
1. Retrieve all profiles, update the T-Digest data structure, and store the profile data in a CSV file.
2. Use the T-Digest data structure to update the "RFM recency" profile property values.
```python
# install the tdigest library
!pip install --quiet tdigest

import csv
import math
from datetime import datetime, timedelta
from tdigest import TDigest

SECONDS_IN_DAY = 24 * 60 * 60

# store datetime.now() in a global variable
# so that the value is the same across the execution
NOW = datetime.now()

segment_id = bc.get_blueconic_parameter_value("Segment", "segment")
last_order_date_property = bc.get_blueconic_parameter_value("Last order date property", "profile_property")

csv_filename = bc.get_cwd() + "profiles.csv"
columns = ["profile_id", "number_of_days_since_last_order"]

# percentile estimation
number_of_days_since_last_order_digest = TDigest()

with open(csv_filename, "w") as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(columns)
    for profile in bc.get_profiles(segment_id=segment_id,
                                   properties=[last_order_date_property],
                                   required_properties=[last_order_date_property],
                                   progress_bar=False):
        last_order_date = profile.get_value(last_order_date_property)
        number_of_days_since_last_order = round((NOW - last_order_date).total_seconds() / SECONDS_IN_DAY)

        # write the profile ID and number of days since the last order to a file
        # for later processing
        csvwriter.writerow([profile.id, number_of_days_since_last_order])

        # update the T-Digest data structure to estimate the percentiles
        number_of_days_since_last_order_digest.update(number_of_days_since_last_order)

number_of_days_since_last_order_digest.compress()

rfm_recency_property = bc.get_blueconic_parameter_value("RFM Recency property", "profile_property")

# read the CSV file and use the T-Digest data structure to update the RFM recency
with bc.get_profile_bulkhandler() as bulk_handler:
    with open(csv_filename) as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            profile = blueconic.Profile(row["profile_id"])
            number_of_days_since_last_order = int(row["number_of_days_since_last_order"])

            # the RFM recency is based on the cumulative distribution of the recency values
            recency = math.ceil(number_of_days_since_last_order_digest.cdf(number_of_days_since_last_order) * 10)

            # update the profile
            profile.set_value(rfm_recency_property, recency)
            bulk_handler.write(profile)
```
Update notebook code to retrieve large Timeline events
To further improve the performance of notebooks that retrieve a high volume of large Timeline events, make a small change to your notebook code: add the `event_properties` parameter to your `TimelineEventsFilter` so that only the event properties you need are returned:
```python
bc.get_profiles(
    segment_id=SEGMENT_ID,
    # only retrieve the profile properties you are interested in
    properties=["email_open_time"],
    timeline_events_filter=blueconic.TimelineEventsFilter(
        # filter on the specific timeline event types you are interested in
        event_type_ids=["email_opened"],
        # only retrieve the timeline event properties you are interested in
        event_properties=["subject"]
    ),
    count=1
)