
Anatomy of a Banner System

This talk deals with a banner platform and its components and is based on our experience of developing at least three high-performance banner systems. The major topics include the principles of efficient banner display, banner view limits, and the architecture and limitations of a banner platform. Much attention is paid to synchronizing data between the various system components. We also consider some issues related to the statistical component of a banner platform: the statistics that a banner system can collect; dealing with large amounts of data, especially on unique clicks; and methods for analyzing the efficiency of advertising campaigns. We also touch upon false clicks and policing tools, system monitoring, and the support and maintenance of a typical banner system.
Who: Artem Volftrub
Where: HighLoad++ 2011
When: October 3-4, 2011

HL_01


Artem Volftrub: Hello!

My name is Artem Volftrub, and I am a project manager at Gramant. Today’s presentation will be devoted to banner systems. Lately, our clients have been regularly coming to our company with questions regarding the creation of banner systems. We’ve already done several large systems, and in other cases, have consulted for our clients. As a result, we’ve gained some experience that we would like to share with you.


Here are some systems which we’ve worked on lately:


value_commerce

• On-line advertisement system with sales tracking
• 700,000,000 (seven hundred million) impressions a day
• Sales oriented models
• Developed completely from scratch


HL_03

• Banner system for IMHO VI
• 50,000,000 (fifty million) impressions a day
• Solution built on DoubleClick DART Enterprise
• Advanced statistics system


HH

• Banner-ad management system for the HeadHunter company group
• 85,000,000 (eighty-five million) impressions a day
• Complete development, from formation of requirements to implementation
• Media planning tools
• Flexible targeting systems
• Limits


For starters, let’s look at how a banner system is built on the most basic level. For convenience, we can divide it into three components: the banner server, the advertising campaign management system (UI), and the statistics module.

HL_system

The components shown on the slide can be broken down further, because in reality they have a much more complicated structure. For example:

HL_04

Nevertheless, the basic composition of the modules doesn’t stray from this.

Let’s move away from technical specifications for now and talk about the business models of banner systems. There are three.

The first business model is the selling of traffic. You have a resource or several resources which generate a lot of traffic. Media ads are placed on these resources. In this case, a highly effective ad (a sale) isn’t required; what matters is that there is a steady information stream. That model works for various kinds of promos, brand support, etc. In essence, the key feature of these systems is the large number of impressions.

The next widespread kind of business model is the selling of visitor interest. Here, what matters is not the impression itself but the viewer who sees the ad. The effectiveness of these ad campaigns is ultimately judged by monitoring the dynamics of sales, which is why choosing an audience must be approached very carefully. Very often in this case, what is sold is clicks rather than impressions.

The third kind, which as far as I know isn’t used in Russia, is when “sales” are sold. In this case, the banner system should be able to trace a purchase and connect it to an impression and a click. Here, impressions and clicks may not be worth anything.

The business model that the banner system works on influences its architecture. In the first case, if we are traffic oriented, then we need to think about capacity. In this case, the impression logic should be extremely simple, because the ad itself is what’s important, not the particular visitor who saw it.

The second type of banner system is of course the most prevalent, including in Russia. Here, it’s important for us to choose the visitor who the ad is going to be shown to. This is why it’s necessary that these systems have targeting, a limited number of impressions per visitor, a mechanism for combating false clicks, and, as a rule, a record of unique visitors, which significantly complicates the processing of statistics.

The most interesting component in the third type of banner system is the sales tracking module. Its main difficulty is the need for it to integrate with the advertiser’s site. This is not always possible, and you end up having to look for a different way of getting information on completed transactions. Additionally, billing is fairly difficult because: firstly, you need to calculate the commission for completed transactions; and secondly, advertisers, as a rule, use a fairly complicated model for calculating commission, which takes into account a product’s category, the size of the sale, the visitor’s region, time of purchase and other factors.

Now, let’s take a more detailed look at the technical features of each of these banner system components.

We’ll start with the main component—the banner impression module (the banner server).

Understandably, in any system this is the most heavily loaded component and at the same time a mission-critical one: the module’s ability to work is critical for the entire business.


What are the requirements for banner servers?

1. Simplicity. The simpler its design, the better, because this reduces the probability of failure and increases speed, which is critical for us.

2. Loose coupling with other system components. Ideally, it shouldn’t depend on other components and certainly shouldn’t break down if external systems fail.

3. Independence from other banner servers. This is very important. All of your banner servers need to be completely independent from one another, because this directly affects the system’s scalability.

4. Scalability. This is a basic requirement for banner servers, because if everything’s going well with your business, then capacity will grow fairly quickly. In that situation, you need to be able to quickly scale the system without losing service quality.

5. The “don’t be quiet” principle. In any situation, the banner server should return a response on request. This is necessary so that users or systems expecting a response from your server understand what is going on, even if there is an error and a banner cannot be displayed.
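
As a rough illustration of the “don’t be quiet” principle, a request handler can be wrapped so that every failure still produces a well-formed answer. This is only a sketch: `choose_banner`, the `slot` parameter and the response fields are hypothetical names, not part of any system described here.

```python
import json

def choose_banner(params):
    """Hypothetical selection logic; raises when nothing can be shown."""
    if params.get("slot") != "top":
        raise LookupError("no banner for slot")
    return {"banner_id": 42}

def handle_request(params):
    """Always return a (status, body) pair, even when selection fails."""
    try:
        return 200, json.dumps(choose_banner(params))
    except Exception as exc:
        # Never let an error turn into silence: the caller gets an
        # explicit "no banner" answer and the reason for it.
        return 200, json.dumps({"banner_id": None, "error": str(exc)})
```

A page that embeds the banner can then hide the ad slot when `banner_id` is empty instead of waiting for a timeout.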


Information about advertising campaigns gets onto banner servers through the process we call synchronization. Advertisers create campaigns in the UI management system. After this, the data is synchronized with the banner servers because, as we remember, the banner servers are in and of themselves independent and shouldn’t look for information from other system components.

There are two strategies for updating banner servers. One is the “pull strategy”, in which the server itself fetches information on active advertisements either from the database or from some other intermediary component, API, etc.

HL_10


From our point of view, this isn’t a sound strategy, but it’s nevertheless used in several situations. The main advantage of this strategy is that adding another banner server to the system is very simple. The only thing required is configuring the server and installing the system code on it. When it starts up, it will receive all of the necessary information on the advertising campaigns and will immediately be ready to serve visitor requests. Additionally, this method allows you to use different banner servers for various tasks. For example, one for displaying pictures, another for displaying flash animations, a third for responding to click requests, etc. Understandably, this complicates their setup, but this can be justified in a specific situation.

Nevertheless, the pull strategy strongly complicates a banner server’s code, because in this case the synchronization mechanism has to be part of the banner server. This kind of approach does not allow information to be updated on all the banner servers at the same time from one point. More importantly, it gives you no way to tell whether a server is alive and whether it is receiving information about changes to the advertising campaigns.

The second strategy is the “Push strategy”, which is when we update the whole pool of the banner servers at once through a special synchronization module.


HL_Push


From our point of view, this is the best approach because we immediately get a single monitoring point. If a banner server doesn’t respond during an update, we will immediately know about it. Additionally, we can update all of the servers at the same time and configure updates by schedule and by event. This is very important because the synchronization module, unlike the banner servers, can keep track of changes made to the advertising campaigns.
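
A minimal sketch of the push idea, assuming the synchronization module holds a list of registered servers and some transport function `send` (both hypothetical here): every server gets the same payload, and the ones that fail to respond are reported back, which is what gives us the single monitoring point.

```python
def push_update(servers, payload, send):
    """Push payload to every registered server; return the ones that failed."""
    failed = []
    for server in servers:
        try:
            send(server, payload)
        except OSError:
            # A server that does not answer is noticed right away.
            failed.append(server)
    return failed

def fake_send(server, payload):
    """Stand-in transport for the example: server "bs2" is down."""
    if server == "bs2":
        raise ConnectionError("no response")

failed = push_update(["bs1", "bs2", "bs3"], {"campaigns": []}, fake_send)
```

In a real system `send` would be an HTTP or RPC call, and the list of failed servers would be handed to monitoring.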

It’s necessary to mention one more thing that has to do with the constraints which advertisers place on impressions and clicks for advertising campaigns: limits. Actual limit values are calculated based on statistical data. As this information arrives, the data on the banner servers has to be updated promptly in order to avoid over-rotation. With the push strategy, this can be done much faster.

In our opinion, a slight disadvantage of the push strategy is the need to register every new banner server in the synchronization module. Moreover, the synchronization module represents a single point of failure. However, it is fairly easy to set up monitoring for the synchronization module and to have it restart automatically after errors.

When talking about synchronization, it’s necessary to say that there are two forms of synchronization: complete and incremental.

Complete synchronization, as its name implies, updates all of a banner server’s content, whereas incremental synchronization only updates data that has changed since the last update. Incremental synchronization saves traffic and takes less time, but is more complicated in practice. We advise starting with complete synchronization, which is required in any case, and then adding incremental synchronization to the system if the update speed or the amount of information being transferred becomes a bottleneck.
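
To make the difference concrete, here is a sketch of how an incremental update could be computed, assuming campaigns are kept as a dictionary keyed by a campaign id (the data layout is an assumption made for the example). A complete synchronization would simply send `current` as a whole.

```python
def diff_campaigns(previous, current):
    """Return (upserts, deletions) needed to turn `previous` into `current`."""
    # New or changed campaigns have to be sent in full.
    upserts = {cid: data for cid, data in current.items()
               if previous.get(cid) != data}
    # Campaigns that disappeared only need their id.
    deletions = [cid for cid in previous if cid not in current]
    return upserts, deletions

previous = {1: {"weight": 10}, 2: {"weight": 5}, 3: {"weight": 1}}
current = {1: {"weight": 10}, 2: {"weight": 7}, 4: {"weight": 2}}
upserts, deletions = diff_campaigns(previous, current)
```

The extra complexity in practice comes from keeping `previous` in step with what each banner server actually holds, which is why complete synchronization is still needed as a fallback.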

I’ll say a few words about storing media-data. There are two types of storage.

The first is when all of the information is saved to the banner servers’ own file system. This is a fairly simple option which also keeps the banner server independent. During synchronization, we load banner files onto the server and store them a certain way. In this case, you can use standard UNIX systems.

The second option is separate storage. In this case, we save all of the content onto a separate machine, communicating with it via a standard protocol (ftp, webdav, etc.). The only additional work required is supporting the necessary protocol in the synchronization module, which is quite simple. Additional advantages of this approach are the reduction in the amount of data on the banner servers and the ability to cluster storage or use a CDN, which provides fast content delivery to the user; this is especially vital in the case of video ads.

Let’s now move on to the topic of limits. This topic always pops up when we talk about a system which “sells audience interest”. The advertiser is interested in their ad being seen by as many potential clients as possible, which is why the advertiser wants to limit the number of ad impressions for each visitor, as well as set a daily impressions limit in order to ensure the specified duration of an advertising campaign. The banner system in this case should very carefully track the dynamics of impressions and disable an ad campaign that has exhausted its limit, because every extra impression will not be paid for by the customer. Naturally, managers don’t want there to be any unnecessary impressions at all. Developers understand that this will strongly complicate the system’s architecture and begin to bargain with them, finding out what delay is acceptable and whether or not it’s possible to come to some kind of compromise.


It’s important to understand that there are two types of limits: limits, which are implemented on the ad campaign level (the number of impressions or clicks in a day and general number of impressions or clicks in the ad campaign) and limits which are implemented on the visitor level (displaying a specific banner no more than N times to every visitor). Different kinds of limits demand different approaches to implementation.

The first approach to implementing a limit is updating through statistics. In this case, we “rotate” an ad, gather statistics, see how many impressions we’ve rotated, and periodically transfer this information to the banner server. If the banner server sees that the limit has been exhausted, it disables that banner. This approach simplifies the banner server’s work and decreases request processing time, because the server doesn’t need to fetch the actual limit values while handling a request; those are transferred during the synchronization process.

Unfortunately, this approach cannot guarantee the absence of “over-rotations”, because the statistics are not updated in real time. Nevertheless, we recommend this approach because, in our opinion, it provides a reasonable compromise between the difficulty of implementation and the errors (over-rotations) that come up.
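
A sketch of the statistics-based approach on the banner server side, with hypothetical class and method names: the server only compares the totals it received during the last synchronization with the limits, so a request never has to go anywhere else.

```python
class CampaignLimits:
    """Campaign-level limits checked against totals from the last sync."""

    def __init__(self, limits):
        self.limits = dict(limits)                 # campaign id -> impression limit
        self.served = {cid: 0 for cid in limits}   # totals known at the last sync

    def apply_sync(self, totals):
        """Called during synchronization with aggregated statistics."""
        self.served.update(totals)

    def is_active(self, campaign_id):
        # No external call: only data delivered by synchronization is used.
        return self.served[campaign_id] < self.limits[campaign_id]

limits = CampaignLimits({"spring_sale": 1000})
limits.apply_sync({"spring_sale": 990})
still_active = limits.is_active("spring_sale")
limits.apply_sync({"spring_sale": 1015})
active_after_limit = limits.is_active("spring_sale")
```

The second synchronization in the example arrives when 1015 impressions have already been served against a limit of 1000: this is exactly the over-rotation the approach accepts as a trade-off.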

The second approach to implementing limits is using a shared cache (key-value storage) for storing information on actual limit values. This approach allows us to have up-to-date data, meaning we read the current limit value and increase the impressions counter the moment a request is processed. All of the banner servers work with one storage system (distributed storage is possible). This approach saves us from over-rotations, but adds an additional point of failure: data is kept in memory and will be lost if there is any failure. An additional problem is the need to synchronize data in the cache with statistics; this data may diverge at any point for a variety of reasons.

We recommend using a hybrid approach: use statistics and transfer this information to the banner server to calculate advertising campaign limits; use the cache for the unique visitors limit. With this approach, it is absolutely necessary to anticipate a situation with a cache failure; the banner server should not crash in this event.
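
Here is a sketch of the visitor-level half of the hybrid approach. `FlakyCache` is a stand-in for a shared key-value store with a hypothetical `incr` method, and the decision to show the banner when the cache is down is an assumption made for the example; the point is only that a cache failure must not crash the banner server.

```python
class FlakyCache:
    """Stand-in for a shared key-value store (hypothetical interface)."""

    def __init__(self):
        self.data = {}
        self.down = False

    def incr(self, key):
        if self.down:
            raise ConnectionError("cache unavailable")
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

def may_show(cache, visitor_id, banner_id, cap):
    """Count the impression and check the per-visitor cap."""
    try:
        return cache.incr(f"{visitor_id}:{banner_id}") <= cap
    except ConnectionError:
        # Assumed policy: keep serving without the cap rather than fail.
        return True

cache = FlakyCache()
shown = [may_show(cache, "v1", 7, cap=2) for _ in range(3)]
cache.down = True
shown_when_down = may_show(cache, "v1", 7, cap=2)
```

Whether to show or to skip the banner while the cache is unavailable is a business decision; either way, the request gets an answer.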

Let’s talk a little about targeting. With this, everything’s fairly simple, but there are two things which we need to consider. The first is that, according to our estimates, targeting takes up a large part of the request processing time on the banner server side. This is why you need something smarter than an exhaustive search; for example, we use bitmasks for quickly locating a banner that meets specific conditions.
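
One possible way to use bitmasks for this, shown as a sketch and not as the exact scheme from our systems: for every value of every targeting parameter we precompute a mask of the banners that accept it, and a request is matched by AND-ing one mask per parameter. The parameter names and data layout are assumptions.

```python
def build_index(banners):
    """banners: list of {param: set of accepted values}; a missing param means "any".

    Returns ({param: {value: mask}}, {param: mask of banners without that param}).
    """
    index, wildcard = {}, {}
    for i, targeting in enumerate(banners):
        for param, values in targeting.items():
            for value in values:
                by_value = index.setdefault(param, {})
                by_value[value] = by_value.get(value, 0) | (1 << i)
    for param in index:
        wildcard[param] = 0
        for i, targeting in enumerate(banners):
            if param not in targeting:
                wildcard[param] |= 1 << i
    return index, wildcard

def match(index, wildcard, n_banners, request):
    """Return indexes of banners whose targeting accepts the request."""
    mask = (1 << n_banners) - 1
    for param, by_value in index.items():
        # One dictionary lookup and one AND per parameter.
        mask &= by_value.get(request.get(param), 0) | wildcard[param]
    return [i for i in range(n_banners) if (mask >> i) & 1]

banners = [
    {"region": {"msk", "spb"}},
    {"region": {"msk"}, "section": {"jobs"}},
    {},
]
index, wildcard = build_index(banners)
```

The cost of a lookup grows with the number of parameters, not with the number of banners, which is where the time saving comes from.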

The second thing is that you need to try and limit the number of targeting parameters, because the more parameters you have, the more expensive the selection becomes.

You certainly need to agree on which side determines the values of targeting parameters for specific requests. Under no circumstance should you allow a situation where they tell you, “a request from a visitor came in, there’s an IP address. Determine the region from the IP address and target the region.” This is a really bad situation. A banner server should receive already prepared targeting parameter values at the time of the banner impression request. Otherwise, valuable time will be spent computing them.

Let’s talk about the events that the banner server registers. There are two widespread mistakes which we regularly run into. The first is that information about events piles up in the banner server’s memory (because that is fast), and the second is that information about impressions and clicks is sent to the database immediately, at the moment of a request.

The first situation is terrible in that if the banner server crashes, all of the collected information will be lost. The second situation is even worse, because not only the banner server can crash, but the database or the channel between them can, too. In this case, it’s entirely unclear how to register events. Additionally, accessing the database is fairly costly in terms of time and resources, even if it happens asynchronously and doesn’t influence the request processing time.

The most appropriate approach is the following sequence: memory, disk file, rotation. First, we accumulate events in memory; once a specific number of them has been gathered, we dump them onto the disk in one go, which reduces the number of disk operations; when a specific number of events has accumulated in the file, we rotate it; afterwards, the registered events become available for processing by the statistics module. And, of course, there’s no need to perform any preliminary statistics processing on the banner server. This should be done by a separate system module which aggregates the statistics from all of the banner servers.
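
The sequence above can be sketched roughly as follows. The class name, file names and thresholds are hypothetical; a real banner server would also flush on a timer and on shutdown.

```python
import os
import tempfile

class EventLog:
    """Buffer events in memory, append them to a file in batches, rotate the file."""

    def __init__(self, directory, flush_every=100, rotate_every=1000):
        self.directory = directory
        self.flush_every = flush_every
        self.rotate_every = rotate_every
        self.buffer = []     # events held only in memory
        self.in_file = 0     # events already written to the current file
        self.seq = 0         # number of rotated files so far
        self.current = os.path.join(directory, "events.current")

    def register(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # One write per batch instead of one per event.
        with open(self.current, "a", encoding="utf-8") as f:
            f.write("".join(line + "\n" for line in self.buffer))
        self.in_file += len(self.buffer)
        self.buffer.clear()
        if self.in_file >= self.rotate_every:
            self.rotate()

    def rotate(self):
        # A rotated file is complete and can be picked up by the statistics module.
        self.seq += 1
        target = os.path.join(self.directory, f"events.{self.seq:06d}.log")
        os.replace(self.current, target)
        self.in_file = 0

with tempfile.TemporaryDirectory() as tmp:
    log = EventLog(tmp, flush_every=2, rotate_every=4)
    for n in range(5):
        log.register(f"impression banner=7 n={n}")
    rotated = sorted(os.listdir(tmp))   # files ready for processing
    pending = len(log.buffer)           # events still only in memory
```

The in-memory batch size is the trade-off knob here: a bigger batch means fewer disk operations, but more events lost if the server crashes.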

Here are some of the business problems that managers periodically bring to us.

How do you create an even distribution of impressions throughout the day? When there is no targeting and there are no limits, this is fairly easy to do based on the known capacity of the advertising space, that is, the average daily number of incoming impression requests. If targeting and limits are present, the task is solved based on statistical data.
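
For the simple case without targeting, one way to spread impressions is to give the campaign a fixed share of incoming requests, equal to its daily limit divided by the capacity of the ad space. The sketch below does this deterministically with integer arithmetic; the function name and the idea of numbering the day’s requests are assumptions made for the example.

```python
def should_serve(request_no, daily_limit, daily_capacity):
    """Decide whether request number `request_no` (counted from 0 today)
    should get this campaign, given the expected daily request volume."""
    if daily_limit >= daily_capacity:
        return True
    # Serve whenever the running quota floor(n * limit / capacity) steps up.
    before = request_no * daily_limit // daily_capacity
    after = (request_no + 1) * daily_limit // daily_capacity
    return after > before

served_in_first_100 = sum(should_serve(n, 250, 1000) for n in range(100))
served_in_full_day = sum(should_serve(n, 250, 1000) for n in range(1000))
```

With this scheme, impressions follow the traffic curve: the campaign gets its share in every part of the day instead of burning its limit in the morning.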

How do you automatically change the weight of a banner in order to generate the necessary number of impressions or show more effective banners (banners with a high CTR—click through rate)?

In this case, it’s very important to look not only at the current CTR value, but at its dynamics over a recent period of time, so that very old data doesn’t influence the selection.
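
A small sketch of a CTR computed over a sliding window of days, so that old data drops out on its own. The window length and the class name are assumptions; how the result is turned into a banner weight is left open.

```python
from collections import deque

class WindowedCtr:
    """CTR over the last `window_days` days only."""

    def __init__(self, window_days=7):
        # deque with maxlen silently discards the oldest day.
        self.days = deque(maxlen=window_days)

    def add_day(self, impressions, clicks):
        self.days.append((impressions, clicks))

    def ctr(self):
        impressions = sum(day[0] for day in self.days)
        clicks = sum(day[1] for day in self.days)
        return clicks / impressions if impressions else 0.0

stats = WindowedCtr(window_days=2)
stats.add_day(1000, 50)   # an old, unusually good day
stats.add_day(1000, 10)
stats.add_day(1000, 10)
recent_ctr = stats.ctr()
```

In the example, the first day has already left the window, so the banner is judged by its recent 1% CTR and not by the old 5% spike.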

Let’s move on to statistics.

There are two fundamental types of statistics: simple statistics based on ad space (displays, clicks, CTR), and statistics based on unique visitors, when each visitor is assigned a unique identifier by which statistics are aggregated. Presently, statistics based on unique visitors are always required; in any case, they have been required in every system we’ve made, although this has created a lot of processing problems.

If you compare the data volumes in a system that processes 50,000,000 impression requests every day, the amount of statistical data based on unique visitors is 2,000 times bigger than the amount of simple statistics. This happens because data without visitor identifiers easily aggregates (collapses) by advertising space into one row in the database. This cannot be achieved with unique visitors. The daily growth in the amount of data in this case is 30-40 thousand records for simple statistics and 6-8 million for statistics on unique visitors. That amount of data requires additional database maintenance, for example, regular partitioning, archiving old data, etc. Otherwise, the system quickly becomes unusable.
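
The collapsing effect is easy to see on a toy example. The event fields below (`slot`, `type`, `visitor`) are hypothetical; the point is only how the aggregation key changes the number of rows.

```python
from collections import Counter

def aggregate_simple(events):
    """One row per (ad space, event type)."""
    return Counter((e["slot"], e["type"]) for e in events)

def aggregate_unique(events):
    """One row per (ad space, event type, visitor): far fewer rows collapse."""
    return Counter((e["slot"], e["type"], e["visitor"]) for e in events)

events = [
    {"slot": "top", "type": "impression", "visitor": "a"},
    {"slot": "top", "type": "impression", "visitor": "b"},
    {"slot": "top", "type": "impression", "visitor": "c"},
    {"slot": "top", "type": "impression", "visitor": "a"},
]
simple_rows = aggregate_simple(events)
unique_rows = aggregate_unique(events)
```

Four impressions become a single row of simple statistics, but three rows once the visitor identifier is part of the key; with millions of visitors the gap grows accordingly.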

Why, then, do we need statistics on unique visitors at all?

The thing is that these statistics allow us to build a series of interesting reports which are useful for analyzing an ad campaign’s effectiveness. For example, we can see the frequency response, that is, find out how an advertising campaign’s audience has expanded since it was launched. Another example is the ability to analyze the intersection of audiences from different platforms, by which I mean tracing the presence of one visitor on different platforms. Besides reports, many banner systems offer additional services like post-click analysis, which lets us trace a visitor’s actions on the advertiser’s site after they got there by clicking on a banner. Using post-click analysis, you can trace the amount of time the visitor spent on the site, the number of pages they viewed, the depth of their browsing, as well as specific actions, like adding products to the shopping cart or submitting orders.
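
Both reports mentioned above reduce to set operations on visitor identifiers. The following is a sketch on in-memory data with hypothetical function names; on real volumes the same logic would run in the database or in a background job.

```python
def cumulative_reach(daily_visitors):
    """Number of distinct visitors reached by the end of each day."""
    seen, reach = set(), []
    for visitors in daily_visitors:
        seen |= set(visitors)
        reach.append(len(seen))
    return reach

def audience_overlap(platform_a, platform_b):
    """Number of visitors seen on both platforms."""
    return len(set(platform_a) & set(platform_b))

reach = cumulative_reach([["a", "b"], ["b", "c"], ["a", "c"]])
overlap = audience_overlap(["a", "b", "c"], ["c", "d"])
```

In the example the campaign stops reaching new people on the third day, which is the kind of signal these reports are built for.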


Let’s examine a few things connected with statistic processing:

1. It is highly desirable to move statistics processing into a separate component, situated on a separate server (joint statistics processing). This component’s tasks are collecting logs from the banner servers, filtering the data, aggregating it and loading it into the database.

2. For loading data into the DBMS, it’s better to use the database’s standard bulk-loading method (SQL*Loader in Oracle, COPY in PostgreSQL, etc.), which significantly shortens load time. This may require preliminary processing of the statistics to convert them to a specific format.

3. Simple statistics and statistics on unique visitors need to be saved in different databases (or on different partitions), preferably on physically different machines.

4. With a large quantity of data, denormalization may be required, along with a separate representation of the data for each report that allows that report to be built as fast as possible.

5. Reports on unique visitors need to be compiled in the background, saving only the final results.

6. It’s important to reduce the number of reports generated on unique data. The best option will be if these reports are automatically generated by the system once a day, and not per the user’s request.
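
As an illustration of the preliminary processing mentioned in point 2, here is a sketch that aggregates raw events and writes them as CSV suitable for a bulk loader. The table and column names in `COPY_SQL` and the event fields are hypothetical; only the general `COPY ... FROM STDIN WITH (FORMAT csv)` form is taken from PostgreSQL.

```python
import csv
import io
from collections import Counter

# Hypothetical target table for simple statistics.
COPY_SQL = ("COPY simple_stats (day, slot, event_type, total) "
            "FROM STDIN WITH (FORMAT csv)")

def to_copy_csv(events):
    """Aggregate events and render them as CSV text for bulk loading."""
    totals = Counter((e["day"], e["slot"], e["type"]) for e in events)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for (day, slot, kind), count in sorted(totals.items()):
        writer.writerow([day, slot, kind, count])
    return out.getvalue()

events = [
    {"day": "2011-10-03", "slot": "top", "type": "impression"},
    {"day": "2011-10-03", "slot": "top", "type": "impression"},
    {"day": "2011-10-03", "slot": "top", "type": "click"},
]
payload = to_copy_csv(events)
```

The resulting text can be streamed to the database by whatever client library is in use; loading aggregated rows in bulk is much cheaper than inserting events one at a time.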

Now let’s talk a little about monitoring. The main benefits that monitoring gives your system are well known:

Capacity forecasting

Early-stage problem diagnostics

Identification of typical problems, which allows you to create standard solutions and simplifies system support.

HL_34

In our experience, we use three kinds of monitoring:


Monitoring on a physical level: availability of the server, CPU, memory, free hard disk space, and other parameters.

Monitoring on the application level: HTTP responses, banner relevance, relevance of statistics, queue sizes, etc.

Monitoring on the business-metrics level: indicators derived from application logic; for example, this could be the average number of requests processed by the server over a period of time in comparison with historical data, the CTR percentage, the dynamics of change in targeting parameter values, etc.
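
A business-level check can be as simple as comparing the current value of a metric with its historical average. The sketch below assumes a fixed relative tolerance, which is an arbitrary choice for the example; a real check would be wired into the monitoring system as one of its criteria.

```python
from statistics import mean

def deviates(current, history, tolerance=0.3):
    """True if `current` differs from the historical mean by more than
    `tolerance` (as a fraction of that mean)."""
    baseline = mean(history)
    return abs(current - baseline) > tolerance * baseline

requests_per_minute_history = [1000, 1100, 900]
alert_on_drop = deviates(500, requests_per_minute_history)
alert_on_normal = deviates(950, requests_per_minute_history)
```

A sudden halving of processed requests raises an event even though every server is up and answering, which is something physical and application-level checks would miss.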


For monitoring tools, we use Nagios, for which we have developed a set of necessary criteria. Aside from this, we use Cacti for trends, Tenshi for logs, graphs for analyzing logs and so forth.

The process is organized in the following manner: information from applications and servers enters the monitoring system, which generates events based on specific criteria. These events are processed by a team of system operators. The majority of typical events already have “solutions” prepared ahead of time, which allows you to minimize system administrator intervention and automate system operations.

In closing, I would like to emphasize one more time that the basis of good support is advanced system monitoring and the presence of those prepared “solutions”, which allow you to solve the majority of problems that pop up in a system. And of course, it’s worth remembering the importance of a well-defined update procedure, which allows you to avoid a multitude of problems having to do with updating the system.

Thank you for your attention!

If you have any questions, please contact us!

pishite@gramant.ru


Address: 119021, Moscow, Russia, Pugovishnikov pereulok, 11, office#4
Park Kultury, Frunzenskaya