HeadHunter: Banner advertising management system

Description

The HeadHunter group of companies has operated in the online recruiting market since 2000. Its portfolio now includes a wide variety of projects, among them Career.ru, the leading youth Internet portal for job searching, career planning and career consulting, where anyone can receive personal answers to questions about careers and other topics.

HeadHunter.ru hosts more than 2 million unique visitors each week and is the largest Russian Internet resource for job searching and hiring. These visitor numbers present a great opportunity to boost the company’s revenue and meet its growth objectives. After a lengthy evaluation of many different companies, Gramant was selected for its extensive knowledge of the adtech sector.

HeadHunter’s initial need, however, was for more than a software development company, as they were not at all certain of their requirements. They knew that they needed to make the most of the high number of unique visitors, and that if they failed to do so they risked losing a valuable opportunity for company growth.

Gramant’s first task was to work closely with the stakeholders to gain a thorough understanding of their business requirements, which could then be developed into a more formal and detailed specification. This consulting service was the main added value and the main reason for the project’s success: it enabled HeadHunter to form a crystal-clear vision of their goal. The subsequent development of the system enabled them to achieve that goal and reap the substantial rewards of the increased revenue that resulted from the project.

Major Features

  • Simple scaling
  • Operability under high load conditions (the first version of the system was based on the projected traffic of 85 million hits a day)
  • Convenient media planning tools, allowing the selection of ad space with regard to occupation, as well as automatic site selection for ad campaigns within defined parameters
  • Detailed statistics analysis system which allows one to analyze ad campaign effectiveness and research an ad space’s audiences
  • Flexible configuration of settings for displaying ads to the end user (targeting)
  • Fast response: serving a banner on request should take no more than 300 milliseconds, including the time to analyze the user’s profile and select the most appropriate banner.

Technologies

  • Java, Grails
  • PostgreSQL
  • Memcached
  • WebDAV

Interview

Development Trends of Banner Technology

One of our company’s main areas of activity is developing on-line ad systems. Since Gramant’s beginnings, we’ve completed several projects in this field. In recent years, the boom in RTB technologies, which has been developing in the West for 5 years already, has reached Russia. Presently, we, as specialists in the field of on-line advertising, are taking part in the development of several systems in the SSP/RTB/DSP segment.

Today, Gramant’s executive director, Anatol Filin, met with Artem Volftrub, who is heading a large number of our company’s banner projects, in order to discuss our experience with banner systems, as well as the global trend in their general development.

 

Anatol Filin: Hello, Artem!

If you recall our company’s history, our very first project was the development of a large affiliate banner network for the Japanese company ValueCommerce. We built a successful system, and at some point the client went public (IPO). Today ValueCommerce operates successfully and its shares are traded on the Tokyo Stock Exchange.

Another of our most significant projects was the banner-ad statistics system built for a DoubleClick engine for IMHO VI. DoubleClick is a fairly important American engine; the DoubleClick business itself was purchased several years ago by Google. The engine itself suited everyone, but the front end, and the statistics especially, were quite primitive, which is why we wrote those from scratch. The project was actively developed: later versions added functionality for advertising agencies and a post-click analysis module, which traced user behavior after the transfer to the advertiser’s site.

Presently, we are developing a fairly large system: a banner system for the well-known RUnet company HeadHunter. The project has been running for several years and continues to develop. HeadHunter is a recruiting business that comprises many sites: the principal site, with a large number of visitors and displays; regional sites; and various other projects (carera.ru, portal education, and others). All HeadHunter platforms earn money, not only through the company’s main business but also through advertising. Presently, our banner system serves all sites in the HeadHunter Group, as well as the site Rabota.Main.ru, which was built on the HeadHunter engine.

We have a fairly large amount of analytical experience in the field of banner systems; we have written specifications for video banners as well as for several systems related to contextual advertising. Besides that, we have several projects we can’t talk about yet because they’re still in development.

In practically all of the previously mentioned projects, you, Artem, served as a director. Thus, you are the ideal person to tell us about what we’ve learnt and where banner-ad technologies are generally headed.

Well, the banner system we made for ValueCommerce stands out as the largest and the most heavily loaded; by the time we had practically finished it, it was serving about 1 billion displays a day. We have different banner projects in our “back pocket”, and they differ in scale and specifications. Naturally, our clients will be interested in the following question: how much of our experience as banner system developers can be applied to their specific projects?

Artem Volftrub: Hello, Anatol!

Let’s talk first about the banner systems created for ValueCommerce and HeadHunter, because we built these two projects entirely from the ground up and took them all the way into production.

The banner system created for ValueCommerce has a slew of unique features. Firstly, no existing Russian banner system handles a comparable volume of data. This, of course, imposes specific constraints on the system’s architecture. Additionally, the company’s business model was based on commission from end sales (CPA), and not on displays or clicks, which is usually the case in Russia today. The system records each completed purchase on the advertiser’s site and then links it to the click on a specific banner on a specific site, so that commission can be paid to the site and the company can take its percentage. The company’s revenue depends directly on how well this algorithm performs.
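The attribution step described here can be sketched roughly as follows. This is a minimal illustration in Java, not the ValueCommerce implementation: the class and field names, the idea that a click ID travels with the purchase notification, and the two percentage parameters are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: link a completed purchase back to the banner click that led to it. */
final class CpaAttributionSketch {
    /** A recorded banner click (field names are illustrative). */
    record Click(String siteId, String bannerId) {}

    /** What the site earns and what the network keeps for one attributed purchase. */
    record Payout(String siteId, String bannerId, long siteCommissionCents, long networkFeeCents) {}

    private final Map<String, Click> clicksById = new HashMap<>();

    void recordClick(String clickId, String siteId, String bannerId) {
        clicksById.put(clickId, new Click(siteId, bannerId));
    }

    /**
     * Assumption: the advertiser pays commissionPercent of the sale,
     * and the network keeps networkSharePercent of that commission.
     */
    Optional<Payout> attributePurchase(String clickId, long saleCents,
                                       int commissionPercent, int networkSharePercent) {
        Click click = clicksById.get(clickId);
        if (click == null) {
            return Optional.empty(); // purchase cannot be tied to any known click
        }
        long commission = saleCents * commissionPercent / 100;
        long networkFee = commission * networkSharePercent / 100;
        return Optional.of(new Payout(click.siteId(), click.bannerId(),
                commission - networkFee, networkFee));
    }

    public static void main(String[] args) {
        CpaAttributionSketch tracker = new CpaAttributionSketch();
        tracker.recordClick("click-1", "site-42", "banner-7");
        System.out.println(tracker.attributePurchase("click-1", 10_000, 10, 30));
    }
}
```

A real system would also need to expire old clicks and persist them across servers; the in-memory map only keeps the example short.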

It was in the interest of the site displaying an advertisement to show it to those users who were most likely to increase the number of end sales. That is to say, the problem of selecting the banner to show a specific visitor was shifted to the platform’s side, so the system’s own targeting was minimal. In effect, it consisted of choosing the site categories and an advertiser before the advertising campaign even started.

Besides that, it’s worth saying a few words about billing in the ValueCommerce system. It was quite complicated since, at the end of each accounting period, it was necessary to take all of the events into account, submit the calculation to the advertisers, and pay commission to the sites. The size of the commission was not fixed and depended on many factors: site category, type of product, size of sale, etc. Additionally, products returned to the store had to be factored in, so that those purchases were excluded from the commission.

Anatol: Billing is one of the most important components of any system; money is transferred through it. However, it seems to me that we’ve digressed from the description of the banner structure to the back-office. Let’s talk about other systems. What are the specifics of the banner system created for HeadHunter?

Artem: If you look at the banner system created for HeadHunter, it’s oriented purely toward selling displays and clicks (CPM, CPC), and in this way it differs dramatically from the ValueCommerce system.

When selling displays (CPM), the platform’s overall traffic (number of visitors) is what matters; when selling clicks (CPC), what matters is the ratio of clicks to displays (CTR, click-through rate). In that case the platform’s effectiveness is judged by how visitors respond, which is why it’s important to show ads to those who would be most interested; this is where targeting comes into play.
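The arithmetic behind these two pricing models is simple enough to write down. The following is only an illustrative sketch; the class name, method names and the sample prices are assumptions, not figures from HeadHunter.

```java
/** Hypothetical helper showing the arithmetic behind CPM, CPC and CTR. */
final class PricingSketch {
    /** Click-through rate: clicks divided by displays (0 when nothing was shown). */
    static double ctr(long clicks, long displays) {
        return displays == 0 ? 0.0 : (double) clicks / displays;
    }

    /** Revenue when displays are sold: the CPM price is charged per 1000 displays. */
    static double cpmRevenue(long displays, double cpm) {
        return displays / 1000.0 * cpm;
    }

    /** Revenue when clicks are sold: only clicks are billed, so CTR drives income. */
    static double cpcRevenue(long clicks, double cpc) {
        return clicks * cpc;
    }

    public static void main(String[] args) {
        long displays = 100_000;
        long clicks = 500;
        System.out.println("CTR: " + ctr(clicks, displays));
        System.out.println("CPM revenue: " + cpmRevenue(displays, 2.0));
        System.out.println("CPC revenue: " + cpcRevenue(clicks, 0.5));
    }
}
```

With the same number of displays, CPM revenue stays fixed while CPC revenue rises or falls with the CTR, which is why targeting matters more for click-based sales.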

So, all of the pages on HeadHunter can be divided into two kinds: pages with a lot of anonymous visitors, like the main page, and pages where the system can already determine some data about the user, like the job search page. In the first case, we don’t know anything about the visitor, except maybe their geographic location. These pages mostly carry media ads, for which the overall audience size (reach) is much more important than the ratio of clicks to displays.

In the second case (the job search page), the system knows what specialization the visitor is interested in, what city they are looking for work in, what salary they are looking for, etc. This information is extremely useful for targeting ads, because if the visitor sees a banner with a vacancy that matches their criteria, there is a good chance that they will click on it.
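Matching a visitor’s search criteria against banner targeting could look roughly like the sketch below. It is a simplified assumption of how such rules might be modelled: the `Visitor` and `Rule` types, their fields, and the convention that an empty set means “no restriction” are invented for the example and are not taken from the HeadHunter system.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of targeting by job-search criteria. */
final class TargetingSketch {
    /** What the job search page knows about the visitor. */
    record Visitor(String specialization, String city, int desiredSalary) {}

    /** Targeting rule for one banner; an empty set means "no restriction" (an assumption). */
    record Rule(String bannerId, Set<String> specializations, Set<String> cities,
                int minSalary, int maxSalary) {
        boolean matches(Visitor v) {
            return (specializations.isEmpty() || specializations.contains(v.specialization()))
                    && (cities.isEmpty() || cities.contains(v.city()))
                    && v.desiredSalary() >= minSalary
                    && v.desiredSalary() <= maxSalary;
        }
    }

    /** Returns the IDs of all banners whose rule matches the visitor, in rule order. */
    static List<String> eligibleBanners(Visitor visitor, List<Rule> rules) {
        return rules.stream()
                .filter(rule -> rule.matches(visitor))
                .map(Rule::bannerId)
                .toList();
    }

    public static void main(String[] args) {
        Visitor visitor = new Visitor("java-developer", "Moscow", 150_000);
        List<Rule> rules = List.of(
                new Rule("banner-1", Set.of("java-developer"), Set.of(), 100_000, 200_000),
                new Rule("banner-2", Set.of("accountant"), Set.of("Moscow"), 0, Integer.MAX_VALUE));
        System.out.println(eligibleBanners(visitor, rules));
    }
}
```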

Anatol: Everything’s more or less clear about targeting and the business model. But targeting is more like an addition to the system’s core. Are there any essential differences in the display engine?

Artem: The actual display engine, or Banner Engine, as we call it, is similar in both systems. Of course there are certain differences, but the mechanism for processing display requests, choosing banners and calculating statistics is very similar. A request goes to the server containing the advertising space’s parameters, its identifier, information on the user, if there is any, and any other parameters. The server processes the request and returns either the banner’s prepared HTML code or metadata describing where to load the image from, where to send the user if they click it, how to record the banner impression, etc.
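The request and the metadata response described here might be modelled along these lines. This is a sketch under stated assumptions: the record names, the fields, the `ads.example.com` host and the URL layout are all hypothetical, and banner selection is stubbed out.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical shape of a banner request and its metadata response. */
final class BannerEngineSketch {
    /** Ad place identifier, optional user information, and any extra parameters. */
    record BannerRequest(String placeId, Optional<String> userId, Map<String, String> params) {}

    /** Metadata telling the page where to load the image, where a click goes, how to count the impression. */
    record BannerResponse(String bannerId, String imageUrl, String clickUrl, String impressionUrl) {}

    static BannerResponse handle(BannerRequest request) {
        // A real engine would apply targeting and limits here; this stub picks a fixed banner per place.
        String bannerId = "banner-for-" + request.placeId();
        String base = "https://ads.example.com"; // placeholder host, not a real endpoint
        return new BannerResponse(
                bannerId,
                base + "/img/" + bannerId + ".png",
                base + "/click?b=" + bannerId,
                base + "/imp?b=" + bannerId);
    }

    public static void main(String[] args) {
        BannerRequest request = new BannerRequest("top", Optional.empty(), Map.of("geo", "RU"));
        System.out.println(handle(request));
    }
}
```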

Anatol: If we talk about banner systems in general, what elements should be present?

Artem: If we’re not considering complicated cases, like the RTB systems that have been popular lately, a banner system can be divided into four main elements. The first is the banner server itself (banner engine), which is in charge of banner impressions and recording events (displays and clicks). The second is the advertisement campaign management module, a web interface which allows you to create advertisement campaigns, view statistics, evaluate the effectiveness of previous campaigns, create new advertising sites, etc. The third is the statistics collection and processing module, which summarizes the events registered by the banner servers and prepares data for assembling reports. Finally, the fourth element is the module in charge of synchronizing advertising campaign data between the management system and the banner servers, and of starting and stopping campaigns.

Anatol: Let’s talk about the technology we use in banner systems. Conversations with potential clients often begin with the question, “What are you going to write it in?” There is a widespread opinion that programs which have to work extremely quickly need to be written in C or C++. For my part, I can say that the first banner engine I helped write was written in Perl. Then, after some time, when loads began to increase, Perl could no longer cut it. Everything that we do now in the field of banner systems is written in Java. Artem, as a development head, do you have any preference in terms of choosing a programming language? Can we say that there is an optimal language for writing banner systems, or does it not matter what they’re written in?

Artem: I don’t think this question relates specifically to banner systems; you can apply it to any high-load project. There are several factors which need to be considered when selecting technology. The most fundamental is which language the team developing the system can write in quickly and with high quality. One more factor is the cost of hosting and maintenance. You can write code that runs 3%, 5%, or even 10% faster, but it may cost several times more to maintain than equivalent code in another language, because here the cost of maintenance really includes finding employees, the time needed to make changes, difficulties in testing, etc.

An important criterion for a banner system is the ability to scale. If the server we wrote scales easily and is relatively easy to modify, then the programming language fades into the background.

As for the banner systems we’ve made, they’ve all been developed in Java. They work perfectly and scale easily. The system made for ValueCommerce processed about a billion events a day; the HeadHunter system handles about a hundred million displays. These are quite serious numbers.

Anatol: Still, if we look at the efficiency of an individual server, then how does the selection of a programming language influence it? Could we say, for example, “C is 20% better than Java”? Or is it the other way around?

Artem: In order to answer this question, I would need to write two identical servers, one in Java and the other in C, and test them with the same data. I don’t think anyone has ever done this, and it’s impossible anyway, because the resulting code will still be different. From my point of view, you always need to start from specific tasks, like processing a banner display in no more than 100 milliseconds. This particular task can be done in Java just as it can in C. I suspect that it can also be done in PHP and Perl. Still, you cannot forget about other indicators, such as the number of display requests arriving at one time (usually measured within one second).

Anatol: So, when choosing a programming language for banner systems, it’s important to base your decision on all factors. Among these factors are the demands for efficiency, the availability and cost of qualified developers, and the cost of maintenance, which in this case isn’t just server management and monitoring, but also system development and updating.

Artem, tell us what operating system a banner system should work in: Windows? Unix?

Artem: I haven’t seen a high-load system working on a Windows server in a long time. Usually, if a system runs on Windows, it’s related to some kind of corporate policy: politics, licensing, or discounts given by Microsoft. I don’t know the statistics for Fortune 500 companies, but the major Russian Internet businesses that we’ve encountered as a company all use Unix-like operating systems in production.

Anatol: Well, I’m afraid Microsoft advocates are coming our way :). Okay, let’s talk a little bit about how banner technologies are developing. If you look at our company’s latest projects, and in general at how the “banner eco-system” is evolving, you can see that banner infrastructure has become dramatically more complex, over the past five years in the West and about the last two years in Russia. There’s a transition toward technologies known under the “code name” Real Time Bidding.

Because we’ve already developed RTB systems, it would be interesting to discuss their main components with you. Additionally, I’d like to understand how the banner systems we’ve previously developed fit into the new global picture.

Artem: Let’s start with the fact that in the world today, where major Internet players have already emerged, building a new banner system from scratch that is able to attract a large number of clients is not a simple task. The main difficulty lies not only in technology, but in attracting traffic (sites), which is always in short supply, especially now.

If we look at the projects that we’ve taken part in, the ValueCommerce system was created at a time when the market still hadn’t taken shape; as a result, hundreds of thousands of sites, which provided enough traffic, were connected to the system. A feature of the system we developed for HeadHunter is that they only sell ads on their own sites, which cover a fairly narrow range of subject areas. For RUnet, they have a fair amount of traffic, and increasing that amount is not a priority.

Concerning new trends, it’s worth noting that in recent times there’s been a division between companies that work with advertisers (agencies) and those that provide technology and traffic by negotiating with sites. There are, naturally, far fewer of the latter. As I’ve already said, the supply of “quality” traffic is one of the most serious problems on the Internet today, since the volume of Internet advertising is growing faster than the amount of Internet traffic.

On the other hand, sites, which generate traffic, usually aren’t able to sell all of it to advertisers: gaps appear between advertising campaigns, traffic with “unpopular targeting” goes unsold, etc. Naturally, sites want to sell all their traffic, including residual traffic. If you put together all of the residual traffic from several sites, you end up with a fairly large amount.

So here is where RTB systems enter the market. They sell residual traffic by holding auctions among advertisers. Connecting to an RTB system is fairly easy: there’s a universal protocol, OpenRTB, through which information is exchanged.

Anatol: Artem, I would like to hear what RTB has brought us in terms of technology?

Artem: Whereas “traditional banner systems” can be divided into the four main elements I talked about earlier (banner server, advertisement campaign management system, statistics processing module, and synchronization module), in RTB there are two large blocks: SSP (Sell Side Platform) and DSP (Demand Side Platform). An SSP comprises a banner server, an auction module, a system for managing connected sites, and synchronization and statistics modules. A DSP consists of an advertisement campaign management system; an auction coordination engine, including an intelligent module that selects ads according to the probability of a click, targeting, limits and the cost of an impression; and, just as with an SSP, statistics and synchronization subsystems.

Interaction between a DSP and the auction takes place over the OpenRTB protocol. It’s important to note one more thing related to the business model. A DSP always pays the SSP per impression, while it sells clicks (CPC) and purchases (CPA) to its advertisers; thus the forecast module, which has to determine which impressions are most likely to lead to clicks and sales, is key to the work of a DSP.

Anatol: So, what we had done earlier has now moved into the area of SSP?

Artem: For the most part, yes. However, if we look at the whole picture, the only elements we hadn’t dealt with before were the auction module in an SSP and the forecast model in a DSP.

Anatol: A banner auction is, I think, an interesting occurrence.

Artem: …but quite simple. Generally, the difficulty of systems like these lies not in a large number of business rules or lines of code, but in the fact that they have to work quickly and reliably. The same goes for auctions: the logic is quite simple, consisting of a request to every connected DSP system and the selection of a winner according to a fairly simple algorithm.
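To give a feel for how little logic the winner selection needs, here is a minimal sketch. The interview does not say which rule was used; the second-price rule with a floor price shown here is an assumption chosen because it is a commonly described auction scheme, and the `Bid` and `Result` types are invented for the example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of winner selection among DSP bids (second-price rule is an assumption). */
final class AuctionSketch {
    record Bid(String dspId, double price) {}

    record Result(String winnerDspId, double clearingPrice) {}

    /**
     * The highest bid at or above the floor wins and pays the second-highest
     * qualifying bid, or the floor price when there is no other qualifying bid.
     */
    static Optional<Result> run(List<Bid> bids, double floorPrice) {
        List<Bid> ranked = bids.stream()
                .filter(bid -> bid.price() >= floorPrice)
                .sorted(Comparator.comparingDouble(Bid::price).reversed())
                .toList();
        if (ranked.isEmpty()) {
            return Optional.empty(); // nobody met the floor, the impression stays unsold
        }
        double clearingPrice = ranked.size() > 1 ? ranked.get(1).price() : floorPrice;
        return Optional.of(new Result(ranked.get(0).dspId(), clearingPrice));
    }

    public static void main(String[] args) {
        List<Bid> bids = List.of(new Bid("dsp-a", 2.0), new Bid("dsp-b", 3.5), new Bid("dsp-c", 1.0));
        System.out.println(run(bids, 1.5));
    }
}
```

In practice the hard part is collecting the bids from every DSP within the response-time budget, not ranking them.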

Of course, the SSP module isn’t only an auction. It has additional functions: some sites, for example, want to remain anonymous to advertisers, so that their identity isn’t revealed at the auction. There is also interaction with an anti-fraud module for cutting off parasitic traffic; enrichment of request data with additional information on the user, received from an external system (DMP, Data Management Platform); different configurations for connecting to a DSP (some SSP systems pass only part of the information they have to the connected DSP and provide extra data for an additional payment); and many others.

Anatol: Apparently, there can’t be too many RTB systems. Compare stock trading on the financial markets: there aren’t that many exchanges, though there aren’t that few either, and only a handful are big, such as the London exchange and the exchanges in New York and Tokyo. There used to be two in Moscow, but two turned out to be too many and they were merged. Evidently, there should be far fewer RTB systems than banner-only systems. I think only a large company can afford an RTB system of its own.

Artem: As far as I know, two RTB systems are working in Russia; Yandex made one, AdFox the other. Still, several similar systems are in development and should appear before long. Of course, there won’t be many of them. I already mentioned the problem of a lack of traffic. In the case of RTB, it’s the same story; the number of sites isn’t endless and there aren’t enough for everyone.

Anatol: I understand. Then it makes sense that Yandex, being the largest Russian media resource in terms of “inventory”, created an RTB system in house.

Artem: Of course. Yandex gladly connects any DSP system to its exchange.

Anatol: Okay. If we look at the different components represented on this diagram, everything related to the SSP Core, meaning the core of the SSP system, we didn’t make it unidirectional. Site managers, and site management companies as well, deal with anti-fraud systems. What does “monitoring” mean on this diagram?

Artem: “Monitoring” means different things, of course, in different situations. In this situation, we’re talking about monitoring business metrics. The main idea is that a certain portion of the display requests entering the SSP’s Banner Engine is passed to the monitoring system, which can then signal a potential problem, such as a sudden drop in the number of displays or clicks. Such a drop points to problems on one of the sites, while a sudden increase points to parasitic traffic (problems with the anti-fraud module).
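A business-metric check of this kind can be very small. The sketch below compares the latest time window with the average of previous windows; the window counts, the `minRatio` threshold and the class name are assumptions for illustration, not details of the monitoring system discussed here.

```java
/** Hypothetical sketch: flag a sudden drop in displays (or clicks) against a recent baseline. */
final class DropMonitorSketch {
    /**
     * Returns true when the current window's count falls below minRatio
     * times the average of the previous windows.
     */
    static boolean isSuddenDrop(long[] previousWindows, long current, double minRatio) {
        if (previousWindows.length == 0) {
            return false; // no baseline yet, nothing to compare against
        }
        double sum = 0;
        for (long count : previousWindows) {
            sum += count;
        }
        double baseline = sum / previousWindows.length;
        return current < baseline * minRatio;
    }

    public static void main(String[] args) {
        long[] lastThreeMinutes = {1000, 1100, 900};
        System.out.println(isSuddenDrop(lastThreeMinutes, 400, 0.5));
    }
}
```

The same comparison with the inequality reversed would flag a suspicious spike, which is one way a rise in parasitic traffic might show up.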

Anatol: Direct advertisers are shown in the diagram, right?

Artem: Yes. Here, we’re talking about sites that have their own advertisers and sell ads to them directly, using the RTB system as a standard, traditional banner system.

Anatol: A fairly typical story for Russia. Artem, you mentioned above that banner systems are very complicated things. What are you comparing this with?

Artem: I meant the following: in terms of the amount of code and the logic of their work, banner systems aren’t very complicated. In terms of implementation, they are, of course, complicated.

Anatol: As far as I remember, we didn’t get our first banner system for ValueCommerce right the first time around. Optimizing all of the components for high loads took a few years. This was valuable experience. Nonetheless, could you tell us what difficulties, in your opinion, banner systems present for developers and architects? Where were there, shall we say, “breakdowns”?

Artem: Each has its own “flavor”. If, for example, we look at the system we created for HeadHunter, the main difficulty lies in the complex rules for targeting and limits.

Because the banner display response time was strictly limited (no more than 300 milliseconds) while the actual request processing was made up of a fairly large number of operations (selecting a banner while taking complex targeting rules into account, recording display events, formulating a response, incrementing counters, etc.), developers had to put in considerable effort.
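One common way to enforce such a response-time budget in Java is to run the selection asynchronously and fall back to a default banner when the deadline passes. The sketch below uses the standard `CompletableFuture.completeOnTimeout` method (available since Java 9); it is an assumed approach for illustration, not a description of how the HeadHunter system handles its limit, and the banner IDs are placeholders.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch: keep banner selection inside a fixed time budget. */
final class DeadlineSketch {
    /**
     * Runs the selector on the common pool; if it has not finished within
     * budgetMillis, the fallback banner ID is returned instead.
     */
    static String selectWithDeadline(Supplier<String> selector, String fallbackBannerId,
                                     long budgetMillis) {
        return CompletableFuture.supplyAsync(selector)
                .completeOnTimeout(fallbackBannerId, budgetMillis, TimeUnit.MILLISECONDS)
                .join();
    }

    public static void main(String[] args) {
        // A fast selector finishes well inside the 300 ms budget mentioned in the text.
        String banner = selectWithDeadline(() -> "targeted-banner", "default-banner", 300);
        System.out.println(banner);
    }
}
```

Note that the slow selection keeps running in the background after the fallback is returned, so a production system would also want to cancel or bound that work.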

There is one more interesting task that we’re presently facing in development: forecasting displays based on targeting. Nowadays all advertisers want to estimate the amount of inventory, the audience reach, etc., before an advertising campaign starts, so this forecast needs to be made “on the fly”, based on the specific targeting parameters and on the advertising campaigns running in parallel. All of this is fairly complicated, but without it, it’s practically impossible to imagine a modern banner system. Another interesting task is automatically changing the display priorities of individual banners based on click statistics, so that banners with a higher CTR receive a higher priority.
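The CTR-based prioritization mentioned last can be illustrated with a short sketch. The smoothing prior (one click per hundred displays) is an assumption added so that banners with very few displays are not ranked on noise; the interview does not describe the actual formula, and the type names are invented.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: order banners so that a higher (smoothed) CTR gets a higher priority. */
final class CtrPrioritySketch {
    record BannerStats(String bannerId, long displays, long clicks) {
        /** CTR smoothed with an assumed prior of 1 click per 100 displays. */
        double smoothedCtr() {
            return (clicks + 1.0) / (displays + 100.0);
        }
    }

    /** Returns banner IDs sorted from highest to lowest smoothed CTR. */
    static List<String> byPriority(List<BannerStats> stats) {
        return stats.stream()
                .sorted(Comparator.comparingDouble(BannerStats::smoothedCtr).reversed())
                .map(BannerStats::bannerId)
                .toList();
    }

    public static void main(String[] args) {
        List<BannerStats> stats = List.of(
                new BannerStats("banner-a", 1000, 10),
                new BannerStats("banner-b", 1000, 30),
                new BannerStats("banner-c", 50, 0));
        System.out.println(byPriority(stats));
    }
}
```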

Anatol: Thank you for the interview, Artem! Until next time on the air 🙂

Artem: Thank you!