A startup: Online Hypermarket
Shopium combines data from different web shops in its own general catalog, which allows customers to easily search for and order products. The system already receives information automatically from a multitude of web shops. A user-friendly interface and social network integration are attracting many new customers to the system, both shoppers and sellers.
- Automatic scheduled upload of web shops’ product YML files
- Rich options for changing settings and editing the shop’s store directory
- Automatic linking of categories and products to store goods categories and the central catalog (CC)
- Full-text search for product names
- Comparison of several products by showing their descriptions (Wish List)
- Display of the original shops’ catalogues, filterable by category / directory / shop
- Social networks integration, login via social networks, share & like buttons
- Ability to edit the results of the automatic matching of the shop categories to the hypermarket central catalogue
- Java, Grails
Shopium: the interview with the developer
Anatol Filin: Let’s talk about Shopium, which we have been helping to develop and grow for two years now. We learned a lot while creating this system. The purpose of our conversation is to understand what the platform we built represents, both in terms of business functionality and technical details, as well as some of the problems we faced. Additionally, I’d like a broad outline of the system’s features, blocks and modules.
How many stores and products are there now on Shopium?
Ivan Veldyaksov: There are about 800 stores and 4 million products (as of 03/20/2013).
Anatol: Excellent! What is the system’s main function and who are the main clients? Generally speaking, why was Shopium created?
Ivan: We have two kinds of clients in the system: sellers, who want to sell their products and post offers on Shopium; and customers, whose main “function” is to browse and buy products and place orders.
Anatol: So basically, the on-line hypermarket Shopium is actually a giant product showroom. For the stores, it’s a supplementary source of leads and sales. Is that right?
Anatol: It works out that if we look at B2B and B2C models, then Shopium is neither B2B nor B2C. B2C is a single store, and B2B is purely business-to-business, like an electronics distributor. In the case of Shopium, we have something like B2B2C. That is to say that Shopium is one business; the stores are another; and the clients, meaning the customers, go to Shopium for goods like they would to specific stores.
Ivan: I agree. In terms of Shopium “communicating” with the stores, it’s B2B; when individual stores interact with customers, it’s B2C.
Anatol: Okay. What are our basic functions and system models? What is the system able to do at the given moment?
Ivan: Let’s start with the individual store, or merchant. Firstly, a store can upload its products and prices. We support the YML format. YML is the format for product descriptions on Yandex.Market; we use the exact same one. We read it and upload the goods to the system.
We understand that a large portion of our incoming stores are on Yandex.Market, so store owners already know how to prepare these YML files. They use them on Yandex, so they don’t mind using them on Shopium either.
One feature is the central product catalog. Ideally, all of the stores’ goods fall into one category or another in our central catalog. This makes it easier for the user to search for products.
What else about these main functions? It’s possible to add products manually and process customer orders.
For a store, these are probably the main functions.
For customers, we have a search engine—a full-text search of all of the stores’ products—and there’s a central catalog, which also makes searching for products easier. There’s a wish list, a product-comparison page, and there’s the “product card” feature, which is a special feature where the editors manually enter a product’s characteristics and add an image. This is the thing that groups together similar offers from other stores.
There is also an order-processing and shopping cart function. For customers, the Shopium hypermarket is similar to an ordinary web store; only in this case, there are a lot of stores.
Anatol: Okay. So, if we talk about roles, then it works out that the customer’s is the main role; additionally, there’s the administrators’ role and the editors’ role, right?
Ivan: There are customer, Shopium administrator, store administrator, and content-manager roles. The content-manager is the editor who makes the product cards and writes articles and news. All in all, there are 4 roles.
Anatol: Let’s first highlight the role of the client, the Shopium customer.
Ivan: In the Shopium system, a client can place an order, comment on stores and review products. They can also comment on the site’s articles and news. Additionally, they can rate products and stores.
Anatol: If you look at a store’s activities and administrator, what do they consist of? Does a store have its own admin panel? What can a store administrator do?
Ivan: Firstly, a store administrator can change merchant settings, upload a logo, enter a description, and set up their own showroom, meaning the store they manage. They can edit delivery methods and manually add products. Additionally, they see incoming orders, process them, and change order status, for example: “New Order” – “Processing”, “Processing” – “Delivered”, “Paid”. There are also functions for editing the store’s category matches with the central catalog’s categories.
Anatol: We’ll still talk about categories; it’s a very interesting feature. Tell me, if we’re looking at things realistically, how can we “add” goods onto the site? I mean, you mentioned that there were two possibilities for adding products: manually and by uploading a file. Does anybody actually use the manual method for adding products? Or does everyone upload files? Or maybe half-and-half?
Ivan: That’s an interesting question. It turns out that, in reality, almost nobody uses the manual add-product function; everyone loads products in YML. Moreover, the function for matching store categories with the central catalog’s categories is fairly interesting and complicated. A lot of time was spent on it, but again, almost nobody uses it. The most we get from a store is that they register and submit their YML. Afterwards, they forget about the Shopium system. I mean, nobody logs in or edits their category matches with the central catalog’s (CC) categories. Why? That’s a different question.
Anatol: I think there’s some kind of explanation for this.
Ivan: So far, few orders have gone through Shopium stores, so the store owners are not interested in editing their category matches with the CC categories. They don’t see a real benefit in it.
Anatol: Yes, exactly. I think the reason for this is that store owners, especially those of smaller stores, are always overloaded; many of them answer calls, process orders, and so forth by themselves. So, generally speaking, they have no capacity for anything else.
To close our conversation on YML: were there any problems with uploading files? Did any issues come up that had to do with capacity or anything else?
Ivan: There were definitely difficulties with YML.
For example, a book store with more than one hundred thousand products came to us. A technical problem came up because we used an XML parser that loaded the entire file into RAM. Naturally, the upload did not go through at first: the parser froze on the large file with 100,000 products. To fix this, we reworked the existing XML parser to read files piece by piece rather than load them into memory whole.
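A minimal Java sketch of the piece-by-piece reading described here, using the standard StAX API (javax.xml.stream). It is an illustration only, not the project's parser: the class name YmlOfferStreamer and the simplified sample file are assumptions, and a real loader would read names and prices as well as the offer id.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: walks a YML file event by event instead of loading it whole.
class YmlOfferStreamer {

    // Calls the handler with the "id" attribute of every <offer> element.
    static void forEachOfferId(Reader source, Consumer<String> handler) {
        try {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(source);
            try {
                while (xml.hasNext()) {
                    // next() moves one parsing event forward; no document tree is built
                    if (xml.next() == XMLStreamConstants.START_ELEMENT
                            && "offer".equals(xml.getLocalName())) {
                        handler.accept(xml.getAttributeValue(null, "id"));
                    }
                }
            } finally {
                xml.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed YML input", e);
        }
    }

    public static void main(String[] args) {
        // Simplified sample input (assumption): only the elements needed for the demo.
        String yml = "<yml_catalog><shop><offers>"
                + "<offer id=\"1\"><name>Book A</name></offer>"
                + "<offer id=\"2\"><name>Book B</name></offer>"
                + "</offers></shop></yml_catalog>";
        forEachOfferId(new StringReader(yml), id -> System.out.println("offer " + id));
    }
}
```

Because each offer is handed to the callback as soon as it is seen, memory use does not grow with the number of products in the file.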
The second problem arose when adding these products to the database. It turned out that while products were being inserted, response times increased for other system components. We ended up having to break the product insertion up into a thousand pieces, so that the requests were sent separately and the database could serve other requests in between.
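The splitting step can be sketched in Java as follows. This is an assumption-laden illustration: BatchSplitter is a hypothetical name, the batch size of 1000 is chosen only for the demo, and the database call itself is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: cuts one large list of products into smaller batches.
class BatchSplitter {

    static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> productIds = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            productIds.add(i);
        }
        for (List<Integer> batch : split(productIds, 1000)) {
            // In a real loader, each batch would be sent as its own insert request,
            // which gives the database room to answer other queries in between.
            System.out.println("inserting " + batch.size() + " products");
        }
    }
}
```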
Anatol: And what about the YML format? Does everything work as described?
Ivan: Incidentally, YML is described on Yandex.ru in the “Technical Support” section. We got our instructions from there. Of course, there were some interesting moments. For example, the YML format description says that a period should be used as the decimal separator in prices. However, let’s say a store sends a YML file where a comma is used instead; according to the instructions on Yandex, that YML isn’t valid. But the store owner says that they successfully upload everything to Yandex. In that case, we have to adapt to Yandex, so we checked the file and replaced the commas with periods. Yandex was our benchmark: if a file uploads to Yandex, then it should upload for us. And so we made the corrections.
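A small Java sketch of this kind of tolerance, assuming prices arrive as plain strings with at most one separator. PriceParser is a hypothetical name, and thousands separators are deliberately not handled.

```java
import java.math.BigDecimal;

// Hypothetical helper: accepts both "199.90" and the technically invalid "199,90".
class PriceParser {

    static BigDecimal parsePrice(String raw) {
        // Normalize a decimal comma to the period that the format description requires.
        String normalized = raw.trim().replace(',', '.');
        return new BigDecimal(normalized);
    }

    public static void main(String[] args) {
        System.out.println(parsePrice("199,90"));
        System.out.println(parsePrice(" 199.90 "));
    }
}
```

BigDecimal is used instead of double so that prices are not affected by binary rounding.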
Anatol: Ok. Let’s talk about how the actual ordering process goes. Since Shopium is not just an on-line store, but a hypermarket, a lot of merchants enter it. How does this fact influence order placement?
Ivan: There are two scenarios I see most often, starting from when a client enters the Shopium site. The first one is that a client types a product name, say “iPhone”, into the site’s search box and ends up at the search results. On that page, he can view either the CC card or a specific store offer. If he clicks on the CC card, he’ll see our description with store offers and can then choose the cheapest one. If he clicks on a specific store’s offer, then naturally, he ends up on the merchant’s site with the description of the specified product. Everywhere a store offer is displayed, there is an “Add to Shopping Cart” button, so the standard web-store functions are present on Shopium.
Anatol: And when a customer buys a product on Shopium, does he see which specific store it’s from, or do all of the products look like Shopium products?
Ivan: The customer sees the store. The store name and logo are always present on the product page, as well as the specific merchant’s phone number. When a customer adds a product to the shopping cart, all the products in the cart are grouped into blocks by store. We make it clear to customers that the purchase isn’t from Shopium, but from a specific store.
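Grouping a cart into same-store blocks can be sketched in Java like this. The CartLine and CartGrouper classes and the store names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cart line: which store sells the product, and the product name.
class CartLine {
    final String store;
    final String product;

    CartLine(String store, String product) {
        this.store = store;
        this.product = product;
    }
}

class CartGrouper {

    // Groups product names by store, keeping the order in which stores first appear.
    static Map<String, List<String>> groupByStore(List<CartLine> lines) {
        Map<String, List<String>> blocks = new LinkedHashMap<>();
        for (CartLine line : lines) {
            blocks.computeIfAbsent(line.store, key -> new ArrayList<>()).add(line.product);
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<CartLine> cart = Arrays.asList(
                new CartLine("BookStore", "Novel"),
                new CartLine("RimShop", "17-inch rim"),
                new CartLine("BookStore", "Atlas"));
        System.out.println(groupByStore(cart));
    }
}
```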
Anatol: You said customers can choose the least expensive product. As far as I remember, when we discussed the project at the beginning of development, the idea was exactly that: not to avoid price competition, which of course is impossible, but at least to increase the number of parameters a customer could choose by. This would mean that in some situations a customer might pick a more expensive product.
Ivan: On the product cards where store offers are listed, the offers are not sorted by price by default. They are sorted by an algorithm that takes several characteristics into account. Firstly, the store’s rating is accounted for in the algorithm; then the region; afterwards, the number of completed orders; etc. So you see, price isn’t our first priority. When we look for goods in the general full-text search, the results are also sorted by this algorithm by default. Of course, it’s still possible to sort goods solely by price.
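One possible reading of this default ordering, sketched in Java. The interview does not give the actual formula, so this is an assumption: the three criteria are applied strictly one after another, and the Offer fields are hypothetical. The real algorithm may well combine them with weights instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical offer record with only the fields the sketch needs.
class Offer {
    final String store;
    final double storeRating;
    final boolean sameRegion;      // store delivers to the customer's region
    final int completedOrders;
    final double price;

    Offer(String store, double storeRating, boolean sameRegion, int completedOrders, double price) {
        this.store = store;
        this.storeRating = storeRating;
        this.sameRegion = sameRegion;
        this.completedOrders = completedOrders;
        this.price = price;
    }
}

class OfferRanking {

    // Default order: higher rating first, then same-region stores, then more completed orders.
    static final Comparator<Offer> DEFAULT_ORDER = (a, b) -> {
        int byRating = Double.compare(b.storeRating, a.storeRating);
        if (byRating != 0) {
            return byRating;
        }
        int byRegion = Boolean.compare(b.sameRegion, a.sameRegion);
        if (byRegion != 0) {
            return byRegion;
        }
        return Integer.compare(b.completedOrders, a.completedOrders);
    };

    // Optional order the customer can switch to: cheapest first.
    static final Comparator<Offer> BY_PRICE = (a, b) -> Double.compare(a.price, b.price);

    public static void main(String[] args) {
        List<Offer> offers = new ArrayList<>(Arrays.asList(
                new Offer("A", 4.5, false, 10, 100.0),
                new Offer("B", 4.5, true, 5, 120.0),
                new Offer("C", 4.9, false, 1, 150.0)));
        offers.sort(DEFAULT_ORDER);
        for (Offer offer : offers) {
            System.out.println(offer.store + " " + offer.price);
        }
    }
}
```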
Anatol: You mentioned sorting “by region”. Does that mean we now have a specific match based on customer and seller regions?
Ivan: Next, he clicks the “Place Order” button, picks a delivery method and enters the address. Then, he confirms the order and waits for the courier to arrive, pays, and gets his product. It was originally planned that we would consolidate orders from different stores at Shopium; that way a courier would arrive once from all the stores within the framework of one Shopium order. But this didn’t work out because of difficulties with the delivery service. For now, orders are broken up by merchant, and separate couriers from each store go to the client.
Anatol: Tell me, do we have a specific tool for handling problems with orders, or do we simply track them?
Ivan: Yes, there is a “problem order” feature. We consider an order a problem if it has not been responded to within one day. A response is any event in which a client changes an order’s status or adds a message (or complaint) to it. This process is monitored by a Shopium administrator, who has a special problem-orders page and an order filter. There are a lot of tools. When a system administrator sees that an order is a problem, he contacts the store and finds out why the order hasn’t been processed.
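The one-day rule can be expressed as a small Java check. The sketch assumes that an order stores the time it was placed and the time of its first response (null if there has been none); ProblemOrderCheck and its parameter names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical check for the "no response within one day" rule.
class ProblemOrderCheck {

    static final Duration RESPONSE_WINDOW = Duration.ofDays(1);

    // firstResponse is null while nobody has changed the status or added a message.
    static boolean isProblem(Instant placedAt, Instant firstResponse, Instant now) {
        if (firstResponse != null) {
            return false;
        }
        return Duration.between(placedAt, now).compareTo(RESPONSE_WINDOW) > 0;
    }

    public static void main(String[] args) {
        Instant placedAt = Instant.parse("2013-03-20T10:00:00Z");
        Instant now = placedAt.plus(Duration.ofHours(30));
        System.out.println(isProblem(placedAt, null, now));
    }
}
```

Passing the current time in as a parameter keeps the rule easy to test and to run over a whole list of orders for the administrator's filter.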
It’s also possible to “complain” about an order. Within their order, a user can open a special window and write a message to an administrator. The administrator sees the complaint and contacts the specific store.
There’s no direct connection between the user and the store administrator on Shopium. Communication goes through a Shopium system administrator. He forwards the user’s message to the specific store. When the store owner responds, the response goes first to the Shopium administrator, who then relays it to the user.
An interesting point having to do with orders: a lot of stores that receive orders from Shopium forget to change the order status in the store’s administrative panel on our site, meaning they don’t use our site’s order processing function. When they receive an order from Shopium, they start processing the order according to their own business-processes, bypassing the platform. This is a serious problem, and the Shopium administrator ends up reminding the merchant about the necessity to process the order (update its status) in our system.
To simplify order processing for store administrators, we added an order-processing function in the form of an email with links, which arrives at the store the moment an order is placed. When the user places an order, an email is automatically sent to the store. To make updating the order status as convenient as possible for the store administrator, we included a table of status changes in the email. An order’s status can be changed by clicking the link for the required status in the table.
Anatol: That is really convenient.
It’s obvious that every store has its own business processes—some better, some worse. It’s hard to automatically switch over to a new system.
Okay, I understand everything about the customer. Now I want to know what abilities the platform’s administrator has. I mean, what does he see when he enters into the system? What tools does he have?
Ivan: The administrator has a large administrative panel with a lot of pages.
Let’s list them:
The administrator sees all of the central catalog (CC) cards and can look at and edit them. He can also approve pages that a content manager has edited.
The administrator sees the central catalog tree and can edit it: change category names, add sub-categories, change the tree in any way.
The administrator sees all of the orders, including problem orders.
The administrator sees the store list. He can ban a store and look at a store’s orders.
He can also enter a store’s administrative section as a store administrator. This is an extremely convenient function: when a store can’t sort something out in its admin panel, our administrator can see the store interface through the eyes of that store’s administrator.
The administrator has the function of processing complaints, which are already being sent between the client and a specific store. He moderates news and article comments. Also, the administrator sees the general list of system users.
Not long ago, we added a feature where the administrator can make product ads. Besides that, he can link Shopium customers to advertisements.
Anatol: Can the administrator monitor the system’s general statistics?
Ivan: On the administrative panel’s main page, there are statistics on all of the orders—how many new orders, how many problem orders, how many completed orders, and then there’s the general sum of orders.
Anatol: I want to ask about SEO.
Initially, broad SEO functionality was planned for Shopium, allowing the site to be conveniently and inexpensively optimized. Because of Shopium’s basic function, a showroom for customers, SEO should be properly organized in the system.
Ivan: A third-party company performed SEO for us. As a result of their work, an SEO report was made. What was in that report?
Recommendations on meta tags. We needed to add several meta tags to every page with content (a product description, article, news): Keywords, Description, and Title. The report gave recommendations on how to fill in the tags and which keywords to use. On Shopium’s main page this text was fixed, meaning static: “Shopium.ru: buy goods, on-line store price comparison, reviews”. For products it was dynamic, so the name of the product and descriptive words should be in the text.
The second set of recommendations concerned links to system pages. Previously, a link to a page for viewing something (a product, news item, article, category) consisted of an address and a parameter, a numerical ID. 99% of web stores were at some point set up this way. The trend now is that links should be text. I mean, not “…/product/25080”, but “…/product/i-Phone-4G”, for example.
In the end, we made all of the links in the system text-based.
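A minimal Java sketch of turning a product name into such a text link segment. It is an illustration under assumptions: SlugBuilder is a hypothetical name, only ASCII letters and digits are kept (Cyrillic names would need transliteration first), and two products with the same name would still need something extra to keep their links unique.

```java
// Hypothetical helper: builds the text part of a link from a product name.
class SlugBuilder {

    static String slugify(String name) {
        return name.trim()
                .replaceAll("[^A-Za-z0-9]+", "-")   // each run of other characters becomes one hyphen
                .replaceAll("^-+|-+$", "");         // drop hyphens left at either end
    }

    public static void main(String[] args) {
        System.out.println(slugify("iPhone 4G (16 GB)"));
    }
}
```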
One more feature having to do with SEO is that we made it possible to export all of our products to an XML file for any third-party system or advertising platform.
Anatol: Is there a tool in the administrative panel for editing, or “tuning” SEO?
Ivan: Meta tags can be edited from the administrative panel. Besides that, it’s possible to automatically calculate them by product name and description with an algorithm. You could also just assign it a value. If someone suddenly thinks up a “super meta tag” for a specific product, the value can be entered at any time and the product page will reflect that meta tag.
Anatol: So, we talked about Shopium’s universal administrative panel. Did we forget to mention any of its functions?
Ivan: I can say a few words about the content manager’s administrative panel: it’s a subset of the Shopium head admin’s panel. The content manager can create everything having to do with content there: articles, news, CC cards. There’s a WYSIWYG text editor. However, changes entered by the content manager aren’t accepted instantly; a system administrator has to approve them.
Anatol: Let’s move on to a complicated part of the Shopium system. I understand that any web system has a large “public” part, which the end user sees; in our case, that is the customer or business user. This is the administrative panel and user interface. “Backstage”, there are the database and the mid-tier (the functionality of which usually isn’t too large, especially in the case of an on-line store).
However, web systems usually have a few complex components whose design, development, testing and optimization take up a lot of time and energy. Were there any such components in the Shopium system?
Ivan: The most interesting and complicated task on our project was matching store categories with the central catalog’s categories. For example, making a specific store’s watch from the “Watches” category end up in either the central catalog’s “Watches” category or some other appropriate category, like “Men’s Gifts”.
Anatol: Not everyone may understand that a store’s categories are different from our central catalog’s categories. The category tree in every store is completely random and is constructed in accordance with its products, business and brand.
Ivan: Stores usually have a narrow specialty. One, for example, only sells car rims. It will have categories such as 18-inch rims, 17-inch rims, and 16-inch rims, or something like “NEW”, because it’s easier for the store to divide its goods that way.
Naturally, Shopium doesn’t have those kinds of divisions. All of those products correspond to the CC’s “Tires and Rims” category. Other categories show up. How should we solve this problem?
Firstly, there’s a manual matching tool as a last resort, which, honestly, everyone is too lazy to use.
Secondly, there’s an automatic matching algorithm. The first version of the algorithm worked by category names. If a store had a “Rims” category, and the Shopium CC had a “Rims” category, the algorithm matched these categories.
But that approach didn’t turn out to be the best, because the category names in any given store rarely matched the Shopium CC’s category names word for word. Having run into this problem, we tried fuzzy matching. There are string-similarity algorithms that report how similar one string is to another as a percentage. We tried to use these algorithms, but then new problems sprang up: there were too many false positives. For example: “watches” (chasi) and “underwear” (trusi). The words look similar, so the algorithm matched them, but they are very different concepts.
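The interview does not say which similarity measure was tried. As an assumption, the Java sketch below uses Levenshtein edit distance, a common choice, to show why this kind of matching misleads: the unrelated pair from the example scores higher than a pair of categories that really belong together.

```java
// Illustration only: Levenshtein-based similarity between two category names.
class NameSimilarity {

    // Classic dynamic-programming edit distance, keeping two rows in memory.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    // 1.0 means identical strings, 0.0 means nothing in common.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(similarity("chasi", "trusi"));
        System.out.println(similarity("rims", "tires and rims"));
    }
}
```

With this measure, “chasi” and “trusi” come out at about 0.40, while “rims” and “tires and rims” come out at about 0.29, so no single threshold accepts the second pair and rejects the first.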
Furthermore, we implemented a “keyword” category matching algorithm. How did this work? In every CC category, there was a small number of reference products that had been matched correctly (either manually or by the automatic matching algorithm with a 100% match). Based on these products, we generated a cloud of keywords. We took all of the products’ names and descriptions, stripped the endings to leave only the root words, and counted the hundred most popular keywords in each category.
Now, when we match a store’s category, we take the keywords from the products in that store category and the keywords from every CC category. Then, using a formula with weights, we compare the two sets of keywords. If the match is above the threshold, we link the store’s category to the CC category. This algorithm gave much better results than the previous versions, which only compared category names.
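A rough Java sketch of the keyword approach. The actual formula and threshold are not given in the interview, so everything specific here is an assumption: weights are normalized word frequencies, the score is the summed smaller weight of each shared keyword, the threshold of 0.5 is arbitrary, and stemming is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustration only: keyword clouds and a weighted overlap score between them.
class KeywordMatcher {

    // Builds normalized weights for the topN most frequent words in the given texts.
    static Map<String, Double> keywordWeights(List<String> texts, int topN) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : texts) {
            for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (word.length() > 2) {              // skip empty and very short tokens
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<Map.Entry<String, Integer>> top = entries.subList(0, Math.min(topN, entries.size()));
        double total = 0;
        for (Map.Entry<String, Integer> entry : top) {
            total += entry.getValue();
        }
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : top) {
            weights.put(entry.getKey(), entry.getValue() / total);
        }
        return weights;
    }

    // Weighted overlap in [0, 1]: sums the smaller weight of every shared keyword.
    static double score(Map<String, Double> storeCategory, Map<String, Double> ccCategory) {
        double sum = 0;
        for (Map.Entry<String, Double> entry : storeCategory.entrySet()) {
            Double other = ccCategory.get(entry.getKey());
            if (other != null) {
                sum += Math.min(entry.getValue(), other);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double threshold = 0.5;  // assumed value, for the demo only
        Map<String, Double> store = keywordWeights(
                Arrays.asList("Men's wrist watch steel", "Quartz watch leather strap"), 100);
        Map<String, Double> ccWatches = keywordWeights(
                Arrays.asList("Wrist watch quartz", "Watch strap leather"), 100);
        Map<String, Double> ccRims = keywordWeights(
                Arrays.asList("Alloy rims 17 inch", "Steel rims 16 inch"), 100);
        System.out.println("Watches matched: " + (score(store, ccWatches) >= threshold));
        System.out.println("Rims matched: " + (score(store, ccRims) >= threshold));
    }
}
```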
Anatol: I understand that when a search engine company like Yandex solves the product matching problem, it probably uses the full toolkit of algorithms developed for major search engines. We had limited time and resources, incomparable to Yandex’s. Everything that we develop is done within a fairly small budget. Still, I feel we managed to achieve pretty good results; when I search on Shopium, I don’t see that many matching errors.
Tell me, how were image uploads and system response time handled in the Shopium system?
Ivan: There’s a lot to talk about here. When we started uploading YMLs with products, at first we decided to fetch all of the pictures referenced in the YML files during the upload. This process ended up taking too much time. If the store’s site returns, for example, one picture in half a second, then fetching the pictures for a YML file with a hundred thousand products takes around 14 hours, and for a catalog of millions of products it would take several weeks. So, when parsing YML, we started storing not the picture itself with the description, but just its URL. When a user opens a particular product on the Shopium site, we check whether the picture has already been downloaded; if not, we give the page a download link for it, and the user’s browser downloads the picture in a separate request. The downloaded picture stays with us, and subsequent requests for that picture are served from our server. This function is called LazyLoad. This way, our users themselves trigger the loading of the pictures they’re interested in.
Also, when uploading YML files to add new goods to the database, instead of the INSERT command, which takes unacceptably long for large amounts of data, we use the PostgreSQL COPY command. COPY is designed for loading large amounts of data. It’s helped us a lot.
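The lazy picture loading described in this answer can be sketched in Java as a download-on-first-request store. This is a simplified illustration: LazyImageStore is a hypothetical name, the downloader is passed in as a function, and an in-memory map stands in for whatever storage the real system uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustration only: a picture is fetched from the store's site on first request
// and served from local storage on every later request.
class LazyImageStore {

    private final Map<String, byte[]> stored = new ConcurrentHashMap<>();
    private final Function<String, byte[]> downloader;

    LazyImageStore(Function<String, byte[]> downloader) {
        this.downloader = downloader;
    }

    byte[] get(String imageUrl) {
        // computeIfAbsent calls the downloader only when the URL has not been stored yet
        return stored.computeIfAbsent(imageUrl, downloader);
    }

    public static void main(String[] args) {
        // Fake downloader (assumption): a real one would perform an HTTP request.
        LazyImageStore images = new LazyImageStore(url -> {
            System.out.println("downloading " + url);
            return new byte[] {1, 2, 3};
        });
        images.get("http://shop.example/pictures/1.jpg");
        images.get("http://shop.example/pictures/1.jpg"); // served locally, no second download
    }
}
```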
Anatol: And how was the logical selection of goods from a category implemented in the Shopium project?
Ivan: An SQL query with a join on three or four tables plus around 10 criteria was used for choosing products from the CC categories. As the number of goods grew, these queries started taking a very long time, around 10-15 seconds. How did we solve this problem? We started creating indexes and caching.
Anatol: Okay. And which caching instruments do you use on the project?
Ivan: Caching is performed using a library for the Grails framework.
Anatol: So no additional servers are used for caching?
Ivan: No, the result of a completed function is written to RAM. When the server reboots, this result is of course erased, but when the server starts up again, the cache is restored; in addition, the cache is refreshed on a schedule.
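The interview does not name the Grails caching library, so the Java sketch below is not its API. It only illustrates the behaviour described: a function result kept in RAM, computed once at startup and recomputed on a schedule. RefreshingCache and the ten-minute period are assumptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustration only: holds one computed result in memory and refreshes it periodically.
class RefreshingCache<T> {

    private final Supplier<T> loader;
    private volatile T value;

    RefreshingCache(Supplier<T> loader) {
        this.loader = loader;
        refresh();                       // warm the cache at startup
    }

    T get() {
        return value;                    // served from RAM, the loader is not called
    }

    void refresh() {
        value = loader.get();
    }

    // Recomputes the value at a fixed rate; the caller shuts the executor down.
    ScheduledExecutorService scheduleRefresh(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::refresh, period, period, unit);
        return scheduler;
    }

    public static void main(String[] args) {
        // The loader stands in for a slow category query.
        RefreshingCache<Long> cache = new RefreshingCache<>(System::currentTimeMillis);
        ScheduledExecutorService scheduler = cache.scheduleRefresh(10, TimeUnit.MINUTES);
        System.out.println(cache.get());
        scheduler.shutdownNow();
    }
}
```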
Anatol: Okay. And how is our full-text search carried out? Over the product database, descriptions, and stores, right?
Ivan: Yes. Everything that is used in search filters, meaning ratings, availability and price, we also store in the full-text search index, so that a client can later narrow down the results.
Anatol: Thank you very much for the interview, Ivan!
It was very interesting.