In building an eCommerce site, it is easy to overlook or dismiss some of the physical connections a website has to the real world. However, as a software engineer responsible for the site’s success, it is extremely important not to. Every catalog page should account for actual inventory on hand. Every order requires items to be packed and shipped. Every customer notification should be triggered by and reflect real world actions. Consumers and buyers do not care about the software abstractions of the real world and the challenges that come with them. They just want to browse, purchase, and receive their items. Central to that equation is providing customers with realistic expectations regarding delivery times. While the expectation as to what is a reasonable shipping time has changed over the years, the necessity of generating and communicating accurate estimates has been around since the days of mail-in catalogs. Failing to meet shipping expectations leads to unhappy and anxious customers. This was the situation Doheny.com found itself in. There was a disconnect between its digital and physical existence, which sometimes caused its delivery times to be wrong.
The solution that never worked
Doheny’s sells pool supplies. This market is highly seasonal and customers are occasionally in frantic need of their ordered parts or chemicals. Incorrect delivery times are not an option if the Doheny’s website is to generate exceptional customer experiences that lead to happy customers. When Doheny’s chose Kadro as its new business and technology partner in late 2019, their Magento 2 site was already up and running. At the time, the website had even incorporated an attempt at a solution to the delivery time issue. The process went something like this. Each product on the site had a flag attribute that designated it as a Next Day Delivery product, and those products received special messaging on the site. Tying the messaging to a product attribute required a product update, which had to come from the system that managed all product data - Doheny’s Product Information Manager (PIM) system - which in turn had to wait for inventory data to be compiled in another system from Doheny’s various warehouses. By the time the data made it into Magento, it was already out of date and had a reasonable chance of upsetting a customer with misinformation about their order. The solution had obvious and inevitable problems. That is why, before Kadro even began working on the project, a number of conversations around improving the delivery times feature occurred.
The solution that would never be
When the Kadro team was given the task of improving the delivery times feature of Doheny.com, that project came with a built-in plan. The original plan was for Magento to be given access to zip code warehouse delivery estimates and stock data for each warehouse. This data would be periodically synchronized to Magento and used for determining what delivery time estimates should be shown to the customer. It will come as a surprise to no one, but that plan did not last a day before requirements changed. The inevitable question was, ‘Where do we get this data?’ The response was unexpected. Kadro was told that the data would not be made available directly, but instead an API would be created that would take a zip code and a collection of products and return the delivery time for each zip code and product combination. A delta file would also be made available that would include the cross product of zip codes and products that had stock changes. This was not going to work with the original plan and project scope. Our team had to come up with a new solution.
And the solution that is
A naive approach to these new requirements would have involved querying the provided API for delivery times as they’re needed. That would have meant each product listing page, each product detail page, and each checkout page showing the shopping cart would be required to call the API. This approach would have ensured a deluge of similar requests against an API endpoint whose tolerance we did not know. Sometimes though, a terribly naive approach can be turned into a professional enterprise-level solution by simply adding a cache in front.
Unfortunately, we couldn’t just throw in a cache and call it a day. The first problem was the actual decision space involved. Doheny’s has a catalog of a little over 20,000 SKUs and there are about 41,000 zip codes in the United States. Since our decision space is the cross product of those two variables, we had about 820 million data points to worry about. On top of that, after getting a few sample delta files, we discovered that they had on average 100,000 to 500,000 data points that would need to be processed and updated every hour. Throwing that amount of data into Magento’s MySQL database is a recipe for a crispy fried server, and even then it’s not the right tool for the job. Therefore, as an alternative, our team decided to use Redis.
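To make the scale concrete, here is a back-of-the-envelope sketch of that arithmetic in Python. The counts are the rounded figures quoted above, and the function names are ours, purely for illustration.

```python
# Rounded figures from the article; the real catalog is "a little over" 20,000 SKUs.
SKU_COUNT = 20_000
ZIP_COUNT = 41_000


def decision_space(skus: int, zips: int) -> int:
    """Number of zip code / SKU combinations (the cross product)."""
    return skus * zips


def delta_share(delta_rows: int, total: int) -> float:
    """Fraction of the decision space touched by one hourly delta file."""
    return delta_rows / total


total = decision_space(SKU_COUNT, ZIP_COUNT)
print(f"{total:,} zip/SKU combinations")
for rows in (100_000, 500_000):
    print(f"{rows:,} rows per hour is {delta_share(rows, total):.4%} of the space")
```

Each delta file touches only a small slice of the full space, but several hundred thousand row updates an hour is still a heavy write load for a relational database that is also serving a storefront.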
Magento has native support for Redis and Doheny’s was already using it for data and session caching. Magento even makes creating a new data cache pretty simple. However, we were concerned about shoving so much data into Magento’s existing caches. Also, Magento’s Redis adaptor code is written and optimized for how Magento plans to use it. Our usage was decidedly non-standard, so we made two key decisions: 1. Set up a separate Redis service and 2. Write our own small Redis wrapper.
Setting up the service was actually the easiest part. Since our team could fall back on the API for missing data, the cache’s main goal was dealing with repeated calls and taking stress off the API. This is a classic case for a cache using the Least Recently Used (LRU) strategy. It works how the name implies. If data needs to be ejected, the data that hasn’t been used for the longest period of time gets the boot. With this tactic, we didn’t need to worry about how to store 820 million data points. We only needed to store enough data to take the majority of the heat off the API. Our team talked over the requirements with Doheny’s hosting company, who had a new service set up in QA and production within the day.
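For readers who have not worked with an LRU cache, the idea can be sketched in a few lines of Python. This is an in-memory illustration only, not the code that runs on Doheny.com: the class name, the `fetch` callback standing in for the delivery times API, and the sample zip codes and SKUs are all made up for the example.

```python
from collections import OrderedDict
from typing import Callable


class LruDeliveryCache:
    """Toy LRU cache that falls back to an API-like callable on a miss."""

    def __init__(self, capacity: int, fetch: Callable[[str, str], int]):
        self.capacity = capacity
        self.fetch = fetch            # hypothetical stand-in for the delivery API
        self.entries = OrderedDict()  # oldest entry first, newest last
        self.api_calls = 0

    def get(self, zip_code: str, sku: str) -> int:
        key = (zip_code, sku)
        if key in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(key)
            return self.entries[key]
        # Cache miss: ask the API, then remember the answer.
        self.api_calls += 1
        days = self.fetch(zip_code, sku)
        self.entries[key] = days
        if len(self.entries) > self.capacity:
            # Over capacity: evict the least recently used entry.
            self.entries.popitem(last=False)
        return days


def fake_api(zip_code: str, sku: str) -> int:
    """Made-up rule so the example runs without a real API."""
    return 1 if zip_code.startswith("9") else 3


cache = LruDeliveryCache(capacity=2, fetch=fake_api)
cache.get("92618", "SKU-1")  # miss, calls the API
cache.get("92618", "SKU-1")  # hit, served from the cache
cache.get("10001", "SKU-2")  # miss
cache.get("60601", "SKU-3")  # miss, evicts the entry for 92618 / SKU-1
print(cache.api_calls, list(cache.entries))
```

In Redis itself none of this bookkeeping is written by hand. Eviction is governed by the `maxmemory` and `maxmemory-policy` settings (for example `allkeys-lru`), and Redis approximates LRU by sampling rather than tracking exact order. The specific values used for the Doheny’s service are not covered here.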
The Redis wrapper itself wasn’t too much of a problem either. We decided to build it on top of Zend_Cache which did most of the heavy lifting. After piecing together the code to make a SOAP call against the API, we had a working proof of concept taking data from the API, putting it into the cache, and then using the cache for subsequent calls. Things were going too well and were bound to come to a screeching halt. The delta files made sure of it.
While the code we’d written so far worked great for the smaller requests, it could not process the large delta files in anything close to a reasonable amount of time. Reading and parsing the file was fast enough, but Zend_Cache didn’t include support for a Redis mass insert, and adding the data one line at a time was too slow. With no other options readily available, we dug into the Redis documentation, which had the answer: you can convert the data into the Redis protocol format and pipe that in directly. After a few false starts in which we failed to produce properly formatted Redis protocol, our team wrote up something that we believed would work. We took the delta file and created a Redis protocol version before piping that into Redis. Once we had finished testing on a collection of smaller delta files, we decided to go for broke and threw the largest file we had at it. The test completed so quickly our team assumed something had gone wrong. Instead we were delighted to find out that things had worked great and for our purposes the Redis protocol was lightning fast.
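The formatting is where we stumbled, so here is a minimal Python sketch of the conversion. The protocol framing follows the Redis documentation: each command is an array header (`*` plus the argument count) followed by each argument as a length-prefixed string (`$` plus the byte length), with every line ending in `\r\n`. Everything else is an assumption made for the example: the real delta file layout and our key naming are not shown in this article, so the `zip,sku,days` rows and the `delivery:` key prefix are hypothetical.

```python
from typing import Iterable, Iterator


def resp_command(*args: str) -> bytes:
    """Encode one Redis command in the Redis protocol (RESP) format."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # The length prefix counts bytes, not characters.
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def delta_to_resp(lines: Iterable[str], prefix: str = "delivery") -> Iterator[bytes]:
    """Turn hypothetical 'zip,sku,days' rows into one SET command each."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        zip_code, sku, days = line.split(",")
        yield resp_command("SET", f"{prefix}:{zip_code}:{sku}", days)


# Made-up sample rows standing in for a delta file.
sample = ["92618,SKU-1,1", "10001,SKU-2,3"]
payload = b"".join(delta_to_resp(sample))
print(payload.decode("utf-8"))
```

Written out to a file, a payload like this can be loaded with the mass insertion mode that the Redis documentation describes, along the lines of `cat payload.txt | redis-cli --pipe`. A wrong length prefix or a missing `\r\n` is enough to make the whole load fail.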
For us, every production build comes with at least some degree of nervousness. We can be supremely confident in both the code and the process, but we’ve been doing this too long not to expect all manner of the unexpected. Combine that with a build using a new Redis service, custom Redis code, and a dash of Redis protocol and we were a bit on edge. Doheny’s was still a new partner and this feature had a lot of eyes on it. Kadro wanted to impress. So as we put out the new code and made the final connections, our team checked and double checked. The site came up, the first delta file was processed, and customers started ordering again. The dedicated Redis instance started to fill up. 10% no issues, 30% going well, 50% over the hump, 70% we clenched our teeth, and finally 100% full. The cache reached capacity without issue and happily tossed old unrequested data. The API calls stayed at a reasonable rate and the delivery messaging loaded quickly on the frontend. As if blessed by the god of programmers herself, everything worked and our nerves subsided. The new version of the old feature was live and improved. Good thing too, because Kadro already had new problems waiting to be solved. That is what we do: overcome business and technical challenges and make merchants’ visions for selling online a reality.
Learn more about the Doheny's project by visiting the Doheny's Portfolio page on the Kadro website.