freistilbox Blog

Newer articles « Page 14 of 17 » Older articles

Temporary file directories now come in two flavours

We’re now providing web applications with two variants of temporary file directories: one that is shared between boxes and a faster one that is stored locally on each node of a freistilbox cluster.

Our main goal with freistilbox is giving website developers maximum performance at minimum effort. Storage access is an important aspect of web performance tuning, and it’s a good idea to avoid expensive disk operations whenever possible. That’s why we decided to store temporary files created by web applications locally: writing to a local disk is by far faster than shipping data over a network connection to shared storage.

We chose that approach under the premise that temporary files are only created and used during a single content request, for example for uploading a file, aggregating CSS code or compressing data. Under this condition, it doesn’t matter if the next request is handled by another cluster box that has its own separate temporary file directory.

Support requests we received over recent weeks clearly indicated that this premise was wrong. As it turns out, there are situations where temporary files are expected to persist beyond the lifetime of a single content request.

As an example, there was the customer who noticed that batch operations of the “views_export” Drupal module delivered incomplete data. We found out that the batch process saves intermediate results to the temp space. Since the batch runs were distributed over their boxes, so were the result files. At the end of the batch process, the box doing the final run only found the files that were created on that particular box and therefore returned incomplete data.

In order to make sure that even temporary files are visible to all boxes in a consistent way, we decided to relocate them to the shared file storage. The default temporary file directory available at ../tmp, relative to the document root, is now shared.

Obviously this has a significant impact on performance: the data still has to be written to and read from disk, but now on multiple separate storage servers, with the network transfer coming on top. For customers who don’t need a shared temporary file directory but depend on speedy file handling, we now also provide an alternative in the form of ../tmp_local.
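The rule of thumb from the paragraphs above can be sketched in a few lines. The two paths, ../tmp (shared) and ../tmp_local (box-local), relative to the document root, are from this post; the helper functions and their names are hypothetical illustrations, not part of any freistilbox API.

```python
import os
import tempfile

# Hypothetical helper: pick the right temporary directory on a freistilbox
# cluster. Files that must stay visible to all boxes (e.g. intermediate
# results of a batch job) belong in the shared ../tmp; scratch files that
# never outlive a single request can use the faster, box-local ../tmp_local.
def temp_dir(docroot, shared=True):
    return os.path.join(docroot, "..", "tmp" if shared else "tmp_local")

def create_scratch_file(docroot, persistent):
    # Shared storage is slower (network round-trips), so only use it
    # when another box may need to read the file later.
    directory = temp_dir(docroot, shared=persistent)
    os.makedirs(directory, exist_ok=True)
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    return path
```

The “views_export” case above is exactly a `persistent=True` situation: the final batch run happens on a different box than the earlier ones.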

Oh, and we also put a cleanup process in place that makes sure that temporary files stay true to their name: Files that haven’t been touched for a week are removed automatically.

Meet me at DrupalCamp Cork!

I’m on my way to Dublin where I’ll take the AirCoach bus to Cork. For the coming two days, the local university will be the venue for DrupalCamp Cork. Judging from the list of participants, DrupalCamp Cork is going to be a nice, small gathering of Drupal users from all over Ireland. I like it already.

Tomorrow, I’ll participate actively by giving a talk. I’ve submitted it as “Building a high-performance system stack” but I’ll shorten the title to “Supercharging Drupal”. In this talk, I’ll cover the most common ways of optimizing Drupal performance on the hosting layer.

I’ll also see if there’s an opportunity to demonstrate how easy it is to launch a website on freistilbox.

If you’re in Cork for DrupalCamp, be sure to say hello to me! Who knows, I just might invite you for a beer. And if that’s not enough: Order your new freistilbox cluster during the conference through me and get the first month for free!

freistilbox just got more powerful

One of the most important keys to website performance is caching. That’s why freistilbox includes multiple caching services, first and foremost the Varnish HTTP cache. On our load balancers, Varnish stores all the content your web application allows to be cached. During the lifetime of the respective cache content, Varnish answers incoming requests right from its memory cache instead of forwarding them to the application boxes. This speeds up delivery by orders of magnitude.

But what about those requests that can’t be cached? Of course, not everything can be delivered from the cache all the time. Before it can be cached, content needs to be generated by your web application at least once in a while. So a certain percentage of web requests must be processed by your application. Each request your application receives is assigned to a single Processing Unit (PU). This PU then executes your application code in order to process and respond to the request.

The number of simultaneous requests that can be handled is limited by the total number of PU available to your freistilbox cluster, which in turn depends on the size and number of boxes that make up your cluster. Up until now, these were the effective PU limits for our freistilbox sizes:

  • freistilbox S: 5 PU

  • freistilbox M: 15 PU
  • freistilbox L: 35 PU
  • freistilbox XL: 75 PU

After a number of small freistilbox setups experienced occasional overload, we realised that with 5 PU, a single freistilbox S just doesn’t have enough capacity for production websites (except the most static ones). Since modern web browsers open multiple connections to fetch different assets in parallel, even a single visitor could use up all five available Processing Units and block the website for everyone else. That’s why we recently started to advise customers not to use a single freistilbox S for production websites.

That didn’t feel right, though. There shouldn’t be even a single freistilbox configuration that isn’t up to the task. So we decided to increase the PU limits on freistilbox S, M and L. The new specs are as follows:

  • freistilbox S: 10 PU (+5)
  • freistilbox M: 25 PU (+10)
  • freistilbox L: 40 PU (+5)

With 10 PU, a single freistilbox S still won’t be powerful enough to run a busy community website but it should now have enough capacity to reliably serve a medium website that has a decent cache hit ratio.
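The back-of-the-envelope maths behind these numbers can be sketched like this. The per-box PU limits are from this post (we assume freistilbox XL stays at its previous 75 PU, since it wasn’t listed among the increases); the “six connections per visitor” figure is a common browser default used here purely as an illustrative assumption.

```python
# Per-box PU limits after the upgrade (from the post above).
# XL is assumed unchanged, as it was not listed among the increases.
PU_PER_BOX = {"S": 10, "M": 25, "L": 40, "XL": 75}

def cluster_capacity(size, num_boxes):
    """Total simultaneous requests a freistilbox cluster can process."""
    return PU_PER_BOX[size] * num_boxes

def concurrent_visitors(size, num_boxes, connections_per_visitor=6):
    """Rough estimate: visitors that can hit uncached pages at once.

    Browsers typically open around six parallel connections per host,
    which is why a single 5-PU box could be saturated by one visitor.
    """
    return cluster_capacity(size, num_boxes) // connections_per_visitor
```

With the old 5-PU limit, `concurrent_visitors("S", 1)` would have been zero under this estimate, which matches the overload reports described above.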

Oh, and all existing freistilbox customers have already been auto-upgraded!

freistilbox with more power — for the same price. What’s not to love?

What do you think? Leave us a comment below!

DrupalCon Prague 2013

Last week, Markus and I returned from DrupalCon Prague back to our desks in Germany and Ireland, respectively. It was a fun event and I’d like to tell you about my personal highlights.

First of all, DrupalCon is the biggest event for the Drupal community and the perfect opportunity to see and meet all the people that make Drupal a great open source project. Actually, meeting people was the main reason I flew to Prague. Especially in terms of customer contact, talking in person can’t be beat, all the more so when customers praise our services in front of lots of other Drupal business people. ;-) That’s why I had a great conference start at the CxO meeting on Monday.

The fun continued early Tuesday morning (sadly, too early for many) with Tutti fan’ Drupal, a musical play in which ‘The N00b’, a young, inexperienced web developer, meets ‘The Client’, who needs a website. The hilarious piece also featured The Drupal Community on a Bad Day, Drupalgeno and Drupalgena as well as The Drupal Community on a Good Day. I had a lot of laughs and learned that the highly sought-after Drupal Talent also includes fabulous singing voices.

Later, in his State of Drupal keynote, project founder Dries Buytaert explained his vision: Drupal is bigger than technology. It’s an idea. So, before going into detail about what’s happening around the next major release of Drupal 8, Dries listed what he sees as the most important drivers for our activities:

  • We’re changing the world.
  • We help individuals build a dream.
  • We give small organizations a big voice.
  • We give enterprises a new idea.
  • We inspire wonder and delight.
  • We admit no boundaries.

Especially for me as someone who isn’t directly involved in Drupal development, it was highly interesting to see what technological changes Drupal 8 will bring. And I was amazed by the community support this new release enjoys: with more than 1600 contributors, Drupal 8 in its current pre-alpha stage already has more than twice as many people involved as Drupal 7 had when it was finally released!

The second reason I attended DrupalCon was because I had volunteered to curate its DevOps session track. For months, the DrupalCon content team had done a lot of work to make sure that conference attendees got to select from a wealth of high-quality talks on many different topics. I’d like to thank all speakers I got to work with before and during DrupalCon for their willingness to stand in front of a crowd and share their knowledge. After all, sharing is an integral part of DevOps culture.

During the week in Prague, Markus and I had many valuable conversations with our customers. More than once, we received critical feedback on our Drupal hosting platform. While criticism isn’t as easy to accept as praise, it’s essential for us in order to achieve better service quality, so we appreciate your constructive openness.

Although Prague was my third DrupalCon, it was the first time I attended Trivia Night on the last conference day. Organised by my new home team, Drupal Ireland, this entertaining event drew so many Drupalistas to the Hilton Hotel that we ended up turning people away because the room was packed with more than 100 attendees. Alan, you did a tremendous job as MC!

As is tradition, at the end of this DrupalCon the location of next year’s DrupalCon Europe was announced, and we’re looking forward to seeing what the passionate Dutch Drupal community has in store for us. Another important European event, the Drupal Developer Days, will take place in Szeged, Hungary; you’ll probably see us there, too.

Not so long ago, I had doubts about whether attending DrupalCon was still worth the time and money for me. DrupalCon Prague dispelled them. I’ll see you in Amsterdam!

Fighting the 503 Server Error

We’re happy to move another entry on our new product roadmap to the Finished column: We’ve greatly improved the error handling on our load balancers.

Handling of application errors

Before this change, our load balancers delivered a terse 503 Server Error page for each and every condition that prevented the requested content from being delivered. Unfortunately, this included situations where it wasn’t the hosting platform that was failing but the web application. For example, if Drupal is put into maintenance mode or has issues connecting to its database, it delivers an error page with an HTTP status code 500 and an error message in the page body. But instead of delivering this page, our load balancers replaced it with their plain Server Error page. In other words, they made the issue worse by concealing its cause.

We’ve improved the load balancer configuration so that a 503 Server Error is now only displayed when there is no way of delivering useful content. If it’s just the application sending an error page, its content will be passed through to the visitor.
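In spirit, the new decision boils down to something like the following sketch. This is purely illustrative: our real load balancer configuration looks nothing like this, and the function and its return values are hypothetical.

```python
# Hypothetical sketch of the improved error handling: application
# responses -- even error pages like Drupal's maintenance-mode 500 --
# are passed through unchanged; the generic 503 page only appears
# when no backend produced a response at all.
def choose_response(backend_response):
    """backend_response is a (status, body) tuple, or None if no
    healthy application box could be reached."""
    if backend_response is None:
        # Platform-level failure: nothing to pass through.
        return (503, "Server Error")
    # The application answered; forward its status and body verbatim,
    # whether it is a 200, a 404 or the application's own error page.
    return backend_response
```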

Trying everything to deliver

The most frequent cause of the dreaded 503 Server Error is that a load balancer has run out of healthy application boxes to which it can pass on incoming requests. Especially customers with only a single box ran into this problem when that box got overloaded, even if only for a few seconds.

We’ve found a way to prevent ugly error messages even in this situation: a Varnish feature called grace mode allows us to keep content in the cache for a defined period after its expiry time. If a request can neither be answered with fresh cache content nor forwarded to any box, Varnish will now try to deliver recently expired cache content (up to one hour past its expiry time). Only if there is nothing left that can reasonably be delivered to the visitor will an error message appear.
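The decision logic of grace mode can be illustrated as follows. The one-hour grace window is from this post; the five-minute TTL is an arbitrary example value, and the whole function mirrors Varnish’s behaviour in spirit only (real VCL looks quite different).

```python
import time

TTL = 300     # example cache lifetime: 5 minutes (hypothetical value)
GRACE = 3600  # serve stale content up to 1 hour past expiry (from the post)

def lookup(entry, backend_healthy, now=None):
    """Decide what a grace-enabled cache can deliver.

    entry is an (inserted_at, body) tuple or None if nothing is cached.
    """
    now = time.time() if now is None else now
    if entry is not None:
        inserted_at, body = entry
        age = now - inserted_at
        if age <= TTL:
            return ("fresh", body)          # normal cache hit
        if not backend_healthy and age <= TTL + GRACE:
            return ("stale", body)          # grace mode: better than an error
    if backend_healthy:
        return ("miss", None)               # fetch from an application box
    return ("error", None)                  # nothing left to deliver: 503
```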

Minimizing box downtime

We’ve also optimized the intervals in which our load balancers check if the application boxes in a freistilbox cluster are healthy. An unresponsive box is now detected and taken out of the load balancing pool within only 5 seconds. Previously, the delay was about 15s, so we’ve greatly reduced the number of failed load balancer requests. And boxes that have recovered are taken back into the pool far more quickly, giving us a more stable load distribution.
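For readers curious how such health checking typically works: a load balancer probes each box at a fixed interval and only flips a box’s state after several consecutive probes agree, to avoid flapping. The sketch below illustrates that pattern; the interval and threshold values are hypothetical, not our actual configuration (only the resulting 5-second detection window is from this post).

```python
CHECK_INTERVAL = 1  # hypothetical: probe every second
FALL = 3            # consecutive failures before a box is pulled
RISE = 2            # consecutive successes before it is re-added

class BoxHealth:
    """Tracks one application box in the load-balancing pool.

    With these hypothetical numbers, an unresponsive box is out of the
    pool after FALL * CHECK_INTERVAL = 3 seconds, i.e. within the
    5-second window mentioned above.
    """
    def __init__(self):
        self.healthy = True
        self.streak = 0  # consecutive probes contradicting current state

    def record(self, probe_ok):
        if probe_ok == self.healthy:
            self.streak = 0
            return
        self.streak += 1
        threshold = RISE if not self.healthy else FALL
        if self.streak >= threshold:
            self.healthy = probe_ok
            self.streak = 0
```

Requiring several consecutive successes before re-adding a box is what keeps recovered boxes from bouncing in and out of the pool.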

Looking at our monitoring metrics, we’re quite happy with the results of these changes. We see far fewer failed requests, fewer spikes in box usage and overall more stable website operation.

We’d love to hear from you: Are you experiencing a positive change in your application’s stability? Please let us know in the comments!
