Tuesday, 22 September 2015

Fieldwork Research, Ticketmaster

For my research, I looked at the ticket-selling site, Ticketmaster. In days of old, when any big event went on sale, there were invariably backend issues, resulting in front-end timeouts or full-on crashes. A couple of weeks ago, U2 put tickets for their Dublin concerts on sale, and Ticketmaster unveiled a new process to prevent such issues recurring.

I'm going to look at this progression from the old system to the new system with the help of two different protocols. First, I am going to do a 5 Whys on the site going down and what could be done about it. Second, I'm going to try the site and be my own customer.

Ask - 5 Whys
  • Ask why in response to 5 consecutive answers
    • Ticketmaster crashed under high load
    • Large numbers of users overload the web servers and backend systems
    • The Ticketmaster front-end systems flood the backend with more users than it can handle
    • Scaling web servers to handle high user load is very easy, but one point of failure in the backend shows up as a failure in the front end
    • Backend failures are costly, difficult and time-consuming to fix; they also take the entire site down, so the overall lost revenue is extreme, and not just limited to one set of concerts :)

Oops, you need to start again - and by that stage you know the concert will be sold out :)


Ticketmaster's solution is a clever one: in times of high load they have a queueing system. I could imagine a thread-based queueing system. The system is designed to handle a specific number of simultaneous transactions, and above that you simply go into one of a number of queues, and each person is served in turn.
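
To make that concrete, here is a minimal sketch of how such a system might work - this is purely my own guess, with made-up numbers, not Ticketmaster's actual implementation. A fixed pool of worker threads drains a waiting queue, so no matter how many buyers show up, only a set number of transactions ever hit the backend at once:

```python
import queue
import threading
import time

MAX_SIMULTANEOUS = 100   # assumed backend capacity - a made-up number

waiting_room = queue.Queue()   # buyers above capacity wait here, in order

def process_transaction(user_id):
    """Stand-in for the real purchase flow (seat selection, payment, etc.)."""
    time.sleep(0.05)

def worker():
    # Each worker serves one queued buyer at a time, so at most
    # MAX_SIMULTANEOUS transactions ever hit the backend together.
    while True:
        user_id = waiting_room.get()
        process_transaction(user_id)
        waiting_room.task_done()

# A fixed pool of workers is the whole trick: demand can spike,
# but the number of in-flight transactions cannot.
for _ in range(MAX_SIMULTANEOUS):
    threading.Thread(target=worker, daemon=True).start()

# A burst of 10,000 buyers just lengthens the queue instead of
# overwhelming the backend.
for user_id in range(10_000):
    waiting_room.put(user_id)

waiting_room.join()
```

The nice property is that a spike in demand only makes the queue longer; it never increases the load on the backend.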

Still not perfect, but waiting in line for a chance to buy a ticket is a far better user experience than the site crashing outright :)


Try - Be your customers
  • Outline typical customer experiences

The previous customer experience was that, in the event of a busy concert, the website would eventually grind to a halt; it was a matter of when, not if, the site would crash. Knowing how these kinds of backend systems work, there is likely to have been at least one failure point. I have seen many systems where one part does not perform under load, and when a specific point is reached it either slows down or crashes outright. I worked on one system where, when tested in isolation, each endpoint was perfect under a load of tens of thousands of users, but when deployed together on the same set of servers they would crash with 100 users doing regular actions. The endpoints put each other under stress, and importantly put their host machine under stress.

Ah, you nearly got to the end, but our payment gateway is down, sorry - start again and hope for the best :)


Under this new system at Ticketmaster, they are clearly regulating which customers reach the critical points of the backend systems. At a basic level, they have a funnel: only a trickle of users reaches the choke points in the backend. As a result the site stays up and is performant, both for users buying tickets for the busy concert and, just as importantly, for other users of the site.
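
The funnel itself could be as simple as a rate limiter sitting in front of the backend choke points. Again, this is speculation on my part rather than Ticketmaster's real architecture, but a token bucket is one classic way of turning a flood into a trickle, and the numbers below are invented:

```python
import time

class TokenBucket:
    """Admit at most `rate` users per second, with a small burst allowance."""

    def __init__(self, rate, burst):
        self.rate = rate              # tokens added per second
        self.capacity = burst         # maximum stored tokens
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the time elapsed, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True               # user may proceed to the choke point
        return False                  # user stays in the waiting room

# Invented numbers: let 50 users per second through to the payment step.
funnel = TokenBucket(rate=50, burst=50)

def try_to_reach_payment(user_id):
    if funnel.allow():
        print(f"user {user_id} proceeds to payment")
    else:
        print(f"user {user_id} keeps waiting")
```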

Another good idea, in terms of keeping the site up, was that the U2 concert had many ticket options. By having many different classes of ticket, not everyone was hitting the same endpoint in the backend: you had ordinary tickets, student tickets, VIP tickets, premium tickets, I-can't-believe-it's-not-butter tickets, the list goes on. This means you have five or six main queues rather than one epic queue. This is good traffic management, if a little confusing for the end user.
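
In code terms, that could be as simple as routing each buyer into a queue keyed by their ticket class - again a guess at the mechanism, with hypothetical class names:

```python
from collections import defaultdict
from queue import Queue

# One independent queue per ticket class, created on first use.
queues = defaultdict(Queue)

def join_queue(user_id, ticket_class):
    """Route a buyer into the queue for their ticket class."""
    queues[ticket_class].put(user_id)

# Buyers spread across several queues instead of one epic queue.
join_queue("alice", "ordinary")
join_queue("bob", "student")
join_queue("carol", "vip")
join_queue("dave", "premium")

for ticket_class, q in queues.items():
    print(f"{ticket_class}: {q.qsize()} waiting")
```

Each queue drains independently, so a stampede for VIP tickets does not slow down the student queue.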

