Monday, June 25, 2018

Guerrilla 2018 Classes Now Open

All Guerrilla training classes are now open for registration.
  1. GCAP: Guerrilla Capacity and Performance — From Counters to Containers and Clouds
  2. GDAT: Guerrilla Data Analytics — Everything from Linear Regression to Machine Learning
  3. PDQW: Pretty Damn Quick Workshop — Personal tuition for performance and capacity management

The following highlights indicate the kind of thing you'll learn: most especially, how to make better use of all that monitoring and load-testing data you keep collecting.

See what Guerrilla grads are saying about these classes. And how many instructors do you know who are available from 9am to 9pm (or later) on each day of your class?

Who should attend?

  • IT architects
  • Application developers
  • Performance engineers
  • Sysadmins (Linux, Unix, Windows)
  • System engineers
  • Test engineers
  • Mainframe sysops (IBM, Hitachi, Fujitsu, Unisys)
  • Database admins
  • Devops practitioners
  • Site reliability engineers (SREs)
  • Anyone interested in getting beyond performance monitoring

As usual, Sheraton Four Points has bedrooms available at the Performance Dynamics discounted rate. The room-booking link is on the registration page.

Tell a colleague and see you in September!

Wednesday, June 20, 2018

Chargeback in the Cloud - The Movie

If you were unable to attend the live presentation on cost-effective defenses against chargeback in the cloud, or simply can't get enough performance and capacity analysis for the AWS cloud (which is completely understandable), here's a direct link to the video recording on CMG's YouTube channel.

The details concerning how you can do this kind of cost-benefit analysis for your cloud applications will be discussed in the upcoming GCAP class and the PDQW workshop. Check the corresponding class registration pages for dates and pricing.

Sunday, May 20, 2018

USL Scalability Modeling with Three Parameters

NOTE: Annoyingly, the remote MathJax server often takes its sweet time rendering LaTeX equations (like, maybe a minute!!!). I don't know if this is deliberate on the part of Google or a bug. It used to be faster. If anyone knows, I'd be interested to hear; especially if there is a way to speed it up. And no, I'm not planning to move to WordPress.

Update of Oct 2018: Wow! MathJax performance is back. Clearly, whinging is the most powerful performance optimizer. :)

The 2-parameter USL model

The original USL model, presented in my GCAP book and updated in the blog post How to Quantify Scalability, is defined in terms of fitting two parameters, $\alpha$ (contention) and $\beta$ (coherency). \begin{equation} X(N) = \frac{N \, X(1)}{1 + \alpha \, (N - 1) + \beta \, N (N - 1)} \label{eqn:usl2} \end{equation}

Fitting this nonlinear USL model to data requires several steps (a code sketch follows the list):

  1. Normalize the throughput data, $X$, to determine the relative capacity, $C(N)$.
  2. Equation (\ref{eqn:usl2}) is then equivalent to $X(N) = C(N) \, X(1)$.
  3. If the $X(1)$ measurement is missing or simply not available—as is often the case with data collected from production systems—interpolate its value using the elaborate technique described in the GCAP book.
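
For concreteness, here's a minimal sketch of steps 1 and 2 in Python, using scipy.optimize.curve_fit. The measurement arrays are invented for illustration only; a real analysis would substitute your own $(N, X)$ data.

    # Fit the 2-parameter USL to (N, X) throughput measurements.
    # The data points below are invented for illustration.
    import numpy as np
    from scipy.optimize import curve_fit

    N = np.array([1, 4, 8, 16, 32, 64])          # load points (users or nodes)
    X = np.array([20, 78, 130, 170, 190, 185])   # measured throughput

    C = X / X[0]   # Step 1: relative capacity C(N) = X(N)/X(1)

    def usl2(N, alpha, beta):
        return N / (1 + alpha*(N - 1) + beta*N*(N - 1))

    (alpha, beta), _ = curve_fit(usl2, N, C, p0=[0.01, 0.0001], bounds=(0, np.inf))
    print(f"alpha = {alpha:.4f}, beta = {beta:.6f}")

    # Step 2: the fitted model then predicts throughput as X(N) = C(N) * X(1).
    Xpred = usl2(N, alpha, beta) * X[0]

Note that this sketch presumes the $X(1)$ measurement exists; step 3 addresses exactly the case where it doesn't.
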
The motivation for a 2-parameter model arose out of a desire to meet the twin goals of:
  1. providing each term of the USL with a proper physical meaning, i.e., not treating the USL like a conventional multivariate statistical model (statistics is not math)
  2. satisfying the von Neumann criterion: minimal number of modeling parameters
Last year, I realized the 2-parameter constraint is actually overly severe. Introducing a third parameter makes the statistical fitting process even more universal, as well as simplifying the overall procedure. For the USL particularly, the von Neumann criterion should not be taken too literally. It's really more of a guideline: fewer parameters are generally better.
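
Concretely, the third parameter, call it $\gamma$ (the notation is mine), plays the role of the measured $X(1)$ but is estimated by the regression along with $\alpha$ and $\beta$: \begin{equation} X(N) = \frac{\gamma \, N}{1 + \alpha \, (N - 1) + \beta \, N (N - 1)} \label{eqn:usl3} \end{equation} With $\gamma$ fitted, the raw $(N, X)$ data can be regressed directly, and the normalization and interpolation steps described above simply disappear.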

Sunday, April 22, 2018

The Geometry of Latency

... AKA hyperbolae.
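
As a reminder of where the hyperbolae come from (these are standard queueing results, not the full argument of the CMG paper): for an open M/M/1 queue with service time $S$ and utilization $\rho$, the response time \begin{equation} R(\rho) = \frac{S}{1 - \rho} \end{equation} is a rectangular hyperbola with a vertical asymptote at $\rho = 1$. For a closed system with $N$ users and think time $Z$, the residence-time curve instead approaches the slanted asymptote $R(N) \to N \, S_{max} - Z$, where $S_{max}$ is the service time at the bottleneck resource.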

Here's a mnemonic tabulation based on dishes and bowls:

Hopefully this makes amends for the more complicated explanation I wrote for CMG back in 2009 entitled: "Mind Your Knees and Queues: Responding to Hyperbole with Hyperbolæ", which I'm pretty sure almost nobody understood.

Saturday, April 21, 2018

Virtual cloudXchange 2018 Conference

Our abstract has been accepted for presentation at the FREE cloudXchange online event to be held by CMG on June 19th at 10am Pacific (5pm UTC). [Extended slides]

Exposing the Cost of Performance Hidden in the Cloud

Neil Gunther
Performance Dynamics, Castro Valley, California

Mohit Chawla
Independent Systems Engineer, Hamburg, Germany

10am Pacific Time on June 19, 2018

Whilst offering lift-and-shift migration and versatile elastic capacity, the cloud also reintroduces an old mainframe concept—chargeback—which rejuvenates the need for performance analysis and capacity planning. Combining production JMX data with an appropriate performance model, we show how to assess fee-based EC2 configurations for a mobile-user application running on a Linux-hosted Tomcat cluster. The performance model also facilitates ongoing cost-benefit analysis of various EC2 Auto Scaling policies.

Wednesday, March 14, 2018

WTF is Modeling, Anyway?

A conversation with performance and capacity management veteran Boris Zibitsker, on his BEZnext channel, about how to save multiple millions of dollars with a one-line performance model that has less than 5% error (starting at 21:50 minutes into the video). I wish my PDQ models were that good. :/

The strength of the model turns out to be its explanatory power, rather than prediction, per se. However, with the correct explanation of the performance problem in hand (which also proved that all other guesses were wrong), this model correctly predicted a threefold reduction in application response time for essentially no cost. Modeling doesn't get much better than this.

Footnotes

  1. According to Computerworld in 1999, a 32-node IBM SP2 cost $2 million to lease over 3 years. This SP2 cluster was about 6 times bigger.
  2. Because of my vain attempt to suppress details (in the interests of video length), Boris gets confused about the kind of files that are causing the performance problem (near 26:30 minutes). They're not regular data files and they're not executable files. The executable is already running but sometimes waits—for a long time. The question is, waits for what? They are, in fact, special font files that are requested by the X-windows application (the client, in X parlance). These remote files may also get cached, so it's complicated. In my GCAP class, I have more time to go into this level of detail. Despite all these potential complications, my 'log model' accurately predicts the mean application launch time.
  3. Log_2 assumes a binary tree organization of the font files, whereas Log_10 assumes a denary tree. (A sketch of this log model appears after these footnotes.)
  4. Question for the astute viewer. Since these geophysics applications were all developed in-house, how come the developers never saw the performance problems before they ever got into production? Here's a hint.
  5. Some ppl have asked why there's no video of me. This was the first time Boris had recorded video of a Skype session and he pushed the wrong button (or something). It's prolly better this way. :P
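
For the curious, here is a hypothetical Python rendering of the log model mentioned in footnotes 2 and 3. The functional form and every coefficient are invented for illustration; the actual one-line model discussed in the video may well differ.

    # Hypothetical sketch of a one-line 'log model' for mean app launch time.
    # Assumption (mine): the launch stalls on one remote font-file lookup per
    # level of a b-ary directory tree, so the wait grows as log_b(file count).
    import math

    def launch_time(n_files, b=2, t_lookup=0.5):
        """Mean launch time in seconds; t_lookup is the per-level wait (invented)."""
        return t_lookup * math.log(n_files, b)

    print(launch_time(1024, b=2))    # binary tree: 10 levels -> 5.0 s
    print(launch_time(1024, b=10))   # denary tree: ~3 levels -> ~1.5 s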

Wednesday, February 21, 2018

CPU Idle Is Not Like White Space

This post seems like it ought to be too trite to write, but I see the following performance gotcha cropping up over and over again.

Under management pressure to consolidate resources, especially processor capacity, there is often an urge to "use up" any idle processor cycles. Idle processor capacity tends to be viewed like white space on a written page—just begging to be filled up.

The logical equivalent of filling up the "white space" is absorbing idle processor capacity by migrating applications that are currently running on other servers, then turning those excess servers off or using them for something else.
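
The reason this urge backfires is the hyperbolic growth of response time with utilization. As a minimal sketch, assuming the consolidated server behaves roughly like an M/M/1 queue with service time $S$ and utilization $\rho$: \begin{equation} R(\rho) = \frac{S}{1 - \rho} \end{equation} Pushing utilization from $\rho = 0.50$ to $\rho = 0.95$ by absorbing migrated applications inflates the response time from $2S$ to $20S$: a tenfold latency penalty for "using up" the white space.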