Monday, June 29, 2009

PDQ 5.0 Test Suite or ... How I Spent My Weekend

I was planning to blog about the amazing time I had at Velocity 2009 last week, when this landed in my mailbox (edited for space and privacy):

Subject: Seeking help with PDQ-R ...
Date: Thu, 25 Jun 2009 15:51:21 -0500

My name is James and I've been trying to learn to properly use PDQ after reading two of your books, "Guerrilla Capacity Planning" and "Analyzing Computer System Performance with Perl::PDQ." I'm still getting a grip on PDQ-R. ... I decided to set about re-creating the queue circuit in the study with PDQ-R as an exercise. ...
The output of my code yields:
[1] "Manual response time for class 1 is 0.864179 seconds"
[1] "PDQ-R response time for class 1 is 0.313637 seconds"
[1] "Manual response time for class 2 is 6.105397 seconds"
[1] "PDQ-R response time for class 2 is 3.552873 seconds"
[1] "Manual response time for class 3 is 4.535833 seconds"
[1] "PDQ-R response time for class 3 is 4.535833 seconds"
If you could give my code a look over and give me some hints I would really appreciate it.

It turns out that James N. had discovered a bug (gasp!) in PDQ, which is why we have users. (jk) The above output refers to a simple model of a database system comprising 3 resources (call them: cpu, disk1 and disk2) and 3 transaction streams (work1, work2, work3) and no limit on the queue lengths, i.e., an open queueing network or circuit. Here's what my rendition looks like:

# PDQ-R model

library(pdq)

# Request rates of the 3 transaction streams into the DBMS
Xsys<-c(50/150, 80/150, 70/150)

# Service demands at each resource
Dcpu<-c(0.096, 0.615, 0.193)
Ddk1<-c(0.088, 0.683, 0.763)
Ddk2<-c(0.119, 0.795, 0.400)

# Start PDQ code with Init call
Init("James' DB Model");

# Define the 3 transaction workloads
workname<-1:3
for (w in 1:3) {
    workname[w] <- sprintf("work%d", w)
    CreateOpen(workname[w], Xsys[w])
}

# Define the 3 resources
CreateNode("cpu", CEN, FCFS)
CreateNode("dk1", CEN, FCFS)
CreateNode("dk2", CEN, FCFS)

for (w in 1:3) {
    SetDemand("cpu", workname[w], Dcpu[w])
    SetDemand("dk1", workname[w], Ddk1[w])
    SetDemand("dk2", workname[w], Ddk2[w])
}

Solve(CANON)
Report()

To hunt down the problem, I rewrote the PDQ-R model in C, just in case there were any translation problems with SWIG-ing PDQ/lib into PDQ-R, Perl PDQ, PyDQ, etc.

/*
multiclass-open.c

Created by NJG on Thursday, June 25, 2009
Updated by NJG on Sunday, June 28, 2009
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include "PDQ_Lib.h"


int main(void) {

    extern char s1[];

    char *p; // dummy pointer for names
    char *devname[3];
    char *workname[3];
    int i, j, k, n, s, w;
    double actualtR[4][3];

    // Expected RT values
    // Rows 0-2: residence times at cpu, dk1, dk2; row 3: system response times
    double expectR[4][3] = {
        {0.174, 1.118, 0.351},
        {0.351, 2.734, 3.054},
        {0.340, 2.270, 1.142},
        {0.865, 6.122, 4.546}
    };

    // Request rates of the 3 transaction streams into the DBMS
    double Xsys[] = {50.0/150, 80.0/150, 70.0/150};

    // Service demands
    double Dcpu[] = {0.096, 0.615, 0.193};
    double Ddk1[] = {0.088, 0.683, 0.763};
    double Ddk2[] = {0.119, 0.795, 0.400};

    // Name the workloads
    for (w = 0; w < 3; w++) {
        resets(s1);
        sprintf(s1, "work%d", w+1);
        // +1 leaves room for the terminating '\0'
        if ( (p = malloc(strlen(s1) + 1)) != NULL) {
            strcpy(p, s1); // copy into assigned storage
            workname[w] = p;
        }
        else {
            printf("malloc failed!\n");
            exit(-1);
        }
    }
    // Do not free(p) here: workname[2] still points to this storage

    // Name the resources
    for (k = 0; k < 3; k++) {
        resets(s1);
        if (k == 0) sprintf(s1, "%s", "cpu");
        if (k == 1) sprintf(s1, "%s", "dk1");
        if (k == 2) sprintf(s1, "%s", "dk2");
        // +1 leaves room for the terminating '\0'
        if ( (p = malloc(strlen(s1) + 1)) != NULL) {
            strcpy(p, s1); // copy into assigned storage
            devname[k] = p;
        }
        else {
            printf("malloc failed!\n");
            exit(-1);
        }
    }
    // Do not free(p) here: devname[2] still points to this storage

    /************************** Start PDQ code **********************/
    PDQ_Init("Multiclass Test Model");

    // Create workloads
    for (w = 0; w < 3; w++) {
        s = PDQ_CreateOpen(workname[w], Xsys[w]);
    }

    // Create resources
    n = PDQ_CreateNode("cpu", CEN, FCFS);
    n = PDQ_CreateNode("dk1", CEN, FCFS);
    n = PDQ_CreateNode("dk2", CEN, FCFS);

    // Assign demands
    for (w = 0; w < 3; w++) {
        PDQ_SetDemand("cpu", workname[w], Dcpu[w]);
        PDQ_SetDemand("dk1", workname[w], Ddk1[w]);
        PDQ_SetDemand("dk2", workname[w], Ddk2[w]);
    }

    PDQ_Solve(CANON);

    printf("Expected Response Times\n");
    for (i = 0; i < 4; i++) {
        for (j = 0; j < 3; j++) {
            printf("%4.3f\t", expectR[i][j]);
        }
        printf("\n");
    }
    printf("--------------------------\n");

    printf("Actual Response Times\n");
    for (i = 0; i < 4; i++) {
        // System response times for QNM
        if (i == 3) {
            for (w = 0; w < 3; w++) {
                printf("%4.3f\t",
                    actualtR[i][w] = PDQ_GetResponse(TRANS, workname[w]));
            }
        }

        // Residence times per resource
        if (i < 3) {
            for (w = 0; w < 3; w++) {
                printf("%4.3f\t",
                    actualtR[i][w] = PDQ_GetResidenceTime(devname[i], workname[w], TRANS));
            }
        }
        printf("\n");
    }

} // main

This code also prints the actual values (meaning, computed by PDQ) alongside the expected values (embedded as a 2-d array) of the residence times for each workload at each resource, so the two can be compared. The "expected" values can come from any one of a number of sources such as: measurements, other models, other tools, etc. This forms the basis of the test code approach.
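
The listing prints both tables and leaves the comparison to the eye. As a minimal sketch of how that check could be automated, the helper below counts the table entries whose actual and expected values differ by more than a chosen tolerance. The name count_mismatches, the NCOLS constant and the tolerance argument are illustrative assumptions, not part of the PDQ library.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

#define NCOLS 3  /* one column per workload (work1..work3) */

/* Hypothetical test helper, not a PDQ function.
   Returns the number of entries in a rows x NCOLS table whose
   actual value differs from the expected value by more than tol. */
size_t count_mismatches(double expect[][NCOLS], double actual[][NCOLS],
                        size_t rows, double tol)
{
    size_t bad = 0;

    assert(tol >= 0.0);  /* a negative tolerance would flag every entry */

    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < NCOLS; j++) {
            if (fabs(expect[i][j] - actual[i][j]) > tol) {
                bad++;
            }
        }
    }
    return bad;
}
```

With arrays shaped like expectR and actualtR, a call such as count_mismatches(expectR, actualtR, 4, 0.01) returns 0 when every entry agrees to within 0.01, so a test program could exit with a non-zero status on any other result.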

The problem seen by James turns out to arise from a conflict between the way resource utilizations are computed for the new multi-server queues (released in PDQ 5.0.1) and multi-class workloads. With the PDQ library corrected, the expected and actual values agree, as this output shows:

[njg]~/PDQ/Test Suite/C-PDQ% ./mulclass-open
Expected Response Times
0.174 1.118 0.351
0.351 2.734 3.054
0.340 2.270 1.142
0.865 6.122 4.546 <--
--------------------------
Actual Response Times
0.175 1.118 0.351
0.352 2.728 3.048
0.340 2.274 1.144
0.866 6.120 4.543 <--

The last line in each of the above tables (marked <--) holds the system response times, which correspond to the "manual" values that James was reporting in his email.
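
James's email does not show how he arrived at his manual numbers. One plausible reconstruction, assuming the textbook result for an open multiclass network of single-server queues, is that each resource's total utilization is the sum of arrival rate times service demand over all workloads, and each residence time is the service demand divided by one minus that utilization. The function name open_multiclass and the NRES and NCLS constants are made up for this sketch and are not PDQ calls.

```c
#include <assert.h>

#define NRES 3  /* resources: cpu, dk1, dk2 */
#define NCLS 3  /* workloads: work1, work2, work3 */

/* Hypothetical hand calculation, not a PDQ function.
   lambda[c] : arrival rate of workload c
   D[k][c]   : service demand of workload c at resource k
   R[k][c]   : (output) residence time of workload c at resource k
   Rsys[c]   : (output) system response time of workload c
   Returns 0 on success, -1 if some resource is saturated (U >= 1). */
int open_multiclass(const double lambda[NCLS], double D[NRES][NCLS],
                    double R[NRES][NCLS], double Rsys[NCLS])
{
    for (int c = 0; c < NCLS; c++) {
        Rsys[c] = 0.0;
    }

    for (int k = 0; k < NRES; k++) {
        /* Total utilization of resource k over all workloads */
        double U = 0.0;
        for (int c = 0; c < NCLS; c++) {
            assert(lambda[c] >= 0.0 && D[k][c] >= 0.0);
            U += lambda[c] * D[k][c];
        }
        if (U >= 1.0) {
            return -1;  /* an open queue has no steady state here */
        }

        /* Every workload sees the same 1/(1 - U) stretch factor */
        for (int c = 0; c < NCLS; c++) {
            R[k][c] = D[k][c] / (1.0 - U);
            Rsys[c] += R[k][c];
        }
    }
    return 0;
}
```

Fed the request rates and service demands from the model above, this gives system response times of roughly 0.866, 6.120 and 4.543 seconds, in line with the last row of the Actual table.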

The PDQ test cases had not kept up with new code developments, one of the hazards of only having severely punctuated time to work on PDQ. The new release PDQ 5.0.2 should be available for download later this week. I'll send out an email notice at that time.
