====== Data acquisition for longitudinal product usage study ======
This project was the solution that would collect long term data of the usage of preproduction (beta test) stoves. The project aimed to track the usage of 10 stoves in the homes of the beta tester over a period of several months. These testers were widely disperse, located in 3 different states. Due to the limited budget of this project, only opensource or free software was used for the development process.
===== Hardware =====
{{ usb-5201pop.png?300|}} To record the data a [[http://www.measurementcomputing.com/cbicatalog/USB-5201.asp?dept_id=414&pf_id=1718|Measurement Computing USB-5201]] was used. This product was selected for the following reasons.
* Relatively inexpensive
* Reusable
* Completely independent embedded system
* Stable
* On power outage automatically resumes recording of data
* Records data to removable compact flash card
* Records months worth of data on each card
* Allowed the data to be collected through mail
* Records data to CSV
===== Data processing and storage with Perl and MySQL =====
Excel was considered as a data storage mechanism, because the shop I was working at used Excel for almost everything. However, it did not take long to discover that Excel is limited to [[http://support.microsoft.com/kb/120596|65,536 rows]] (which has increased to 1,048,576 rows in Excel 2007). Due to the millions of records that were going to be collected an evaluated Excel was unacceptable.
The obvious solution seemed to be MySQL, because it can easily store the millions of records that were going to need to be recorded and analyzed.
To process and insert all of the data from the plain text files created by the data acquisition devices a series of Perl scripts were created to do all the heavy lifting. Perl was chosen due to the ease of development, copious modules, and powerful regular expression capability.
===== Windows GUI made with VB.NET and Perl=====
This project was created for a Windows shop. Due to this fact most of the engineers were not comfortable with using the command line to accomplish tasks. This was simply solved by creating a GUI Frontend with VB.NET using the free [[http://www.microsoft.com/express/vb/|Visual Basic Express Edition]] (note: For my own work I use Visual Studio, but the express edition of VB was used because it was completely free for the company to use)
==== Inserting data into MySQL using Perl ====
{{capture_26052008_033503.png|}}
This is where the plain text CSV data file from the device is inserted into the database. Sometimes literally millions of rows of data were inserted into the database at a time. The Perl script does have a percent done indicator.
=== csv.pl ===
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use DBI;
use POSIX;
my $driver = "mysql"; # Database driver type
my $database = "data"; # Database name
my $user = "root"; # Database user name
my $password = "PASSWORD"; # Database user password
# Connect to database
my $dbh = DBI->connect(
"DBI:$driver:$database",
$user, $password,
{
RaiseError => 1,
PrintError => 1,
}
) or die $DBI::errstr;
my $sth = $dbh->prepare("
INSERT INTO temp
(unit, dt, lf, br, lr, rr , rf, broil, cbake, bake)
VALUES (? , ? , ?, ?, ?, ?, ?, ?, ?, ?)
") or die $dbh->errstr;
my $file = $ARGV[0];
my $csv = Text::CSV->new();
my $line = 1;
my $small_line = 1;
my $done_per = 0;
my $line_count = 0;
open (CSV, "<", $file) or die $!;
#print "unit \t time \t\t\t\t 0 \t\t 1 \t\t 2\t\t 3\t\t 4\t\t 5 \t\t 6 \t\t 7\n";
print "Processing unit $ARGV[1], ";
while () {
if ($csv->parse($_)) {
my @columns = $csv->fields();
#try to find the number of points
if ($line == 4)
{
my @total_lines = split(': ', $columns[0]);
$line_count = $total_lines[1];
print $line_count;
print " lines: \n";
}
if ($line > 5){
$columns[1] =~ m/(\d{2})\/(\d{2})\/(\d{4})\ (\d{2})\:(\d{2})\:(\d{2})/;
my $month = $1;
my $day = $2;
my $year = $3;
my $hour = $4;
my $min = $5;
my $sec = $6;
my $condate;
my $new_hour = $hour - 1;
$condate = "$year-$month-$day $new_hour:$min:$sec";
my $condatepr = "$year-$month-$day-$new_hour-$min-$sec";
#this is the % done indicator
$small_line = $line -5;
my $old_done_per = $done_per;
$done_per = ($small_line / $line_count) * 100;
#filter data to remove bad points
if ($columns[2] < 0) {$columns[2] = 0};
if ($columns[3] < 0) {$columns[3] = 0};
if ($columns[4] < 0) {$columns[4] = 0};
if ($columns[5] < 0) {$columns[5] = 0};
if ($columns[6] < 0) {$columns[6] = 0};
if ($columns[7] < 0) {$columns[7] = 0};
if ($columns[8] < 0) {$columns[8] = 0};
if ($columns[9] < 0) {$columns[9] = 0};
#insert the data
$sth->execute($ARGV[1], $condatepr, $columns[2], $columns[3], $columns[4], $columns[5], $columns[6], $columns[7], $columns[8], $columns[9]) or die $dbh->errstr;
#this prints out the status
if ( ceil($done_per) > ceil ($old_done_per ) )
{
print ceil($done_per);
print "% done : " ;
print $small_line;
print " of ";
print $line_count;
print "\n";
}
}
$line++;
} else {
my $err = $csv->error_input;
print "Failed to parse line: $err";
}
}
close CSV;
$dbh->disconnect;
exit 0;
==== Excel Report generation with MySQL and Perl ====
{{capture_26052008_033504.png|}}
This tab facilitates the creation of reports. The reports were created as Excel workbooks, with one work sheet for each selected stove. Thanks to the power of MySQL very complex queries could be ran in very little time, and a rich report could easily be generated.
=== report sample ===
^ Unit|1|
^ start time|11/17/2006 14:22|
^ end time|12/13/2006 18:53|
^ total time recorded(sec)|2262617|
^ time unit was being used (sec)|44535|
^ % of time unit was being used|1.968296004|
^ time bridge element was on (sec)|25665|
^ % of time bridge was on|1.134305983|
^ % of time bridge was on when cooktop or oven where on|57.62883126|
^ simultaneous oven and cooktop time (sec)|2595|
^ % of time of simultaneous out of total time|0.114690202|
^ % of time of simultaneous usage while oven is in use|5.826877737|
^ % that bridge was on when any cooktop element was on|74.01586157|
^ time when any cooktop element was on (sec)|34675|
^ LF on time (sec)|26225|
^ LR on time (sec)|12980|
^ RR on time (sec)|12570|
^ RF on time (sec)|0|
^ broil on time (sec)|0|
^ cbake on time (sec)|615|
^ bake on time (sec)|0|
=== sql_report.csv ===
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Date::Manip;
use Spreadsheet::WriteExcel;
use Time::Format;
my @units_to_process = split(',', $ARGV[0]);
my $cooktop_on = $ARGV[1];
my $oven_on = $ARGV[2];
my $report_out = $ARGV[3];
# Create a new Excel workbook
my $workbook = Spreadsheet::WriteExcel->new($report_out);
#initilize timezone
Date_Init("TZ=EST");
my $driver = "mysql"; # Database driver type
my $database = "data"; # Database name
my $user = "root"; # Database user name
my $password = "PASSWORD"; # Database user password
# Connect to database
my $dbh = DBI->connect(
"DBI:$driver:$database",
$user, $password,
{
RaiseError => 1,
PrintError => 1,
}
) or die $DBI::errstr;
foreach (@units_to_process) {
#find the amount of time that ANYthing was on
my $find_any_on_time = $dbh->prepare("
SELECT count(*) FROM temp
where (LF > ? OR BR > ? OR LR > ? OR RR > ? OR RF > ? OR broil > ? OR cbake > ? OR bake > ?)
AND unit = ?;
") or die $dbh->errstr;
$find_any_on_time->execute($cooktop_on,$cooktop_on,$cooktop_on,$cooktop_on,$cooktop_on,$oven_on,$oven_on,$oven_on,$_) or die $dbh->errstr;
my $any_on_time = 0;
while (my @any_on_array = $find_any_on_time->fetchrow_array) {
$any_on_time = $any_on_array[0];
print "any on time : $any_on_time \n" ;
}
#find the amount of time that ANYthing was on - minus the oven
my $find_any_on_time_not_oven = $dbh->prepare("
SELECT count(*) FROM temp
where (LF > ? OR BR > ? OR LR > ? OR RR > ? OR RF > ?)
AND unit = ?;
") or die $dbh->errstr;
$find_any_on_time_not_oven->execute($cooktop_on,$cooktop_on,$cooktop_on,$cooktop_on,$cooktop_on,$_) or die $dbh->errstr;
my $any_on_time_not_oven = 0;
while (my @any_on_time_not_oven_array = $find_any_on_time_not_oven->fetchrow_array) {
$any_on_time_not_oven = $any_on_time_not_oven_array[0];
print "any on time not oven : $any_on_time_not_oven \n" ;
}
#find the amount of simultaneous burner and oven use
my $find_sim_use = $dbh->prepare("
SELECT count(*) FROM temp where (LF > ? OR BR > ? OR LR > ? OR RR > ? OR RF > ?)
AND ( broil > ? OR cbake > ? OR bake > ?) AND unit = ?;
") or die $dbh->errstr;
$find_sim_use->execute($cooktop_on,$cooktop_on,$cooktop_on,$cooktop_on,$cooktop_on,$oven_on,$oven_on,$oven_on,$_) or die $dbh->errstr;
my $sim_time_use = 0;
while (my @sim_time_use_array = $find_sim_use->fetchrow_array) {
$sim_time_use = $sim_time_use_array [0];
print "sim time use: $sim_time_use \n" ;
}
#find the total time possible
#this will be used to find (ussage time / total time) * 100
#this will yeild a % use time
#find earliest time
my $find_first_time = $dbh->prepare("
SELECT min(DT) FROM temp where unit = ?;
") or die $dbh->errstr;
$find_first_time->execute($_) or die $dbh->errstr;
my $first_time = 0;
my $first_time_ex;
while (my @time_array1 = $find_first_time->fetchrow_array) {
$first_time = ParseDate($time_array1[0]);
#convert date so excel likes it
$first_time_ex = $time{'mm/dd/yyyy hh:mm:ss', $first_time};
}
#find last time
my $find_last_time = $dbh->prepare("
SELECT max(DT) FROM temp where unit = ?;
") or die $dbh->errstr;
$find_last_time->execute($_) or die $dbh->errstr;
my $last_time = 0;
my $last_time_ex;
while (my @time_array2 = $find_last_time->fetchrow_array) {
$last_time = ParseDate($time_array2[0]);
#convert date so excel likes it
$last_time_ex = $time{'mm/dd/yyyy hh:mm:ss', $last_time};
}
#find count of records that this statement is true
#this will be used to find the usage time
#bridge
my $sth = $dbh->prepare("
SELECT count(*)
FROM temp
WHERE br > ? and unit = ?
") or die $dbh->errstr;
$sth->execute($cooktop_on,$_) or die $dbh->errstr;
my $br_on = 0;
while (my @row_array = $sth->fetchrow_array) {
$br_on = $row_array[0];
}
#find the time between the dates
my $delta = DateCalc($first_time,$last_time);
my @values = split(':', $delta);
#print "$values[2] weeks $values[3] days $values[4] hours $values[5] mins $values[6] seconds \n";
#find the total number of seconds that between the times
my $total_sec = ($values[2] * 604800) + ($values[3]*86400) + ($values[4] * 3600) + ($values[5] * 60) +( $values[6]) ;
print "total : $total_sec seconds \n";
# Add a worksheet
my $worksheet = $workbook->add_worksheet("Unit $_");
$worksheet->set_column(0, 0, 60);
$worksheet->set_column(1, 1, 20);
$worksheet->write(0, 0, "Unit");
$worksheet->write(0, 1, $_);
$worksheet->write(1, 0, "start time");
$worksheet->write(1, 1, $first_time_ex);
$worksheet->write(2, 0, "end time");
$worksheet->write(2, 1, $last_time_ex);
$worksheet->write(3, 0, "total time recorded(sec)");
$worksheet->write(3, 1, $total_sec);
$worksheet->write(4, 0, "time unit was being used (sec)");
$worksheet->write(4, 1, $any_on_time * 5);
$worksheet->write(5, 0, "% of time unit was being used");
$worksheet->write(5, 1, ($any_on_time * 5/$total_sec) * 100);
$worksheet->write(6, 0, "time bridge element was on (sec)");
$worksheet->write(6, 1, $br_on * 5);
$worksheet->write(7, 0, "% of time bridge was on");
$worksheet->write(7, 1, ( $br_on * 5 / $total_sec ) * 100);
$worksheet->write(8, 0, "% of time bridge was on when cooktop or oven where on");
$worksheet->write(8, 1, $br_on * 5 /($any_on_time * 5) * 100);
$worksheet->write(9, 0, "simultaneous oven and cooktop time (sec)");
$worksheet->write(9, 1, $sim_time_use * 5);
$worksheet->write(10, 0, "% of time of simultaneous out of total time");
$worksheet->write(10, 1, ($sim_time_use * 5) / $total_sec * 100);
$worksheet->write(11, 0, "% of time of simultaneous usage while oven is in use");
$worksheet->write(11, 1, ($sim_time_use * 5) / ($any_on_time * 5) * 100);
$worksheet->write(12, 0,"% that bridge was on when any cooktop element was on");
$worksheet->write(12, 1, $br_on * 5 / ($any_on_time_not_oven * 5) * 100 ) ;
$worksheet->write(13, 0,"time when any cooktop element was on (sec)");
$worksheet->write(13, 1, ($any_on_time_not_oven * 5) ) ;
#LF on time
my $lf_q = $dbh->prepare("
SELECT count(*)
FROM temp
WHERE lf > ? and unit = ?
") or die $dbh->errstr;
$lf_q->execute($cooktop_on,$_) or die $dbh->errstr;
my $lf_on;
while (my @lf_row_array = $lf_q->fetchrow_array) {
$lf_on = $lf_row_array[0];
}
$worksheet->write(14, 0,"LF on time (sec)");
$worksheet->write(14, 1, ($lf_on * 5) ) ;
#LR on time
my $lr_q = $dbh->prepare("
SELECT count(*)
FROM temp
WHERE lr > ? and unit = ?
") or die $dbh->errstr;
$lr_q->execute($cooktop_on,$_) or die $dbh->errstr;
my $lr_on = 0;
while (my @lr_row_array = $lr_q->fetchrow_array) {
$lr_on = $lr_row_array[0];
}
$worksheet->write(15, 0,"LR on time (sec)");
$worksheet->write(15, 1, ($lr_on * 5) ) ;
#RR on time
my $rr_q = $dbh->prepare("
SELECT count(*)
FROM temp
WHERE rr > ? and unit = ?
") or die $dbh->errstr;
$rr_q->execute($cooktop_on,$_) or die $dbh->errstr;
my $rr_on = 0;
while (my @rr_row_array = $rr_q->fetchrow_array) {
$rr_on = $rr_row_array[0];
}
$worksheet->write(16, 0,"RR on time (sec)");
$worksheet->write(16, 1, ($rr_on * 5) ) ;
#RF on time
my $rf_q = $dbh->prepare("
SELECT count(*)
FROM temp
WHERE rf > ? and unit = ?
") or die $dbh->errstr;
$rf_q->execute($cooktop_on,$_) or die $dbh->errstr;
my $rf_on = 0;
while (my @rf_row_array = $rf_q->fetchrow_array) {
$rf_on = $rf_row_array[0];
}
$worksheet->write(17, 0,"RF on time (sec)");
$worksheet->write(17, 1, ($rf_on * 5) ) ;
#broil on time
my $broil_q = $dbh->prepare("
SELECT count(*)
FROM temp
WHERE broil > ? and unit = ?
") or die $dbh->errstr;
$broil_q->execute($cooktop_on,$_) or die $dbh->errstr;
my $broil_on = 0;
while (my @broil_row_array = $broil_q->fetchrow_array) {
$broil_on = $broil_row_array[0];
}
$worksheet->write(18, 0,"broil on time (sec)");
$worksheet->write(18, 1, ($broil_on * 5) ) ;
#cbake on time
my $cbake_q = $dbh->prepare("
SELECT count(*)
FROM temp
WHERE cbake > ? and unit = ?
") or die $dbh->errstr;
$cbake_q->execute($cooktop_on,$_) or die $dbh->errstr;
my $cbake_on = 0;
while (my @cbake_row_array = $cbake_q->fetchrow_array) {
$cbake_on = $cbake_row_array[0];
}
$worksheet->write(19, 0,"cbake on time (sec)");
$worksheet->write(19, 1, ($cbake_on * 5) ) ;
#bake on time
my $bake_q = $dbh->prepare("
SELECT count(*)
FROM temp
WHERE bake > ? and unit = ?
") or die $dbh->errstr;
$bake_q->execute($cooktop_on,$_) or die $dbh->errstr;
my $bake_on = 0;
while (my @bake_row_array = $bake_q->fetchrow_array) {
$bake_on = $bake_row_array[0];
}
$worksheet->write(20, 0,"bake on time (sec)");
$worksheet->write(20, 1, ($bake_on * 5) ) ;
}
# Disconnect from database
$dbh->disconnect;
exit 0;
==== MySQL database access from VB.NET ====
{{capture_26052008_033505.png|}}
Access to MySQL from within the VB.NET application was provided pimarily to allow the database to be cleared after entering test data. However, any query could be run.
==== Graph generation ====
{{capture_26052008_033506.png|}}
Again, due to the massive amount of data, Excel was unable to create graphs as the engineers typically did. The solution was to create the graphs with yet another great opensource application [[http://www.gnuplot.info/|gnuplot]].
=== output graph ===
{{daqgraph.png?800|}}
=== data_graph.pl ===
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my @units_to_process = split(',', $ARGV[0]);
my $report_out = $ARGV[1];
#open file to write
open (MYFILE, '>out.txt');
my $driver = "mysql"; # Database driver type
my $database = "data"; # Database name
my $user = "root"; # Database user name
my $password = "PASSWORD"; # Database user password
# Connect to database
my $dbh = DBI->connect(
"DBI:$driver:$database",
$user, $password,
{
RaiseError => 1,
PrintError => 1,
}
) or die $DBI::errstr;
foreach (@units_to_process) {
#find count of records that this statement is true
#this will be used to find the usage time
my $sth = $dbh->prepare("
SELECT *
FROM temp
WHERE unit = ?
order by dt ASC;
") or die $dbh->errstr;
$sth->execute($_) or die $dbh->errstr;
my $seconds_on = 0;
while (my @columns = $sth->fetchrow_array) {
$columns[1] =~ s/ /-/;
print MYFILE "$columns[1] \t $columns[2] \t $columns[3] \t $columns[4] \t $columns[5] \t $columns[6] \t $columns[7] \t $columns[8] \t $columns[9] \n";
}
}
close(MYFILE);
# Disconnect from database
$dbh->disconnect;
exit 0;