For most analyses, it is desirable to only analyze log files from significant users, where we consider a significant user to be someone who uses GIMP to get actual work done. That is, we would like to be able to filter out log files from people who merely try out ingimp and don't use it for real work.
This page explores how we can define the concept of a significant user. Feel free to add your own perspective and ideas on how to define this concept.
[edit] Significant User As Someone With 2+ Image Saves
One way to define a significant user is someone who applies changes to an image and saves the image. We can increase our confidence that this is a significant user by requiring an image to be saved within two different sessions. We'll explore this notion step-by-step.
Let's split logs into those that have a "save" command in them and those that don't.
This form gives you a chance to modify the query and quickly see
the results but it does not update the wiki page..
<query>
SELECT
save_log_num AS "Number of Saved Logs",
ROUND(save_log_num * 100.0 / total_num_logs) AS "Saved %",
no_save_log_num AS "Number of Logs With No Save",
ROUND(no_save_log_num * 100.0 / total_num_logs) AS "No Save %",
total_num_logs AS "Total Number of Logs",
save_log_num + no_save_log_num AS "Saved + No Save Log Count (Sanity Check)"
FROM
(SELECT
COUNT(DISTINCT interaction_log.log_num) AS save_log_num
FROM
event_record,
interaction_log
WHERE
interaction_log.log_num = event_record.log_num AND
event_record.event_type_id = 7 -- This is the ID for an image save event
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
) AS logs_with_save_table,
(SELECT
COUNT(DISTINCT interaction_log.log_num) AS no_save_log_num
FROM
interaction_log
WHERE
interaction_log.log_num NOT IN (
SELECT
DISTINCT log_num AS save_log_num
FROM
event_record
WHERE
event_record.event_type_id = 7 -- This is the ID for an image save event
)
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
) AS logs_with_no_save_table,
(SELECT
COUNT(DISTINCT interaction_log.log_num) AS total_num_logs
FROM
interaction_log
WHERE
user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
) AS total_logs_table
</query>
<view type="table" caption="Logs That Do (or Do Not) Have an Image Save in Them"/>
Logs That Do (or Do Not) Have an Image Save in Them
Number of Saved Logs
Saved %
Number of Logs With No Save
No Save %
Total Number of Logs
Saved + No Save Log Count (Sanity Check)
2716
53.0
2368
47.0
5084
5084
At the time of this writing, about half the logs have saves in them and half don't.
Next, let's plot the number of "saves" in a log file versus the number of commands applied.
This form gives you a chance to modify the query and quickly see
the results but it does not update the wiki page..
<query>
SELECT
num_doc_events AS "Number of Commands",
num_saves AS "Number of Saves"
FROM
((SELECT
interaction_log.log_num,
COUNT(interaction_log.log_num) AS num_saves
FROM
interaction_log,
event_record
WHERE
interaction_log.log_num = event_record.log_num AND
event_record.event_type_id = 7 -- This is the ID for an image save event
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
GROUP BY
interaction_log.log_num
) UNION
(SELECT
DISTINCT interaction_log.log_num,
0 AS num_saves
FROM
interaction_log
WHERE
interaction_log.log_num NOT IN (
SELECT
DISTINCT log_num AS save_log_num
FROM
event_record
WHERE
event_record.event_type_id = 7 -- This is the ID for an image save event
)
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
)) AS save_counts_table,
((SELECT
event_record.log_num,
COUNT(event_record.entry_num) AS num_doc_events
FROM
event_record,
interaction_log
WHERE
interaction_log.log_num = event_record.log_num
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
GROUP BY
event_record.log_num)
UNION
(SELECT
interaction_log.log_num,
0 AS num_doc_events
FROM
interaction_log
WHERE
interaction_log.log_num NOT IN (
SELECT
DISTINCT event_record.log_num
FROM
event_record
)
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
)) AS command_counts_table
WHERE
command_counts_table.log_num = save_counts_table.log_num
</query>
<view type="scatter" caption="Number of Commands vs. Number of Saves Per Log File"/>
Number of Commands vs. Number of Saves Per Log File
From the plot above, it appears that as the number of commands goes up, so do the number of saves. Let's calculate basic summary statistics for the number of commands in a log file when there are no saves and at least one save:
This form gives you a chance to modify the query and quickly see
the results but it does not update the wiki page..
<query>
SELECT
ROUND(AVG(num_doc_events)) AS "Mean",
ROUND(MEDIAN(num_doc_events)) AS "Median",
ROUND(STDDEV(num_doc_events)) AS "Std Dev"
FROM
((SELECT
interaction_log.log_num,
COUNT(interaction_log.log_num) AS num_saves
FROM
interaction_log,
event_record
WHERE
interaction_log.log_num = event_record.log_num AND
event_record.event_type_id = 7 -- This is the ID for an image save event
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
GROUP BY
interaction_log.log_num
) UNION
(SELECT
DISTINCT interaction_log.log_num,
0 AS num_saves
FROM
interaction_log
WHERE
interaction_log.log_num NOT IN (
SELECT
DISTINCT log_num AS save_log_num
FROM
event_record
WHERE
event_record.event_type_id = 7 -- This is the ID for an image save event
)
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
)) AS save_counts_table,
((SELECT
event_record.log_num,
COUNT(event_record.entry_num) AS num_doc_events
FROM
event_record,
interaction_log
WHERE
interaction_log.log_num = event_record.log_num
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
GROUP BY
event_record.log_num)
UNION
(SELECT
interaction_log.log_num,
0 AS num_doc_events
FROM
interaction_log
WHERE
interaction_log.log_num NOT IN (
SELECT
DISTINCT event_record.log_num
FROM
event_record
)
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
)) AS command_counts_table,
interaction_log
WHERE
command_counts_table.log_num = save_counts_table.log_num AND
command_counts_table.log_num = interaction_log.log_num AND
num_saves = 0
</query>
<view type="table" caption="Summary of Number of Commands Applied With No Saves"/>
Summary of Number of Commands Applied With No Saves
This form gives you a chance to modify the query and quickly see
the results but it does not update the wiki page..
<query>
SELECT
ROUND(AVG(num_doc_events)) AS "Mean",
ROUND(MEDIAN(num_doc_events)) AS "Median",
ROUND(STDDEV(num_doc_events)) AS "Std Dev"
FROM
((SELECT
interaction_log.log_num,
COUNT(interaction_log.log_num) AS num_saves
FROM
interaction_log,
event_record
WHERE
interaction_log.log_num = event_record.log_num AND
event_record.event_type_id = 7 -- This is the ID for an image save event
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
GROUP BY
interaction_log.log_num
) UNION
(SELECT
DISTINCT interaction_log.log_num,
0 AS num_saves
FROM
interaction_log
WHERE
interaction_log.log_num NOT IN (
SELECT
DISTINCT interaction_log.log_num AS save_log_num
FROM
event_record
WHERE
event_record.event_type_id = 7 -- This is the ID for an image save event
)
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
)) AS save_counts_table,
((SELECT
event_record.log_num,
COUNT(event_record.entry_num) AS num_doc_events
FROM
event_record,
interaction_log
WHERE
interaction_log.log_num = event_record.log_num
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
GROUP BY
event_record.log_num)
UNION
(SELECT
interaction_log.log_num,
0 AS num_doc_events
FROM
interaction_log
WHERE
interaction_log.log_num NOT IN (
SELECT
DISTINCT event_record.log_num
FROM
event_record
)
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
)) AS command_counts_table,
interaction_log
WHERE
command_counts_table.log_num = save_counts_table.log_num AND
command_counts_table.log_num = interaction_log.log_num AND
num_saves > 0
</query>
<view type="table" caption="Summary of Number of Commands Applied With At Least One Save"/>
Summary of Number of Commands Applied With At Least One Save
Mean
Median
Std Dev
191.0
36.0
504.0
Clearly there appears to be a link between saving a document and the number of commands applied in the document, which makes it a good indication of doing significant work. Let's require a user to have at least two log files with saves in them to be considered a "significant user." We'll filter them out and see what percentage meet this classification. The SQL:
SELECT
users_who_saved_table.user_id,
users_who_saved_table.num_logs_with_save
FROM
(SELECT
interaction_log.user_id,
COUNT(interaction_log.user_id) AS num_logs_with_save
FROM
interaction_log,
(SELECT
DISTINCT interaction_log.log_num AS save_log_num
FROM
interaction_log,
event_record
WHERE
interaction_log.log_num = event_record.log_num AND
event_record.event_type_id = 7 -- This is the ID for an image save event
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
) AS logs_with_save_table
WHERE
interaction_log.log_num = logs_with_save_table.save_log_num
GROUP BY
interaction_log.user_id
) AS users_who_saved_table
WHERE
num_logs_with_save > 1
This form gives you a chance to modify the query and quickly see
the results but it does not update the wiki page..
<query>
SELECT
num_sig_users AS "Number of Significant Users",
ROUND(num_sig_users * 100.0 / total_num_users) AS "% of ingimp Users"
FROM
(SELECT
COUNT(users_who_saved_table.user_id) AS num_sig_users
FROM
(SELECT
interaction_log.user_id,
COUNT(interaction_log.user_id) AS num_logs_with_save
FROM
interaction_log,
(SELECT
DISTINCT interaction_log.log_num AS save_log_num
FROM
interaction_log,
event_record
WHERE
interaction_log.log_num = event_record.log_num AND
event_record.event_type_id = 7 -- This is the ID for an image save event
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
) AS logs_with_save_table
WHERE
interaction_log.log_num = logs_with_save_table.save_log_num
GROUP BY
interaction_log.user_id
) AS users_who_saved_table
WHERE
num_logs_with_save > 1
) AS significant_users_table,
(SELECT
COUNT(user_id) AS total_num_users
FROM
user_list -- a view
) AS user_count_table
</query>
<view type="table"/>
Number of Significant Users
% of ingimp Users
233
27.0
Using this filter, approximately 27% of the ingimp installations meet these criteria.
Now let's see how many log files these users are responsible for:
This form gives you a chance to modify the query and quickly see
the results but it does not update the wiki page..
<query>
SELECT
num_sig_logs AS "Number of Logs by Significant Users",
ROUND(num_sig_logs * 100.0 / total_num_logs) AS "% by Significant Users"
FROM
(SELECT
COUNT(log_num) AS num_sig_logs
FROM
interaction_log,
(SELECT
users_who_saved_table.user_id
FROM
(SELECT
interaction_log.user_id,
EXTRACT(EPOCH FROM (MAX(interaction_log.log_date)-MIN(interaction_log.log_date)))/60/60/24 AS num_days_diff
FROM
interaction_log,
(SELECT
DISTINCT event_record.log_num
FROM
event_record
WHERE
event_record.event_type_id = 7 -- This is the ID for an image save event
) AS logs_with_save_table
WHERE
interaction_log.log_num = logs_with_save_table.log_num
AND user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
GROUP BY
interaction_log.user_id
) AS users_who_saved_table
WHERE
num_days_diff >= 1
) AS sig_users_table
WHERE
interaction_log.user_id = sig_users_table.user_id) AS sig_logs_table,
(SELECT
COUNT(DISTINCT interaction_log.log_num) AS total_num_logs
FROM
interaction_log
WHERE
user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
) AS total_logs_table
</query>
<view type="table"/>
Number of Logs by Significant Users
% by Significant Users
3796
75.0
At the time of this writing, ~25% of the users are creating ~75% of the log files, again suggesting this metric as a good metric for describing significant users of ingimp.