Defining Significant Users

From StatsJam


For most analyses, it is desirable to only analyze log files from significant users, where we consider a significant user to be someone who uses GIMP to get actual work done. That is, we would like to be able to filter out log files from people who merely try out ingimp and don't use it for real work.

This page explores how we can define the concept of a significant user. Feel free to add your own perspective and ideas on how to define this concept.

Significant User As Someone With 2+ Image Saves

One way to define a significant user is as someone who applies changes to an image and then saves it. We can increase our confidence that this is a significant user by requiring image saves in at least two different sessions (that is, in two different log files). We'll explore this notion step by step.

Let's split logs into those that have a "save" command in them and those that don't.

Logs That Do (or Do Not) Have an Image Save in Them
Number of Saved Logs    Saved %    Number of Logs With No Save    No Save %    Total Number of Logs    Saved + No Save Log Count (Sanity Check)
2716                    53.0       2368                           47.0         5084                    5084

At the time of this writing, slightly more than half of the logs (53%) contain at least one save and the rest (47%) contain none.
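The split above can be sketched in a few lines of Python. This is only an illustration: the `events_by_log` mapping (log number to a list of event type IDs) is a hypothetical in-memory stand-in for the database, and the toy data is made up. The only value taken from this page is the save event ID, 7, which comes from the SQL further down.

```python
SAVE_EVENT_TYPE_ID = 7  # image save event ID, taken from the SQL on this page


def split_by_save(events_by_log):
    """Split log numbers into (with_save, without_save) lists.

    events_by_log is a hypothetical mapping: log number -> list of event type IDs.
    """
    with_save, without_save = [], []
    for log_num, event_type_ids in sorted(events_by_log.items()):
        if SAVE_EVENT_TYPE_ID in event_type_ids:
            with_save.append(log_num)
        else:
            without_save.append(log_num)
    return with_save, without_save


def percent(part, total):
    """Percentage of total, rounded to the nearest whole number."""
    return round(100.0 * part / total)


# Made-up toy data, not real ingimp logs.
toy_logs = {1: [3, 7, 2], 2: [3, 3], 3: [7], 4: []}
saved, unsaved = split_by_save(toy_logs)
print(saved, unsaved)

# Re-deriving the percentages in the table from its own counts.
print(percent(2716, 5084), percent(2368, 5084))
```

Applied to the counts in the table, 2716 of 5084 logs is about 53% and 2368 of 5084 is about 47%, which agrees with the reported figures.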

Next, let's plot the number of "saves" in a log file versus the number of commands applied.

[Plot: Number of Commands vs. Number of Saves Per Log File]

From the plot above, it appears that as the number of commands goes up, so does the number of saves. Let's calculate basic summary statistics for the number of commands in a log file, first for logs with no saves and then for logs with at least one save:

Summary of Number of Commands Applied With No Saves
Mean     Median    Std Dev
37.0     4.0       121.0

Summary of Number of Commands Applied With At Least One Save
Mean     Median    Std Dev
191.0    36.0      504.0
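Summaries like the two above can be computed with Python's standard `statistics` module. The sketch below is an assumption about the method, not a record of how the page's numbers were produced: in particular, it uses the sample standard deviation (`statistics.stdev`), and the page does not say whether the sample or population form was used. The input list is made up.

```python
import statistics


def summarize(command_counts):
    """Return mean, median, and sample standard deviation of per-log command counts."""
    return {
        "mean": statistics.mean(command_counts),
        "median": statistics.median(command_counts),
        # Assumption: sample standard deviation; swap in statistics.pstdev
        # if the population form is wanted instead.
        "std_dev": statistics.stdev(command_counts),
    }


# Made-up command counts for a handful of logs, not real ingimp data.
toy_counts = [0, 1, 2, 4, 4, 6, 250]
print(summarize(toy_counts))
```

In both tables the mean is far above the median, which is what a few very long sessions would produce. For that reason the median is probably the more representative figure when comparing the two groups.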

There appears to be a link between saving an image and the number of commands applied to it: logs with at least one save have a median of 36 commands, versus 4 for logs with no saves. This makes saving a good indicator of significant work. Let's require a user to have at least two log files with saves in them to be considered a "significant user." We'll select the users who meet this requirement and see what percentage of all users they represent. The SQL:

SELECT
  users_who_saved_table.user_id,
  users_who_saved_table.num_logs_with_save
FROM
  (SELECT
    interaction_log.user_id,
    COUNT(interaction_log.user_id) AS num_logs_with_save
  FROM
    interaction_log,
    (SELECT
      DISTINCT interaction_log.log_num AS save_log_num
    FROM
      interaction_log,
      event_record
    WHERE
      interaction_log.log_num = event_record.log_num AND
      event_record.event_type_id = 7 AND -- This is the ID for an image save event
      interaction_log.user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
    ) AS logs_with_save_table
  WHERE
    interaction_log.log_num = logs_with_save_table.save_log_num
  GROUP BY
    interaction_log.user_id
  ) AS users_who_saved_table
WHERE
  num_logs_with_save > 1
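To see what the query returns, it can be run against a small in-memory SQLite database. This is a sketch under stated assumptions: the table layouts below are guessed from the column names the query uses (the real schema may have more columns), the rows are made up, and an `ORDER BY` is added only to make the output deterministic. The actual database engine behind this page is not stated, so SQLite here is a stand-in.

```python
import sqlite3

# The page's query, with user_id qualified and an ORDER BY added for stable output.
SIGNIFICANT_USERS_SQL = """
SELECT
  users_who_saved_table.user_id,
  users_who_saved_table.num_logs_with_save
FROM
  (SELECT
    interaction_log.user_id,
    COUNT(interaction_log.user_id) AS num_logs_with_save
  FROM
    interaction_log,
    (SELECT
      DISTINCT interaction_log.log_num AS save_log_num
    FROM
      interaction_log,
      event_record
    WHERE
      interaction_log.log_num = event_record.log_num AND
      event_record.event_type_id = 7 AND
      interaction_log.user_id NOT IN (SELECT user_id FROM ingimp_developer_ids)
    ) AS logs_with_save_table
  WHERE
    interaction_log.log_num = logs_with_save_table.save_log_num
  GROUP BY
    interaction_log.user_id
  ) AS users_who_saved_table
WHERE
  num_logs_with_save > 1
ORDER BY
  users_who_saved_table.user_id
"""


def build_toy_db():
    """Create an in-memory database with an assumed minimal schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE interaction_log (log_num INTEGER PRIMARY KEY, user_id INTEGER);
        CREATE TABLE event_record (log_num INTEGER, event_type_id INTEGER);
        CREATE TABLE ingimp_developer_ids (user_id INTEGER);
    """)
    # User 10: two logs with saves. User 20: one log with a save, one without.
    # User 30: a developer (excluded). User 40: two logs with saves.
    conn.executemany("INSERT INTO interaction_log VALUES (?, ?)",
                     [(1, 10), (2, 10), (3, 20), (4, 20),
                      (5, 30), (6, 30), (7, 40), (8, 40)])
    # Log 7 has two save events; DISTINCT makes it count as one log.
    conn.executemany("INSERT INTO event_record VALUES (?, ?)",
                     [(1, 7), (1, 3), (2, 7), (3, 7), (4, 3),
                      (5, 7), (6, 7), (7, 7), (7, 7), (8, 7)])
    conn.execute("INSERT INTO ingimp_developer_ids VALUES (30)")
    return conn


def find_significant_users(conn):
    """Return (user_id, num_logs_with_save) rows for users with 2+ logs containing a save."""
    return conn.execute(SIGNIFICANT_USERS_SQL).fetchall()


print(find_significant_users(build_toy_db()))
```

On the toy data, only users 10 and 40 qualify. User 20 has a single log with a save, and user 30 is filtered out as a developer even though both of that user's logs contain saves.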

And the stats that make use of that SQL:

Number of Significant Users    % of ingimp Users
233                            27.0

Using this filter, approximately 27% of the ingimp installations meet these criteria.

Now let's see how many log files these users are responsible for:

Number of Logs by Significant Users    % by Significant Users
3796                                   75.0

At the time of this writing, ~27% of the users are creating ~75% of the log files, which again suggests that this is a good metric for identifying significant users of ingimp.
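The two percentages can be computed together from a per-user log count. The function below is a minimal sketch: `logs_per_user` (user ID to number of logs) and the toy values are hypothetical, and only the final line uses figures from this page.

```python
def shares(logs_per_user, significant_users):
    """Return (fraction of users, fraction of logs) accounted for by significant users."""
    significant = set(significant_users) & set(logs_per_user)
    total_logs = sum(logs_per_user.values())
    significant_logs = sum(logs_per_user[user] for user in significant)
    return len(significant) / len(logs_per_user), significant_logs / total_logs


# Made-up example: one of four users produces six of ten logs.
toy_logs_per_user = {"a": 6, "b": 1, "c": 1, "d": 2}
print(shares(toy_logs_per_user, {"a"}))

# Checking the table against the total log count reported earlier on this page.
print(round(100 * 3796 / 5084, 1))
```

With the page's own counts, 3796 of 5084 logs is about 74.7%, consistent with the 75% reported in the table.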
