Creating a Custom Report using Python

This tutorial describes a report that reads the data in your file and calculates the average ages at which males and females married and the average ages of fathers and mothers when their children were born. It reproduces the AppleScript version of this report in Python. The output is displayed as a built-in GEDitCOM II report. When you are done with this tutorial, you should be able to create your own custom reports in Python by changing the type of data collected and the format of the output report.

A major advantage of scripting GEDitCOM II with Python (instead of AppleScript or Ruby) is that the script can use the GEDitCOM II Python module. The sample script uses that module to make scripting easier.

Listing 1
#!/usr/bin/python
#
# Generational Ages Report Script
# 20 JUN 2010, by John A. Nairn
#	
# This script generates a report of average ages of all spouses when
# they got married and when their children were born.
#	
# The report can be for all spouses in the file or just for spouses
# in the currently selected family records.

# Load GEDitCOM II Module
from GEDitCOMII import *

################### Subroutines (described below)

################### Main Script

# Preamble
gedit = CheckVersionAndDocument("Generation Ages to Report (Python)",1.6,2)
if not(gedit) : quit()
gdoc = FrontDocument()

# choose all or currently selected family records
whichOnes = GetOption("Get report for All or just Selected family records",\
None,["All", "Cancel", "Selected"])
if whichOnes=="Cancel" : quit()

# Get a list of the chosen family records
if whichOnes=="All":
    fams = gdoc.families()
else:
    fams = GetSelectedType("FAM")

# No report if no family records were found
if len(fams)==0:
    Alert("No family records were selected")
    quit()

# Collect all report data in a subroutine
CollectAges(fams)

# write to report and then done
rpt = ScriptOutput(GetScriptName(),"html")
WriteToReport()
rpt.write()

Main Script

Listing 1 shows the entire main script, although the crucial work is done in subroutines that are given below. This section describes the logic of the main script.

The script starts with comment lines beginning with "#". It is a good idea to start all scripts with comments. If you share your scripts with other GEDitCOM II users, or revisit a script written a while ago, these comments document what the script does and how to use it.

The Load GEDitCOM II Module section must start all Python scripts that use the GEDitCOMII module; it imports all of the module's methods. Some Python scripts may also want to use Apple's Scripting Bridge objects or call Cocoa Foundation classes. If these are needed, this section can import them as well:

    from Foundation import *
    from ScriptingBridge import *

The preamble section verifies that the script can run. All the work is done by the CheckVersionAndDocument() subroutine, which checks that a document is open and that the current GEDitCOM II version is new enough for this script. It returns a reference to the GEDitCOM II application object if successful, or None if it fails; this script exits on failure. If it is OK to continue, the next line gets a reference to the current front document (stored in gdoc by the FrontDocument() command).

You will often want your reports to offer the option of running on the entire file or on just the selected records. To run a report on a subset of the file, a user selects the records first and then runs the script. The next three sections let the user choose the report target. First, the GetOption() command displays a box with three buttons, "All", "Cancel", and "Selected", to report on the entire file, abort the script, or report on the currently selected records, respectively. The "All" option, which is listed first, is the default (the user can press Return to choose it). GetOption() returns the text of the selected option.

Once the user decides which records to use, the next section compiles all needed records into a list variable (fams). This report reads the ages of fathers and mothers and thus only needs to look at family records. If the user selects "All", the list is obtained by calling families() on the front document (gdoc). If "Selected" is chosen instead, the script fetches the family records among the currently selected records of the front document. This common task is done by the GetSelectedType() command.

Finally, once all family records are in the fams list variable, the length of that list is checked (len(fams)). If it has no elements, there is no need to proceed, and the script exits with the message "No family records were selected" (using the Alert() command). Otherwise the script continues.

The final section is the main part of the script, but all of its work is done in two subroutines. First, the CollectAges() subroutine extracts all needed age information from the provided list of family records and stores the results in global variables. Next, the WriteToReport() subroutine formats the report for output to the user. This report creates an HTML document. The ScriptOutput() class from the GEDitCOMII module helps with this task and also takes care of displaying the report when rpt.write() is called. The rpt object is created before WriteToReport() is called because that subroutine writes to it as a global variable.

CollectAges() Subroutine

Listing 2
# Collect data for the generation ages report
def CollectAges(famList):
    global numHusbAge,sumHusbAge,numFathAge,sumFathAge
    global numWifeAge,sumWifeAge,numMothAge,sumMothAge
    global gdoc

    # initialize counters
    numHusbAge=sumHusbAge=numFathAge=sumFathAge=0
    numWifeAge=sumWifeAge=numMothAge=sumMothAge=0
    
    # progress reporting interval
    fractionStepSize=nextFraction=0.01
    numFams=len(famList)
    for (i,fam) in enumerate(famList):
        # read family record information
        husbRef = fam.husband()
        wifeRef = fam.wife()
        chilList = fam.children()
        mdate = fam.marriageSDN()
        
        # read parent birthdates
        hbdate = wbdate = 0
        if husbRef != "":
            hbdate = husbRef.birthSDN()
        if wifeRef != "":
            wbdate = wifeRef.birthSDN()
        
        # spouse ages at marriage
        if mdate>0:
            if hbdate>0:
                sumHusbAge = sumHusbAge + GetAgeSpan(hbdate,mdate)
                numHusbAge = numHusbAge+1
           
            if wbdate>0:
                sumWifeAge = sumWifeAge + GetAgeSpan(wbdate,mdate)
                numWifeAge = numWifeAge+1
                
        # spouse ages when children were born
        if hbdate > 0 or wbdate > 0:
            for chilRef in chilList:
                cbdate = chilRef.birthSDN()
                if cbdate > 0 and hbdate > 0:
                    sumFathAge = sumFathAge + GetAgeSpan(hbdate,cbdate)
                    numFathAge = numFathAge + 1
                if cbdate > 0 and wbdate > 0:
                    sumMothAge = sumMothAge + GetAgeSpan(wbdate,cbdate)
                    numMothAge = numMothAge + 1
                    
        # time for progress
        fractionDone = float(i+1)/float(numFams)
        if fractionDone > nextFraction:
            ProgressMessage(fractionDone)
            nextFraction = nextFraction+fractionStepSize

This subroutine (see Listing 2) collects all data on ages from the information in your file. It is where most of the work of this script is done; the work is done by interaction with your data through GEDitCOM II's scripting objects and their properties.

The first section declares and initializes the global variables. They are declared global because they are accessed elsewhere in the script to format the report. The variables fractionStepSize, nextFraction, and numFams are local variables used to track the progress of the script and are discussed below.

The for loop enumerates all family records passed to this subroutine (using enumerate() so that the index i is available for progress reporting). The loop starts by reading data from the family record: references to the husband and wife records (in husbRef and wifeRef), a list of all child records (in chilList), and the marriage date (in mdate). The marriage date, like all dates in this script, is read as a serial day number (using built-in SDN properties), which is a day count starting with day 1 at around 4000 B.C. Serial day numbers are ideal for date calculations such as finding the number of years between two dates. These SDN properties return the serial day number for a date, or 0 if the date is unknown or invalid.
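To illustrate why serial day numbers make this easy, here is a minimal stand-alone sketch of turning two day numbers into an age in years. It assumes a 365.25-day year and uses Python's date.toordinal() as a stand-in for GEDitCOM II's SDN properties; age_span_years() is a hypothetical helper and is not necessarily how GetAgeSpan() computes its result.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed mean year length, not a documented GEDitCOM II constant

def age_span_years(start_sdn, end_sdn):
    """Return years between two serial day numbers, or None if either is unknown (0)."""
    if start_sdn <= 0 or end_sdn <= 0:
        return None
    return (end_sdn - start_sdn) / DAYS_PER_YEAR

# date.toordinal() counts days from 1 Jan of year 1, a different origin than an SDN,
# but differences between two ordinals behave the same way as differences of SDNs.
birth = date(1850, 3, 15).toordinal()
marriage = date(1875, 6, 1).toordinal()
age = age_span_years(birth, marriage)
```

Because only the difference matters, the origin of the day count does not affect the result.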

The next section reads the parents' birth dates. As noted above, husbRef and wifeRef are references to the parents in this family, but either one could be an empty string, meaning the family record does not have that spouse. For each spouse present in the family record, this section reads the birth serial day number from that spouse's individual record; otherwise the date remains zero.

The next two sections do the date calculations for this script. First are the calculations of each spouse's age at the time of marriage. This calculation can only be done if both the spouse's birth date and the family's marriage date are known. Thus, if both serial day numbers are greater than zero, the age is calculated (using a utility method called GetAgeSpan()). The global variables numHusbAge and numWifeAge count the number of ages calculated, and sumHusbAge and sumWifeAge hold the sum of those ages. When this subroutine is done, each sum variable divided by the corresponding num variable is the average age.
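The accumulate-then-divide pattern can be tried outside GEDitCOM II with plain numbers. This sketch is hypothetical: each family is a tuple of (husband birth SDN, wife birth SDN, marriage SDN), 0 means unknown as with the SDN properties, ages assume a 365.25-day year, and the function returns the averages instead of storing its counters in global variables.

```python
DAYS_PER_YEAR = 365.25  # assumed year length for converting day spans to years

def average_marriage_ages(families):
    """Return (husband average, wife average) ages at marriage; None if nothing to average."""
    num_husb = num_wife = 0
    sum_husb = sum_wife = 0.0
    for husb_bdate, wife_bdate, mdate in families:
        if mdate > 0:  # an age at marriage needs a known marriage date
            if husb_bdate > 0:
                sum_husb += (mdate - husb_bdate) / DAYS_PER_YEAR
                num_husb += 1
            if wife_bdate > 0:
                sum_wife += (mdate - wife_bdate) / DAYS_PER_YEAR
                num_wife += 1
    husb_avg = sum_husb / num_husb if num_husb > 0 else None
    wife_avg = sum_wife / num_wife if num_wife > 0 else None
    return husb_avg, wife_avg

# Hypothetical sample data; the third family has no marriage date and is skipped.
families = [
    (2400000, 2401461, 2410227),  # husband 28, wife 24 at marriage
    (0,       2402922, 2410227),  # husband unknown, wife 20 at marriage
    (2403000, 2404000, 0),
]
husb_avg, wife_avg = average_marriage_ages(families)
```

Here husb_avg works out to 28 years (one husband) and wife_avg to 22 years (two wives).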

The age-at-childbirth section is similar. It loops over all children in the family and reads each child's birth date. If a birth date is found, the age of each parent with a known birth date is added to sumFathAge or sumMothAge, with counts kept in numFathAge and numMothAge, analogous to the num and sum variables in the previous section. This entire section is enclosed in a conditional so that these calculations are done only if at least one parent's birth date is known.
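The same idea extends to the nested child loop. In this hypothetical sketch, parent_ages_at_childbirth() takes the parents' birth SDNs and a list of the children's birth SDNs (0 meaning unknown) and returns lists of the father's and mother's ages; averaging a list with sum(ages) / len(ages) gives the same result as the sum and num counters in Listing 2.

```python
DAYS_PER_YEAR = 365.25  # assumed year length

def parent_ages_at_childbirth(husb_bdate, wife_bdate, child_bdates):
    """Return (father_ages, mother_ages) for one family; an SDN of 0 means unknown."""
    father_ages, mother_ages = [], []
    if husb_bdate > 0 or wife_bdate > 0:  # skip the child loop if no parent date is known
        for cbdate in child_bdates:
            if cbdate <= 0:
                continue  # child's birth date unknown, nothing to compute
            if husb_bdate > 0:
                father_ages.append((cbdate - husb_bdate) / DAYS_PER_YEAR)
            if wife_bdate > 0:
                mother_ages.append((cbdate - wife_bdate) / DAYS_PER_YEAR)
    return father_ages, mother_ages

# Hypothetical family: mother's birth date unknown, middle child's birth date unknown.
fathers, mothers = parent_ages_at_childbirth(2400000, 0, [2410227, 0, 2411688])
```

With these inputs the father's ages are 28 and 32 years and the mother's list stays empty.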

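The last section of the loop reports the script's progress to the user with the ProgressMessage() command. To avoid slowing the script with constant updates, a message is sent only when the fraction of families processed passes nextFraction, which then advances by fractionStepSize (1% of the records).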
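The throttling logic can be sketched on its own. This hypothetical version passes the progress callback in as a parameter (report stands in for ProgressMessage()), always reports the final 100%, and advances the threshold past every step already crossed. It is a slightly more defensive variant of the loop in Listing 2, not the module's own code.

```python
def run_with_progress(items, report, step=0.01):
    """Process items, calling report(fraction) only after each further `step` of the work."""
    total = len(items)
    next_fraction = step
    for i, _item in enumerate(items):
        # per-item work (for example, reading one family record) would go here
        fraction_done = float(i + 1) / total
        if fraction_done >= next_fraction or i + 1 == total:
            report(fraction_done)
            # skip every threshold already passed so a large jump triggers only one report
            while next_fraction <= fraction_done:
                next_fraction += step

calls = []
run_with_progress(list(range(1000)), calls.append)  # about 100 reports for 1000 items
```

With only a few items each one crosses a threshold, so every item is reported; with many items the number of reports stays near 1/step regardless of file size.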

When the for loop is done, the global variables (numHusbAge, sumHusbAge, and so on) contain all data needed to output the report. The subroutine ends and returns control to the main script. The next section explains how the output report is formatted.

Listing 3
# Write the results now in the global variables to a
# GEDitCOM II report using <html> style
def WriteToReport():
    # begin report with <h1> for title
    fname = gdoc.name()
    rpt.out("<h1>Generational Age Analysis in " + fname + "</h1>\n")

    # start the table, add the caption and header, and start the body
    rpt.out(MakeTable("begin","caption",\
    "Summary of spouse ages when married and when children were born"))
    rpt.out(MakeTable("head",["Age Item","Husband","Wife"],"body"))
    
    # rows for ages when married and when children were born
    InsertRow("Avg. Age at Marriage", numHusbAge, sumHusbAge,\
    numWifeAge, sumWifeAge)
    InsertRow("Avg. Age at Childbirth", numFathAge, sumFathAge,\
    numMothAge, sumMothAge)

    # end the <tbody> and <table> elements
    rpt.out(MakeTable("endbody","end"))

WriteToReport() Subroutine

Formatting a report for output in GEDitCOM II means formatting the data with HTML elements, all enclosed within a single div element. You can use any HTML elements you want. Here the report title is put in an h1 element and all results are placed in a table element. The subroutine that creates this report is in Listing 3.

The process is made easier by the ScriptOutput class in the GEDitCOMII module (the rpt variable) along with some other convenience methods. The ScriptOutput class handles the common elements, so the script only needs to add the lines specific to its own report.

The process is straightforward if you understand HTML elements. The report title is put into an h1 element and includes the file name. All data is in a three-column table where the first column labels the data and the other two columns give results for husbands and wives. The table starts with a caption and a header row, followed by a body; these table elements are generated with the MakeTable() function. The body of the table has two rows, reporting the average ages at marriage and the average ages when children were born. These rows are formatted by a custom InsertRow() subroutine. Finally, the table elements are closed and the report is done.
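For readers who want to see the kind of HTML such a report contains, here is a standard-library sketch (Python 3, because it uses html.escape) that builds a comparable fragment by hand. It does not use or reproduce MakeTable() or ScriptOutput, whose exact output is not described here; build_report() and its row format are hypothetical, and the sample file name and values are illustrative only.

```python
from html import escape

def build_report(file_name, rows):
    """Return an HTML fragment: an <h1> title and a three-column table inside one <div>.

    rows is a list of (label, husband_text, wife_text) tuples of already-formatted strings.
    """
    parts = ["<div>",
             "<h1>Generational Age Analysis in %s</h1>" % escape(file_name),
             "<table>",
             "<caption>Summary of spouse ages when married and when children were born</caption>",
             "<thead><tr><th>Age Item</th><th>Husband</th><th>Wife</th></tr></thead>",
             "<tbody>"]
    for label, husb, wife in rows:
        # label cell uses the default left alignment; the two result cells are right-aligned
        parts.append('<tr><td>%s</td><td style="text-align:right">%s</td>'
                     '<td style="text-align:right">%s</td></tr>'
                     % (escape(label), escape(husb), escape(wife)))
    parts += ["</tbody>", "</table>", "</div>"]
    return "\n".join(parts)

html_text = build_report("Sample.ged", [("Avg. Age at Marriage", "28.00", "22.00"),
                                        ("Avg. Age at Childbirth", "30.00", "-")])
```

Escaping the file name and labels keeps characters such as "&" or "<" in a file name from breaking the HTML.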

InsertRow(rowLabel, numHusb, sumHusb, numWife, sumWife) Subroutine

Listing 4
# Insert table row with husband and wife results
def InsertRow(rowLabel, numHusb, sumHusb, numWife, sumWife):
    tr = [rowLabel]
    # float() avoids integer division in Python 2 if the sums are integers
    if numHusb > 0:
        tr.append("{0:.2f}".format(float(sumHusb) / numHusb))
    else:
        tr.append("-")
    if numWife > 0:
        tr.append("{0:.2f}".format(float(sumWife) / numWife))
    else:
        tr.append("-")
    rpt.out(MakeTable("row l r r",tr))

This subroutine formats each row of the table. The input parameters are a label for the row and the numerical results to be averaged and displayed. The only catch is that numHusb or numWife might be zero if no individuals suitable for averaging were found by the CollectAges() subroutine. To avoid dividing by zero, this special case is trapped and the table cell gets "-" rather than a calculated average. Average ages are displayed with two digits after the decimal point using the string format() method. The results are put into a table row, with the second and third cells right-justified, by the MakeTable() function.