Development and implementation of R packages for standardization of data and computation of demographic indicators for DemoData MS-SQL database
1. Background and objectives
The Making Family Planning Count 2.0 project is a three-year initiative of the United Nations Population Division. It will improve the quantity, quality, transparency and accessibility of family planning indicators and of the underlying population data, expand scientific knowledge on family planning and population issues, and make the resulting outputs accessible to policymakers, planners and other users. The consultancy relates to Component 3: improved web-based dissemination of family planning indicators and of the underlying population data used to generate them.
This component will also strengthen ongoing efforts from the previous grant, Making Family Planning Count 1.0, to respond to the global community's demand for detailed family planning and population data. It will do so by providing access to all empirical evidence used to inform the estimation process, and by further expanding the historical repository of demographic data and of census and survey reports collected by the Population Division over the last 60 years, including estimates and projections of the annual number of births, total fertility, age-specific fertility rates, population structures by age and sex, life expectancy, and mortality rates by age and sex.
This effort involves further evaluating these data and applying various data transformation procedures to standardize the different types of available data. Standardization makes it possible to compute vital rates by age and sex and to apply a variety of demographic methods to prepare internally consistent demographic estimates. This step requires the development of R packages for the routine application of these methods within the IT production environment of the Population Division and with the DemoData database, which serves as the centralized repository of empirical demographic data and is used for statistical modeling and for deriving robust time-trend estimates of various demographic indicators. The R packages will ensure that the data and materials used to generate modeled estimates and projections are fully transparent and freely accessible.
2. Scope of work
The Consultant will work closely with the staff of the UN Population Division and assist the Division in improving a set of tools for use with the DemoData platform during the contract period.
The scope of work will focus on the further development of two R packages initiated by the Population Division in 2017: DemoTools, for data quality evaluation, data standardization and computation of demographic indicators; and DDSQLtools, for interaction with the DemoData SQL database used by the Population Division.
The DemoData project established a SQL database designed to store the complex and often unstructured data typical of empirical demographic information, and to allow efficient storage and retrieval of these data. The usefulness of the system depends on the extent to which it covers all available empirical data sources, on whether the data are checked for quality issues and standardized, and on whether the demographic indicators are correctly computed.
Empirical demographic data, whether imported from electronic databases (such as the United Nations Demographic Yearbook), collected from paper-based statistical tabulations, or taken from tabulations available on the websites of national statistical offices or relevant ministries, need to be checked for quality issues, standardized and adjusted. Additional demographic indicators also need to be computed: for instance, vital rates by age and sex, and other indicators, need to be calculated from the data on births and deaths by age and sex and imported into a relational database (DemoData).
The following components have to be developed and implemented to work with demographic data (e.g., population counts, births and deaths by age and sex) retrieved from DemoData in a structured database format (e.g., long format with one value per record and codified metadata attributes in columns), either from a standalone ASCII flat file or through a public web API:
a. Enhance the online documentation of the DemoTools R package for demographic data quality evaluation, data standardization and adjustment, including creating vignettes with sample uses of its functions, and harmonize the arguments and parameters of those functions.
b. Implement unit testing for all DemoTools R functions against the expected results of reference textbook examples.
c. Expand the DemoData SQL tools (DDSQLtools R package) with an enhanced set of R functions that interact with the MS-SQL database and the DemoData API for data checking and validation, and that apply an appropriate set of evaluation and transformation functions based on logical conditions and user-defined criteria.
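To make the long format concrete (one value per record, with codified metadata attributes in columns), the following Python sketch pivots hypothetical records into age-sex cells, checks for duplicate values, and computes age-specific death rates as deaths divided by population in each cell, returning the result again in long format. It is illustrative only: the packages in this consultancy are written in R, and the field names, codes and values are assumptions, not the DemoData schema or API. Mid-year population is used as a simple proxy for person-years of exposure.

```python
from collections import defaultdict

# Hypothetical long-format records: one value per record, with metadata
# (location, year, sex, age group, indicator) held in separate columns.
# Field names, codes and values are illustrative, not the DemoData schema.
RECORDS = [
    {"loc": "XYZ", "year": 2010, "sex": "F", "age_start": 0, "age_span": 5,
     "indicator": "deaths", "value": 120.0},
    {"loc": "XYZ", "year": 2010, "sex": "F", "age_start": 0, "age_span": 5,
     "indicator": "population", "value": 48000.0},
    {"loc": "XYZ", "year": 2010, "sex": "F", "age_start": 5, "age_span": 5,
     "indicator": "deaths", "value": 15.0},
    {"loc": "XYZ", "year": 2010, "sex": "F", "age_start": 5, "age_span": 5,
     "indicator": "population", "value": 50000.0},
]

# Metadata columns that identify one age-sex cell (hypothetical).
KEY_FIELDS = ("loc", "year", "sex", "age_start", "age_span")


def age_specific_rates(records, numerator="deaths", denominator="population",
                       out_indicator="death_rate"):
    """Return long-format records of numerator/denominator per age-sex cell.

    Mid-year population serves as a simple proxy for person-years of exposure.
    Cells missing either value, or with a zero denominator, are skipped.
    """
    cells = defaultdict(dict)
    for r in records:
        key = tuple(r[f] for f in KEY_FIELDS)
        # Basic validation: reject two values for the same cell and indicator.
        if r["indicator"] in cells[key]:
            raise ValueError(f"duplicate {r['indicator']!r} for cell {key}")
        cells[key][r["indicator"]] = r["value"]

    rates = []
    for key in sorted(cells):
        vals = cells[key]
        if numerator in vals and vals.get(denominator):
            row = dict(zip(KEY_FIELDS, key))
            row["indicator"] = out_indicator
            row["value"] = vals[numerator] / vals[denominator]
            rates.append(row)
    return rates


if __name__ == "__main__":
    for row in age_specific_rates(RECORDS):
        print(row["sex"], row["age_start"], row["value"])
```

Keeping the output in the same long format means derived indicators can be stored alongside the input counts, with the indicator code in a metadata column rather than in a new table layout.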
This scope of work requires R programming skills and substantive knowledge of demographic methods and data analysis. The task also requires advanced scientific computing skills and computer programming staff time, neither of which is currently available in the Population Division.
The R language has become a de facto standard among statisticians for developing statistical software and is widely used for data analysis. R is part of the GNU project; its source code is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems, including MS-Windows. R can be used in a client-server environment as well as for web-based statistical applications. It is currently used within the Population Division for various computational tasks involving demographic estimates and projections, and for data visualization.
For each milestone, intermediate development versions will be deployed on-site for input and testing by staff of the Population Division. The Consultant will ensure the quality of the new implementation through regular code review, compliance with UN requirements, and quality assurance validation to verify that the implementation is error-free. The Consultant will incorporate user feedback as far as possible within the scope of this consultancy.
3. Tasks
The following activities will be undertaken by the Consultant under guidance and in collaboration with the UN Population Division staff:
Tasks #1 (50 days):
A. Improve the internal documentation of the DemoTools R functions for demographic data quality evaluation, data standardization and adjustment, including harmonizing their arguments and parameters, and enhance the online documentation.
B. Implement initial unit testing for the top-level DemoTools R functions against the expected results of reference textbook examples.
C. Develop read/write functions in DDSQLtools to interact with the DemoData SQL API, and enhance the set of R functions for checking and validating demographic data from DemoData.
Tasks #2 (60 days):
A. Create vignettes with sample uses of the functions.
B. Implement further unit testing for error handling and argument checking in DemoTools functions.
C. Expand DDSQLtools to apply an appropriate set of evaluation and transformation functions from DemoTools based on logical conditions and user-defined criteria.
4. Duration of the contract
The duration of the contract will be from 15 August 2018 to 31 December 2018; within this period, the Consultant will work for 110 days.
5. Duty Station or Location of Assignment
The contractual work will be carried out off-site. Interactions with the Population Division will be done by e-mail, video conference (e.g., Skype) and telephone.
6. Travel
Not applicable.
7. Expected outputs and delivery dates
Tasks 1: Completion of Tasks 1A, 1B and 1C by 15 October 2018. The Consultant will provide a progress report (in Word or a similar text format), together with the source code and documentation of the two updated R packages (DemoTools and DDSQLtools), delivered by email or through a web repository.
Tasks 2: Completion of Tasks 2A, 2B and 2C by 31 December 2018. The Consultant will provide the final report (in Word or a similar text format), together with the source code and documentation of the two enhanced R packages (DemoTools and DDSQLtools), delivered by email or through a web repository.
NOTE: All work is considered to be the sole property of the United Nations.
8. Fee and payment schedule
Consultant’s fees will be in accordance with the standard UN rates. The Consultant will work 110 days between 15 August 2018 and 31 December 2018 and will receive a total remuneration of $41,800 for 110 days @ $380 per day. The fee will be payable in two installments upon successful completion of the assigned tasks.
Installment 1: $19,000 (50 days @ $380) upon successful completion of Tasks 1 by 15 October 2018.
Installment 2: $22,800 (60 days @ $380) upon successful completion of Tasks 2 by 31 December 2018.
9. Performance Indicators
The consultancy work will be evaluated according to the following criteria:
a. Timeliness of the development work provided;
b. Relevance of the development work for the work program of the Division, with particular reference to the progress on implementation of the project “Making Family Planning Count”;
c. Quality of the software developed, as assessed through code review, fitness for purpose, and stability in everyday use (error trapping, usability);
d. Quality, completeness and clarity of the written documentation submitted, as assessed by the supervisor, and receptivity to staff feedback.
10. Qualifications
Advanced university degree (Master’s degree or equivalent) in demography or in sociology, statistics or economics with an emphasis on demography. A minimum of seven years of experience in population analysis and research. At least two years of professional experience with the development of R functions or packages for the analysis of demographic data is desirable.