Basic Excel Trainer

Revisi per 3 Maret 2014 17.16 oleh Catriona (bicara | kontrib) (Created page with "<translate> = Building Beautiful Data in Basic Excel (Trainer Manual) = <!--T:1--> == Preface == === Background and context === The Australia Indonesia Partnership for Emergin...")
(beda) ← Revisi sebelumnya | Revisi terkini (beda) | Revisi selanjutnya → (beda)

Building Beautiful Data in Basic Excel (Trainer Manual)

Preface

Background and context

The Australia Indonesia Partnership for Emerging Infectious Diseases (AIP-EID) Program aims to enhance the capacity of the Indonesian animal health system to prevent, detect and control emerging infectious diseases, including disease in humans that originate from animals. An assessment of Indonesia’s animal health information needs and capabilities was conducted from May to August 2012. A desire to increase the speed of data collection and analysis, and to integrate multiple different data sources was identified as part of the assessment. In response to the assessment, a single integrated database (iSIKHNAS), with multiple flexible portals for data submission and data access, including email, SMS, Web, direct software connections and custom applications on handheld devices is being developed. Excel is already widely used for local data management, and will be used (in conjunction with email) as a key part of the data management, capture and transfer system of iSIKHNAS. This Basic Excel training course is the first of a series of Excel training courses that will develop skills in data management and Excel use.

Issues specific to training in Excel

Excel skills and concepts can be described in a manual, presented in a lecture format, or demonstrated 'live' via on-screen projection, but they are unlikely to be completely assimilated until the participants have had the opportunity to perform the skill or task themselves on a computer, ideally supported by a facilitator - learning Excel is a hands-on process. Participants will have different learning styles, and no one method of presenting Excel concepts will suit all participants. Some people like to see a concept demonstrated before trying it themselves, whilst others will be frustrated and bored with this (relatively slow) process and would prefer to follow steps detailed in a manual.

Trainers may find that their approach to teaching the same Excel course varies quite considerably between groups, depending on:

  • the number of participants
  • the number of facilitators
  • the overall level of experience of the group with the topics
  • the overall preferred learning styles of the group
  • the time available for teaching.

How to use this manual

This manual supplements the material provided in the Particants Manual. It provides background material, and suggests training approaches for each session. The trainer should be very familiar with the contents of both the Participant and Trainer Manuals.

Proposed course structure/timetable

Day 1
Session Content
i Welcome ceremony

Introductions

1 Principles of Data Management
2 Principles of Data Management
3 Basic Excel Skills
4 Basic Excel Skills


Day 2
Session Content
1 Cleaning Messy Data
2 Cleaning Messy Data
3 Preparation of Useful Outputs
4 Wrap up/consolidation/debrief

Principles of Data Management

The purpose of this activity is to determine the level of expertise each participant has with Excel, in order to pair up those with more expertise with those who have less experience.

Desired outcomes of this activity:

  • description of how each participant uses Excel
  • initial assessment of each participant's Excel experience/ability
  • suggested pairings of participants.

There are many ways to approach this exercise. We suggest that it be based on an interactive discussion. Trainers can use this as an opportunity to help participants become more comfortable and familiar with each other, and with the facilitators. It is an opportunity for participants to share their expectations (and any concerns) regarding the workshop. This session may relieve any fears they might have of being able to cope with the material, as the less experienced people will be paired with more experienced people. Those who have existing skills should feel that their knowledge and experience is valued, and can be encouraged to develop their own teaching and explaining skills, so as to take something back to their workplace. Trainers could also make this session more active and physical than the later, hands-on sessions, which will be seated and intense. For example, participants could be lined up in a (loose) order of experience with Excel, then numbered off '1,2,1,2,...'. The 1s take one step to the left, and stand still. The 2s march around in a semi-circle until they have reversed their positions (the last person in the 2 line is now next to the first person in the 1 line). The end result should be that the most experienced Excel people are paired with the least experienced. The ones in the middle will be much the same, and they should be able to work synergistically together (that is, the pair can learn better together than either one can alone).

If two trainers are present, we suggest that Trainer A leads and facilitates the discussion, and Trainer B records brief notes about how Excel is being used, as this is used in the next section.

Use of Excel

This segment is the closest thing to a presentation or lecture in this course. However, it can still be made relevant and personal, and involve some of (your) hands-on Excel. It could be approached more as a focussed discussion than as a lecture.

Based on the findings/outcomes of the interactive discussion, the trainer will summarise and group the ways that people are using Excel.

These usually fall into three main categories:

  • spreadsheet
  • word processor
  • database.

Not everyone will know what is meant by these terms, so take some time to explain what we mean in this situation, and to provide examples of such use. Explain the advantages/disadvantages of using Excel for these purposes.

Spreadsheets

Computer spreadsheet applications began as electronic versions of paper-based accounting worksheets, which organised data into rows and columns for comparison and analysis.

Examples of the use of spreadsheets in animal health management include:

  • calculations (a spreadsheet can perform both simple and complex calculations)
  • budget development
  • simple disease or economic modelling
    • exploring 'what-if' (hypothetical) scenarios.
Demonstration

Where possible, you might like to demonstrate some 'quick and dirty' demonstrations of these examples. Make sure to mention to the participants that in these demonstrations, you are not teaching them skills, and you not expecting them to follow how you did things, or to repeat them themselves without further explanation!

Here we describe one possible approach - feel free to use another of your own choosing.

Begin calculating the numbers of people who will be able to attend a workshop dinner (if there is one - if not, think of another example). Have the list of participant names all prepared, then enter a 1 or 0 against them to signify attendance. Then total the column to give the number of attendees (using Excel as a calculator). Next, begin to develop a budget - suggest three types of restaurants, and ask participant preferences (enter 1,2, or 3 in the next column). Have a price for each type of restaurant. Depending on how the exercise is progressing, you could ask if participants have any suggestions about how to proceed for budget development. Explain that there are quite a number of ways of achieving the same outcome, and that some are quite simple, and others more complex, allowing exploration of 'what-if' scenarios. Tell them that in the NEXT workshop, you will be able to work together on some of the more complex approaches, but for now, you will do it very simply, using only those skills they will explore in this workshop. Make a copy of the preference list in the next column, because you may need it again later. Then, use 'find-and-replace' to replace the restaurant code with the appropriate price. Add up the total - now we have a budget! You can also explore 'what-if' scenarios, using the same approach with different prices on the copied restaurant preference code column. Participants may wish to tell you that there are better ways to do this. Ask them to describe these, and to explain why they are better. Do they other participants feel the same way? Have different suggestions? Note whether the skills will be covered in this, or in a subsequent workshop.

Discussion

Benefits of using an electronic (rather than paper-based) spreadsheet for these purposes are:

  • the data are easily edited
  • can use formulae to do calculations based on the spreadsheet data
    • can change an entry, and the results will be automatically and instantly updated
    • eliminates some repetitive tasks
  • formatting of the columns and rows can be used to make the spreadsheet more understandable
    • formatting can easily be changed.

Excel is an electronic spreadsheet. Spreadsheet tasks (calculations based on lots of figures) are what it is designed to do, and what it does best. Of course, as we have already discussed, it is also used for other purposes.

Word processor

Using Excel as a word processor implies that Excel is being used only for its presentation and formatting abilities, and not (or not primarily) for its organisational and calculating abilities. In a typical scenario where Excel is being used as a word processor, the data presented in the table is first entered and summarised elsewhere, then the summarised data are copied to a new worksheet to be formatted as required. The formatted table may be printed for inclusion in a hard-copy report, or it might be copied to a 'true' word processor.

Demonstration

Be careful with this one. It is very easy to spent far too much time on formatting/presentation. Pay attention to the time!

  • Advise the participants that you are required to prepare a report about the workshop. One of the outputs required is a summary of the gender breakdown, of both participant and facilitators.
  • Have the participants work out what the numbers will be.
  • You will need three rows (Facilitators, Participants, Totals), and three columns (Males, Females, Totals).
  • Develop a heading for your table - make is simple, but make sure it includes the key elements of who, what, when, where!
  • Ask the participants to advise on how the table should look.
    • Spend only a little time on this, but make it fun!
    • Do NOT, under any circumstances, merge any cells. If you are asked to do so, apply formatting - align across the selection.
Discussion

Benefits of this approach are:

  • appropriately-formatted outputs can be created.

Disadvantages of using Excel only as a word processor include:

  • inefficiencies when changes to the data occur
    • unless the summary table data are linked to the underpinning dataset, changes in this dataset means that the table data must be manually updated
    • it is very easy to forget to do this, or to make transcription errors
    • if there are many tables in a report, or if changes to the dataset occur frequently, updating the tables can be very time-consuming
  • word processors are generally better at word processing than are spreadsheet programs.

Databases

We will not do a demonstration of this section - the skills required are beyond the scope of this beginning course. In addition, placing the course in context, we are not planning to train people to use Excel as a database - iSIKHNAS will take care of the storage and management of great quantities of data, and of managing relationships between data. We are training people in Excel so that they can quickly, easily, and accurately manipulate data for submission to iSIKHNAS, where necessary, and to add value to data extracted from iSIKHNAS, by preparing useful, informative, and well-presented outputs.

Discussion

Databases are organised electronic collections of data. They are designed to allow efficient storage, management and retrieval of data. A relational database is structured so that each piece of data is saved in only one place, which makes it easier and quicker to update than if there were many copies of the data to modify.

Examples of some cases where Excel might be used like a database include recording (storing):

  • laboratory test results
  • disease observations.

Nevertheless, Excel is not good at some of the things that a relational database does well, like:

  • storage of large amounts of information (there are restrictions to the number of records that can be stored)
  • supporting multiple users who wish to view or change data at the same time
  • maintaining data integrity
    • identifying data errors (an error in a single cell can make the entire spreadsheet unusable, but it can be very difficult to track down the problem, or identify how and when the problem was caused)
  • handling relationships between pieces of data.

Features of a relational database include:

  • reduction in, or elimination of, data duplication
  • copes with relationships between pieces of data (for example, a record in one table may be linked to many records in another table)
  • easy to change data format
  • data can be added or removed easily
  • easy to modify (edit) data
  • easy to maintain data integrity
  • access to data can be restricted
  • can monitor who has done what, and when, with data (audit trail for changes)
  • data can be shared easily and multiple users can access it at the same time.

Excel is NOT a database. However, there are many good reasons to use Excel in conjunction with a relational database:

  • Excel is relatively easy to use
    • new users can get started using Excel quickly
    • many people have at least some familiarity with Excel
  • Excel is widely available
  • data from a database can easily be exported to Excel, where
    • the data can be analysed
    • calculations can be performed
    • tables can be created (based on the underpinning dataset, so using more than Excel's formatting capabilities)
    • graphs can be produced
  • Excel can be used to prepare data for submission to a database.

Other

This section can be omitted, depending on the skill level of the group, and the time available.

Explain that there are many Excel templates that can be downloaded from the internet; these are an example of just two of many possible categories:

  • project management
  • agenda development.

Activity - examine an Excel workbook

Groups could be based on single or multiple pairs.

This exercise can be demonstrated by Trainer A. Trainer B should circulate amongst the participants, providing help where needed.

The time required for this exercise will be entirely dependent on the overall prior experience of the group, so could vary widely.

However, everyone should do the entire exercise, even if they think they are very good at Excel. If you do have people who are experienced, so shouldn't really be in the course, have them review the material and suggest items that have not been covered, or note down their favourite Excel tips.

Activity - inspect sample outputs

Pivot tables are one of Excel's most powerful and useful features. They allow users to create accurate, sophisticated and well-presented tables and charts without having to understand or create formulas of any kind.

However, they do require that the underpinning data is correctly structured. As the structure required for pivot tables is essentially the same as that required for export of data from Excel to other systems (for example, to a centralised database), the habits developed in preparing data for pivot tables make it easy to then use the data elsewhere, and so reduces the need for double-handling of information.

In this exercise, the participants are not learning how to create a table or chart from the beginning. However, they should begin to learn to manipulate an existing table or chart, and think about how they could use pivot tables in their own work. This exercise gives them something to look forward to - by the end of the workshop, they will be able to set up their own pivot tables and charts!

Objectives for good data management

Based on the previous group activities, derive some 'big-picture' objectives for data management. Use discussion to try to elicit these, checking in with each group to see what they observed.

Look for:

  • single handling of data
    • What do we mean by this? What are some examples of handling data multiple times?
  • automated analyses, to produce many different results quickly
  • creation of outputs for
    • reports (common need)
    • decision-making (can anyone think of examples?).

Pivot table requirements

Ask the participants to suggest how data must be managed/structured for pivot tables:

  • data must be organised in a table format - in rows and columns
  • there must be a header row
  • each heading entry must be unique
  • if the dataset is to be uploaded elsewhere, there may be restrictions on the format of the header entries
    • for example, no spaces, and there may be a maximum length
  • there must be no completely blank rows or columns
  • no merged cells
  • each record should be on a single row
  • each column should contain the same type of data
    • for example, all dates should be expressed the same way
  • single piece of information per column
    • separate column for each piece of data that will be analysed
  • source data must be separated from other data
    • preferably on its own worksheet
    • at the very least, separated by a row and a column from anything else
  • no hidden rows (they probably won't suggest this; could be left for advanced course)
  • no totals or subtotals rows or columns
  • no columns that are calculated based on other columns
    • can lead to errors in pivot table calculations
    • best to let pivot tables do any required calculations

Computer Management

Note that although we will mostly be discussing Excel, good data management has to include the environment in which we use Excel, so we will now talk briefly about computer management.

Organisation of documents

Computers are very powerful and have excellent searching capabilities. However, it can be frustrating when you are certain that you have worked on a particular file, but you just cannot find it. Have you ever found yourself working on the wrong version of a document? Or perhaps you have had the experience of working on a document that has been opened from an email message, carefully saving it, but then not being able to find it afterwards? Another frustrating experience can be when you are asked to take over someone else's projects, but you find that you cannot understand how the materials are organised, and have to open each and every file to find out what it contains.

You may like to take a few moments to ask participants if they have experienced any issues such as these, and what they have done to manage the problem. Guide the discussion; we want to come to the conclusion that all these problems can be avoided by consistently using a logical structure for naming, organising and managing your electronic files.

Ask where electronic files can be stored - yes, the physical location!

A number of options will be suggested, including:

  • hard drive of the participant's personal computer
  • USB stick or other external, removable device
  • shared drive (of participant's organisation)
  • 'in the cloud'.

Discuss the pros and cons of these approaches. Is it safe to have only one copy of a file? What problems can arise if files are stored in multiple places (i.e. multiple copies of a file exist)?

Security, accessibility, and (where multiple copies of files exist), synchronisation and version control should be mentioned. Note that we will be discussing these in more detail later.

Relate the discussion back to the participants' own situations: They are being trained in Excel as it is anticipated that they will some way be involved in working with Excel files that will be used to organise data for submission to iSIKHNAS, or that will be used to summarise material that has been extracted from iSIKHNAS. Check that this is the case - if not, ask to be reminded what they will mainly be doing with Excel. Then ask where the Excel documents that are part of this work will be saved. Ask whether participants are in control of organising their Excel (and other) files - do they decide how the documents should be named, and any directory structure that is in use?

Our assumption is that most participants will be saving their files to their hard drives, and that they are in charge of naming files and organising their folder structures. If this is generally not the case, we will revise these notes.

The key message is that a well-organised electronic filing structure coupled with consistent and logical file and folder-naming practices will make:

  • finding files easier (for you, and anyone else who needs to find a file on your machine)
  • keeping track of the current version of a file easier
  • backing up data easier.

Is it safe to assume that everyone knows what a folder is, and is familiar with Windows Explorer? It may not be - yet people might not be comfortable admitting that they are not completely familiar with these things. So, rather than assume that everyone knows, it might be best to demonstrate. However, be wary of demonstrating this if your own computer's filing system is messy! Also, if it is displayed in a language that is difficult for the participants to understand, it might be better to come up with some proposed structures on a whiteboard or flipchart.

Explain that a 'folder' is just an electronic version of a paper file folder.

Exercise - file and folder management

This can be done an interactive exercise. Here is one possible approach, but you could use any of a number of different activities to achieve the same results:

Tell the participants that you want to make a list of some principles of good file and folder management, and that you would like to begin with the group's own experience and suggestions.

  1. Have two pairs work together to form a larger group of four.
  2. Give each group time to discuss and make their list of file and folder management principles/rules.
    • Circulate among the groups.
    • Make sure that everyone knows what they are supposed to be doing. Give some examples e.g. organise documents in folders; keep files names short.
    • Use the discussion time to lead people towards at least the major ideas listed below, so there are some outputs of the exercise.
  3. Ask the groups to pick their best idea, and write it on the special paper provided.
    • Demonstrate that the writing must be LARGE, and brief, so that everyone will be able to see from the back of the room.
  4. Have one member of the first group come up to the front, read and explain their idea, and pin/stick it to a board.
  5. Repeat this with each group in turn.
    • If a group's best idea is the same as one that has been used, ask them to give their second idea.
  6. Go around the groups again, until all ideas have been used.
    • Use a different person from the group each time, to give everyone practice in speaking in front of a crowd.
  7. When all ideas have been presented, ask the participants to look them over, and see if there are any that have the same theme, or seem to belong together.
    • Ask for a volunteer to come up and move the papers so that ones that are similar can be grouped together.
    • Can all papers be grouped with at least one other paper?
  8. Ask the participants to suggest a name for each grouping.
  9. Review each grouping. Is there any duplication? Does anything need to be rephrased? Is there a logical order for the items?
  10. Make a final call for additional items.
  11. When satisfied, let the participants know that these will be typed up and sent to them electronically, so that they can incorporate these principles into their work. Advise them that you will look forward to seeing whether they will want to suggest any changes to the list when they come back for the next session of Excel training.
Outcomes - file and folder management

This list should not be regarded as 'the answer'. We should plan to revise and refine this list, based on participant experiences and suggestions. Nevertheless, this should be enough to start with:

  • Directory structures
    • Keep folders together (even though libraries might suggest you do not need to).
    • Keep your folder hierarchy under 'My Documents'.
    • Do not repeat words or concepts between levels (for example, do not have SEROLOGY/AI_SEROLOGY; instead, use SERO/AI).
  • Folder names
    • Have a consistent method for naming folders.
    • Keep folder names short.
    • Use abbreviations in folder names (for example, AI not AvianInfluenza).
    • Use broad categories, then use subdirectories for categories within those categories (hierarchical structure).
    • Avoid categories such as 'Miscellaneous'.
    • Use the same folder and subfolder structure for similar projects.
  • Folder contents
    • Keep files (documents) separate from programs.
    • Store like with like.
    • Keep the number of files in a folder to a maximum of 25.
    • Do not save unnecessary (temporary or interim) files.
    • Separate completed from ongoing work.
    • All files should be organised into a folder at some level.
  • File names
    • Have a consistent method for naming files.
    • Use common names for files (ones that are easy to associate with the contents of the file).
    • Make filenames as short as possible, but make sure they are still meaningful. Use acronyms, initials,and well-known abbreviations.
    • where appropriate, incorporate date information into the name in a way that allows sorting by date (year-month-day; Group-meeting-20130422).
    • Name files from general to specific.
    • Avoid punctuation including commas, colons, slashes, ampersands, and quotation marks.
    • Do not use spaces in filenames. Instead, use underscores, hyphens, or capitalisation (for example, AI_poultry, AI-poultry, AIPoultry).
    • When including version numbers into a filename, use at least two digits to improve sorting, and consistency (for example, AI-sero-summary_v01.xlsx)
  • Shortcuts
    • Use shortcuts and links instead of multiple copies of files.

[* Libraries (Should we include this in this course?)]

Discussion - location of downloaded files

Although we now have a good understanding of how to manage our files, we need to discuss how files are received. Ask the participants how they receive electronic files. This is likely to be either via email, or by downloading from the internet.

Ask the participants if they know where files downloaded from the internet are stored on their computer, and if they know how to change this. Ask if a participant would like to demonstrate this on their own laptop, and project this to the class. If more than one type of internet browser is used, have these demonstrated (each by a different person).

Go through the same process with documents received via email. If the participants mostly use internet mail, this will be the same as above, and can be omitted.

Saving files, backing up, and version management

Sometimes, the hard work we have been doing on our computers can be lost. Depending on time constraints, you may like to ask the participants if they have ever experienced any computer disasters. See if some common themes emerge - for instance, creating a document, but not saving before disaster strikes. Or having a hard disk fail, and losing the only copy of important documents. Or perhaps losing an external hard drive, or USB stick, again containing the only copy of documents. Some users might comment on saving work frequently, but saving mistakes, then not being able to go back to better, earlier copies of documents. Others might describe difficulties that arise when the wrong version of document is edited.

Ask if the participants have any strategies in place to address these problems.

If version management has not previously been discussed (for example, during the file and folder management exercise), take a few moments to go over this now. In many cases, files with version numbers do not also need dates, but if they do have both, the date should go after the version number in the name, to facilitate sorting. If a document is likely to undergo major changes, major and minor versions can be accommodated by (for example) v1.00, v1.01,v1.02 ... v2.00 etc. This results in multiple files, but it is very clear which is the most recent file.

Using document versions does mean that people can quickly and easily determine whether they are working with the appropriate version of a document. However, there are still issues of backing up these versions, and making the appropriate version available to people who need to work on it. Further, a document is generally saved regularly during a working session (but not with a new version number each time), and sometimes, one can save mistakes. When this occurs, work time can be lost, because the mistakes have been saved, and the obvious solution is to go back to the previous version and do the work again.

Comment that there are many ways to deal with these problems, but due to time constraints, we will mention only one, Dropbox. Dropbox is a file hosting service that offers cloud storage and file synchronisation. The free service provides a minimum of 2 GB online storage. You can sign up for Dropbox at www.dropbox.com.

Dropbox provides:

  • Safe secure backup (requires internet access).
    • Saved files are available from any computer, and multiple platforms.
  • 30 day availability of saved state of files.
    • Not only is the current version of a file available, but all saved versions of it are preserved. This means that you can save your work frequently during a session and, providing you have internet access, you will be able to go back to any one of those saved states to retrieve your work, if necessary.
  • Folders can be kept private, made public, or shared with a select list of people.
    • This is very useful for collaborative work on documents.
  • Mobile versions - Android and iOS apps are available.

Computer viruses

Computer viruses are a fact of life for Windows users. If time, discuss participants experiences with viruses.

Be aware!

Participants may be quite aware of dangerous practices. Without asking whether people actually do visit these sites, or engage in risky behaviour, ask the group to help build a list of risky computer practices.

If there is time, you could run this as an activity, in the form of your choice. Here is a list of potential suggestions:

  • clicking on links in emails or on websites
    • the link revealed by hovering does not match the text displayed
  • pop-up pages
    • some legitimate sites rely on pop-ups, so difficult to avoid all
  • downloads of executable (program) files
    • when downloading programs, it is a good idea to check whether others have experienced problems using the software, or downloading from that site
  • gambling or other similar sites
Anti-virus management

We strongly recommend installing a single anti-virus program and using it. This means that it should be both run and updated regularly.

We have had good experiences with (free versions of) antivirus programs such as AVG, or Avast.

Other products that search out spyware (Spybot) and malware (Malwarebytes) can also be useful. Again, these must be used and updated regularly.

Basic Excel Skills

The actual Excel skills they will be learning are reasonably straightforward. Far more important are the discussion and decisions that must be made before Excel is touched.

The participants will be given four real data sheets that have been completed in the field, then faxed to the office for processing.

The first part of this activity will involve interactive discussion, followed by small group activity.

Give the participants a few minutes to look at the four scanned pages of raw data. Ask them to consider:

  • why the information has been recorded
  • who will be using it
  • where it will be used
    • locally – will all use or outputs be based on the spreadsheet we create?
    • centrally – do we need to create something that can be uploaded or used by someone else?
  • how it will be used e.g.
    • as a record of activity, for payment/invoicing purposes
    • to collect information about disease problems
    • other?

Explain that we need to know why information is collected, and how it will be used, in order to make the best decisions about how to collate it electronically, as this step affects the outputs that can be created, and the cost of processing the data.

After the participants have had time to discuss this (in pairs, or groups of pairs), bring the groups together and get their ideas, recognising that some may be very familiar with this example and know exactly how the information is used.

Ask the participants how a spreadsheet created for local use only might differ from one designed for for uploading data into a central database. The key point here is that a database will have specific rules about how the data is to be presented to it. Excel can be a very useful tool to get data ready for submission to a central database, but it is important to know what these rules are.

Some data is the same for all records (e.g. Desa, Kecamatan, PPK/HP, Jenis Ternak) on a scanned page. Ask the participants how this information should be handled in their spreadsheet.

This is a good time to connect the workshop with the overall project – note that the plan is not to use Excel for long-term storage in a spreadsheet. It is anticipated that data collated in Excel will be submitted to the central database for storage. Excel will also be used to review and analyse data that has been extracted from the central database.

The next step is to have the participants go ahead and construct their spreadsheets and begin entering data. It is not necessary that all material in the four pages is entered, but all of the skills listed should be demonstrated and used.

Cleaning Messy Data

Participants are asked to

  1. review an existing workbook
  2. determine an appropriate data structure for their revised version
  3. identify problems with the original workbook - data structure and content
  4. make the appropriate revisions.

After giving the participants a brief overview of the exercise, let them work on the first three tasks in their pairs. Then bring everyone back together, and discuss their findings.

We suggest that the final data structure for the revised dataset is derived via the whole group discussion, so that all groups are working towards the same objective. You may wish to demonstrate the new skills, then allow everyone to work on making the revisions, with the facilitators circulating to provide help as needed.

To assist you, a workbook has been developed that shows the various steps in progress. People will do things in different orders, so their workbooks are unlikely to be exactly the same as the one provided. Neverthless, it should be a useful reference for you.

Here is a suggested list of data fields. Note that order is not important, and that you can decide to do this differently:

Field Data type Comment
EpiReg Text
Nomor Integer
SampelTanggal Date
Kecamatan Text Categorical
Desa Text Categorical
Petani Text
Kandang Integer
UsaiMinggu Integer
Titre Integer
Imunitas Text Categorical
TerakhirVaksinasi Advice needed Most have a range of days, but some entries difficult to interpret 'IF>96 day when 3 Month'

Note that we do not need to work with the GPS data - this can safely be omitted. However, the advanced course will cover techniques for converting coordinates expressed as degrees, minutes and seconds to decimal degrees.

The changes made on each of the new worksheet tabs will now be described:

Step1

  1. RESULT TEST worksheet copied, named 'Step1'.
  2. DAFF_AI_SERO_REVISED.xlsx workbook saved (CTRL+S).
  3. New column inserted at A.
  4. Format cells A7:A556 as text.
  5. Enter '0712466' (Epi Registration) into cell A377, copy it down to A556.
    • Make sure to choose 'Salin sel', not 'Isi seri'.
  6. Delete rows 369:376
  7. Enter '0712465' (Epi Registration) into cell A7, copy it down to A368.
    • Make sure to choose 'Salin sel', not 'Isi seri'.
  8. Delete rows 1:4
  9. Make a copy of the worksheet, place it at the end, and name the copy STEP2.
  10. Save the workbook.

Step2

  1. Delete the following columns:
    • D ('GPS POSITION')
    • B ('No')
  2. Insert a row above (existing) row 3.
  3. Insert a column between (existing) A and columns.
  4. Enter the new column headers in the appropriate cells in row 3.
    • The order of the columns doesn't necessarily match the order of the listed fields. Match the fields to their appropriate data; we can reorganise the order of columns later.
  5. Delete rows 1:2.
  6. Make a copy of the worksheet, place it at the end, and name the copy STEP3.
  7. Save the workbook.

Step3

The use of formulas to do many of the following tasks will be demonstrated in the advanced course. At this level, participants will need to do things manually, which can increase the likelihood of making mistakes. Note that errors also can occur when formulae are used.

  1. Copy the date 11/06/2012 to all H. Muhtar and Ruslan samples (B2:B243).
  2. Copy the date 13/06/2012 to all H.M. Yasmin samples (B244:B363).
  3. Copy the date 12/06/2012 to all remaining rows (B364:B543).
  4. Unmerge each of the merged entities in column E (Kandung), and then copy the shed identifier to each of the cells that had been part of the merge.
    • For example, E2:E16 contained '2200 (shed 1)'.
    • Each cell in E2:E16 needs to contain 'Shed 1'.
    • Watch out for (and correct!) the incorrect shed number in cells E77:E91. Listed as 7, but should be 6.
      • Make sure to discuss how data entry errors should be documented and handled.
    • Be careful to 'Salin sel', not 'Isi seri', when copying items containing numbers.
  5. Make a copy of the worksheet, place it at the end, and name the copy STEP4.
  6. Save the workbook.

Step4

  • The final step is to remove the ' Week' from all of the UsaiMinggu data (D2:D543).
  • Using Autofilter, we see that the entries are very consistent - all have a number, a space, then 'Week'.
  • This means that we can use either of two techniques to make the changes required:
    • 'Teks ke Kolom'
    • 'Temukan dan Ganti'
  • Having completed this step - you are done! However, make sure to take the time to apply an autofilter, and check the data in each field.
    • Is it necessary to format this dataset in any way? Why or why not?

Preparation of Useful Outputs

You will guide participants in the creation of their first pivot tables. The process begins with an examination of the processed data from the previous exercise. A portion of this data has been moved to another worksheet and hidden; it will be retrieved later and added back into the dataset upon which the pivot tables are based.

Note that there is a deliberate error in AIpivot.xlsx; this should be picked up by participants who are following the recommended guidelines of inspecting their data using the tools available to them. Examine cells E334:E348 on the AIData1 worksheet.

Drag and copy error

Ask the participants: How might this have happened? Should this be corrected? How and where should such a correction be recorded?

In addition, the last cell of the worksheet (J543 found using 'CTRL+END, or END then HOME) is NOT the same as right hand bottom corner of the (contiguous) range of data (J363, found using CTRL+A, which selects the range, or END+RIGHT ARROW, then END+DOWN ARROW).

Ask the participants: Why might this be? Does this matter?

  • This happened because data was deleted from the original dataset. Excel, however, still remembers the original extent of the range. If extensive formatting had been applied to this larger range, then more memory is being consumed than is really necessary, and the file size can be unnecessarily large.
  • Good practice in Excel can include routinely selecting then deleting all columns to the right of the data block, then doing the same for all rows below the data block. This removes any unnecessary formatting, and can reduce the file size.

If the participants have not noticed these issues by the time they are ready to begin building pivot tables, draw their attention to these points, and discuss.

To assist you, a series of workbooks from this exercise has been created:

AIpivot1.xlsx: contains the three worksheets (unnamed range, named range, table)

AIpivot2.xlsx: contains three pivot tables, each created from one of these sources, and each presenting different aspects of the data

AIpivot3.xlsx: reveals the hidden worksheet and adds the data to the end of the original material, on all three worksheets. The pivot tables are then refeshed, but no other changes are made.

AIpivot4.xlsx: source of pivottables 1 & 2 changed to also be AIData3. This should provide some clues as to the source of the data integrity problem. Worksheets AIData2 and DataTambahan have been deleted.

AIpivot5.xlsx: adds a fourth pivottable, which suggests that the problem data is in the additional records. Most of the Paleteang samples and all of the Padaidi samples are coded as seropositive.

AIpivot6.xlsx: contains the corrected data.

The participants should identify that the addition of the extra data to PivotTable3 has caused some puzzling results. There are many seropositive records; these occur in the additional data.

At this point, the participants have been provided with the necessary skills to examine this real data problem. We suggest that they spend some time working in pairs, applying (practising) their skills to come up with answers as to how the problem may have occurred.

The problem is in the formula applied to determine whether a sample should be coded as seronegative or seropositive. The development and use of formulae will be covered in the advanced course, but participants need to be aware that formulae may exist in the data they receive from other sources, and that this can cause problems.

Here, the problem is due to an error in the formula used in the additional dataset.

The formula in the original dataset is of the form: =IF(I363<16,"Seronegative","Seropositive") The formula in the additional data is of the form: =IF(H364<16,"Seronegative","Seropositive")

This is a small, but crucial difference. Column I contains the titre data; Column H is the sample number, which is clearly incorrect.

Some participants may have produced results that are even more startling:

Alternate PivotTable3 data

This can occur when Excel notices that the formulae in the column is inconsistent, and changes all of the cell entries to match the added data - the result is that ALL of the data are incorrectly coded. However, this error depends on the version of Excel used, the error reporting settings, and how the pasting was done (if the header row was included, then deleted, as outlined in the manual, this probably won't occur). If the formulae had been checked prior to building the pivot table, this would not have occurred.

The solution to the problem is to copy the correct formula to all cells in the Imunitas column in AIData3, then refesh all pivot tables based on this data.

The message to participants is that data preparation should include an assessment of any formulae in the data. This will be covered in detail in the advanced workshop.

If there is time, the participants can explore adding pivot charts. This, too, will be covered in detail in the advanced workshop.