Written Essay Evaluation Method

Candace Caraco, TA, Department of English


Grading student papers for a course in any discipline presents a series of challenges different from grading other kinds of assignments. Typically, a wide range of responses will be acceptable, and every paper (unless it is plagiarized) will have some merit. Consequently, grading essays demands a teacher’s close attention to ensure that each paper is judged by the same standards. A method for evaluating essays that breaks the grading process into parts can help an instructor work more consistently and efficiently. By assessing papers based upon the three general categories of ideas, argument, and mechanics and style, categories easily adapted for each discipline and assignment, an instructor can more easily recognize and comment on an essay’s strengths and weaknesses and so face that daunting pile of twenty, forty, or even one hundred essays with less trepidation. Furthermore, if teachers make clear to students how this method works, fewer students will be confused about their grades or apt to charge that papers are graded in an arbitrary or purely subjective way.

Before applying the three categories for evaluation, think through what it is you want an assignment to accomplish. Grades should reflect the most significant strengths and weaknesses of an essay, so a teacher should carefully consider ahead of time what expectations he or she has for a paper and especially what he or she most wants students to do for a particular assignment. For example,

  • Do the instructions to students require specific tasks, such as agreeing or disagreeing with an author, outlining a book’s argument for review, or analyzing a particular section of a work?
  • What is it that students should show they understood?

More generally, a teacher may also consider the following:

  • Has the student presented ideas in a logical order?
  • Is the essay written in clear, grammatically correct prose?
  • Has the student offered explanation or examples to support generalizations?

For any given assignment, your criteria for success may vary in the details; whatever they are, make a list of them. Ideally, students would receive a copy of this list before they begin writing their essays.

The problem with such a list of criteria, however, is that it can quickly grow unwieldy. While we need some specific questions as a checklist for student writing success, we can benefit from a streamlined evaluation system. The ideas/argument/mechanics and style format is a simple way to group criteria, both for yourself and your students. Once you have a set of criteria for an essay to succeed, you can decide how these questions fit under the three headings. A general breakdown of these questions might look like this:

IDEAS

  • Does the student understand the accompanying reading or the principles behind the experiment, etc.?
  • Does the student offer original interpretations?
  • Do the student’s explanations of terms, ideas, and examples demonstrate an ability to grasp the main points, paraphrase them, and apply them?
  • Does the student answer the question(s) assigned?
  • Does the essay demonstrate an understanding of a subject, or does it wander from one subject to the next without offering more than superficial remarks?

ARGUMENT

  • Can we easily determine what the author’s main point is?
  • Does the essay provide a series of points that add up to an argument supporting the main point (thesis)?
  • Does the essay proceed logically from point to point?
  • Does the student provide examples and explanations to support his or her generalizations?
  • Does the essay contain contradictions? Is the paragraph structure logical?

MECHANICS AND STYLE

  • Is it clear what the student’s point of view is?
  • Does the student control tone? Is the essay free of grammatical errors?
  • Is the essay punctuated appropriately?
  • Do citations and bibliography follow the correct format?
  • Is the prose clear or do you puzzle over individual sentences?
  • Are words spelled correctly?

What I am suggesting is essentially adapted from the methods of two English professors, Charlene Sedgwick and Steve Cushman. Sedgwick’s “ENWR Handbook” offers guidelines for evaluating freshman composition papers by assessing focus, organization, style, and mechanics; Cushman has in the past recommended that graders for his upper-level literature courses weigh mechanics and style (together) as one-third of a grade, and ideas and argument as the other two-thirds. Though instructors for non-English courses may want less emphasis on writing skills per se in an essay grade, I would argue that papers for all courses should be evaluated at least in part for their grammar, punctuation, and prose style because these fundamentals of writing are everywhere necessary for readers to understand writers. And a teacher in any discipline can easily tailor the three categories of ideas/argument/mechanics and style to the conventions of the course and its academic discipline.

Simplified (and Platonized) then, these three categories translate into the following grade scale: essays with good ideas that are logically organized into an argument and written in clear and mechanically clean prose receive an A; essays lacking in one category (e.g., have poor organization) receive a B; essays weak in two categories receive a C; and essays that manage none of the three general criteria garner a D or fail. What constitutes an “A” within any given category will also depend upon the course level and the assignment, but in a very general way, if a student’s essay can answer “yes” to all of your questions for a category, then the student should have an “A” for that portion of the grade. (More explicit criteria appear in “Responding to Student Writing” by Stella Deen in the November, 1995, Teaching Concerns.)
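The simplified scale above amounts to a small mapping, which can be sketched as follows. The function and its name are illustrative only, not part of the original method; the mapping itself is taken directly from the paragraph.

```python
# Illustrative sketch of the simplified three-category grade scale.
# The function name and interface are hypothetical.
def letter_grade(weak_categories: int) -> str:
    """Map the number of weak categories (out of ideas, argument,
    and mechanics and style) to a letter grade."""
    scale = {0: "A", 1: "B", 2: "C"}
    # Essays weak in all three categories garner a D or fail.
    return scale.get(weak_categories, "D/F")
```

For example, an essay with good ideas and a sound argument but weak mechanics would have one weak category, and `letter_grade(1)` returns `"B"`.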

Particularly for new teachers, it is sometimes helpful to read through several essays to see what an average paper for a class looks like. Checking to see if several papers have similar difficulties can also help you detect unclear instructions in the assignment or a content issue that may require further class discussion: if we have been unclear in some way, then we should be prepared to cut our students some slack when evaluating that part of the assignment.

However much we simplify the process, grading essays will never be as simple as marking multiple choice exams. Most student essays are some combination of good ideas and slight misunderstanding, clear argument and less clear argument: they don’t neatly divide into three parts. Typically the problems in an essay are closely related: for example, a misunderstanding of content can lead to a logical flaw in the argument and to prose that is full of short sentences because the author is not certain which ideas should be subordinated to others. Because of this system of logical relations, it is all the more important to include a final comment with a grade.

Writing final comments may indeed slow grading, but the pedagogical benefits of comments far outweigh the few minutes per paper needed to write them. Students continue to learn from an assignment if they understand what their work accomplished and what it didn’t. More importantly, final comments can help students write more fully conceived and better executed papers on the next assignment. (For a time-saving method of offering detailed comments about common problems in a set of essays, see Nancy Childress’s essay “Using General Comment Sheets,” published in the October, 1995 issue of Teaching Concerns; she recommends preparing a handout for the entire class in addition to [shorter] written comments on individual essays.)

One way to organize an end comment is to write at least one sentence pertaining to each of the three categories of ideas, argument, and style and mechanics. Breaking an essay into these three components can help us comment on an essay’s strengths and weaknesses more quickly than if we had no set criteria or if we had too many. A particularly successful comment will explain to a student how ideas, argument, style, and even grammar work together. Final comments also serve as a check on ourselves, especially if we tie our general end comments to specific examples within the paper. For example, when I finish reading Student A’s essay, I may sense that he didn’t offer proof in support of assertions. But when I look for an example of an unsupported assertion, I find there are passages that might serve as supporting evidence; however, he has not explained very carefully how the examples work, so my impression has been that his essay lacks proof. Even when we are sure that we have avoided bias and inconsistency, comments pointing to examples will better illustrate to students what they can improve. Above all, comments should not be mere justifications for grades, though they may coincidentally deter students from seeking explanations as to why they received a “B” instead of an “A.”

TRC NOTE: For help in implementing these suggestions, request a Writing Workshop.

Learning Objective

  1. Be able to describe the various appraisal methods.

It probably goes without saying that different industries and jobs need different kinds of appraisal methods. For our purposes, we will discuss some of the main ways to assess performance in a performance evaluation form. Of course, these will change based upon the job specifications for each position within the company. In addition to industry-specific and job-specific methods, many organizations will use these methods in combination, as opposed to just one method. There are four main methods of determining performance. The first is the trait method, in which managers look at an employee’s specific traits in relation to the job, such as friendliness to the customer. The behavioral method looks at individual actions within a specific job. Comparative methods compare one employee with other employees. Results methods are focused on employee accomplishments, such as whether or not employees met a quota.

Within the categories of performance appraisals, there are two main aspects to appraisal methods. First, the criteria are the aspects the employee is actually being evaluated on, which should be tied directly to the employee’s job description. Second, the rating is the type of scale that will be used to rate each criterion in a performance evaluation: for example, scales of 1–5, essay ratings, or yes/no ratings. Tied to the rating and criteria is the weighting each item will be given. For example, if “communication” and “interaction with client” are two criteria, the interaction with the client may be weighted more than communication, depending on the job type. We will discuss the types of criteria and rating methods next.
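The weighting idea above is simple arithmetic and can be sketched in a few lines. The criterion names, the 1–5 ratings, and the weight values below are invented for illustration; only the idea of weighting criteria comes from the text.

```python
# Hypothetical sketch of weighted criteria: each criterion receives a
# rating (here on a 1-5 scale), and weights (summing to 1.0) reflect
# each criterion's relative importance for the job.
def weighted_score(ratings: dict, weights: dict) -> float:
    """Combine per-criterion ratings into a single weighted score."""
    return sum(ratings[c] * weights[c] for c in ratings)

# Per the example, "interaction with client" is weighted more heavily
# than "communication" for this (hypothetical) job type.
ratings = {"communication": 4, "interaction with client": 5}
weights = {"communication": 0.4, "interaction with client": 0.6}
print(weighted_score(ratings, weights))  # ~4.6
```

Changing the weights changes the final score without changing any individual rating, which is exactly why the weighting should be set deliberately per job type.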

Graphic Rating Scale

The graphic rating scale, a behavioral method, is perhaps the most popular choice for performance evaluations. This type of evaluation lists traits required for the job and asks the source to rate the individual on each attribute. A discrete scale offers a fixed number of points: for example, a scale of 1–10; excellent, average, or poor; or meets, exceeds, or doesn’t meet expectations. A continuous scale instead presents a continuum, and the manager puts a mark at the point that best represents the employee’s performance. For example:

Poor |--------------------------------| Excellent

The disadvantage of this type of scale is the subjectivity that can occur. This type of scale focuses on behavioral traits and is not specific enough for some jobs. Development of specific criteria can save an organization legal costs. For example, in Thomas v. IBM, IBM successfully defended against accusations of age discrimination because of the objective criteria on which the employee (Thomas) had been rated.

Many organizations use a graphic rating scale in conjunction with other appraisal methods to further solidify the tool’s validity. For example, some organizations use a mixed standard scale, which is similar to a graphic rating scale. This scale includes a series of mixed statements representing excellent, average, and poor performance, and the manager is asked to rate a “+” (performance is better than stated), “0” (performance is at stated level), or “−” (performance is below stated level). Mixed standard statements might include the following:

  • The employee gets along with most coworkers and has had only a few interpersonal issues.
  • This employee takes initiative.
  • The employee consistently turns in below-average work.
  • The employee always meets established deadlines.

An example of a graphic rating scale is shown in Figure 11.1 “Example of Graphic Rating Scale”.

Essay Appraisal

In an essay appraisal, the source answers a series of questions about the employee’s performance in essay form. This can be a trait method and/or a behavioral method, depending on how the manager writes the essay. These statements may include strengths and weaknesses about the employee or statements about past performance. They can also include specific examples of past performance. The disadvantage of this type of method (when not combined with other rating systems) is that the manager’s writing ability can affect the quality of the evaluation. Also, some managers may write more than others, which means less consistency between performance appraisals by various managers.

Checklist Scale

A checklist method for performance evaluations lessens the subjectivity, although subjectivity will still be present in this type of rating system. With a checklist scale, a series of questions is asked and the manager simply responds yes or no to the questions, which can fall into either the behavioral or the trait method, or both. Another variation to this scale is a check mark in the criteria the employee meets, and a blank in the areas the employee does not meet. The challenge with this format is that it doesn’t allow more detailed answers and analysis of the performance criteria, unless combined with another method, such as essay ratings. A sample of a checklist scale is provided in Figure 11.3 “Example of Checklist Scale”.
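Because a checklist scale reduces each criterion to a yes or no, the result is easy to tally. A minimal sketch follows; the criterion names and the summary format are invented for illustration.

```python
# Hypothetical checklist-scale sketch: each criterion is answered
# yes (True) or no (False), and met criteria are simply counted.
def checklist_summary(responses: dict) -> str:
    """Summarize how many yes/no criteria the employee meets."""
    met = [c for c, yes in responses.items() if yes]
    return f"{len(met)} of {len(responses)} criteria met"

responses = {
    "arrives on time": True,
    "follows safety procedures": True,
    "meets deadlines": False,
}
print(checklist_summary(responses))  # "2 of 3 criteria met"
```

The sketch also shows the format’s limitation noted above: a bare count carries no explanation of *why* a criterion was not met, which is why the text suggests pairing it with another method such as essay ratings.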

Figure 11.1 Example of Graphic Rating Scale

Figure 11.2 Example of Essay Rating

Figure 11.3 Example of Checklist Scale

Critical Incident Appraisals

This method of appraisal, while more time-consuming for the manager, can be effective at providing specific examples of behavior. With a critical incident appraisal, the manager records examples of the employee’s effective and ineffective behavior during the time period between evaluations, which is in the behavioral category. When it is time for the employee to be reviewed, the manager will pull out this file and formally record the incidents that occurred over the time period. The disadvantage of this method is the tendency to record only negative incidents instead of positive ones. However, this method can work well if the manager has the proper training to record incidents (perhaps by keeping a weekly diary) in a fair manner. This approach can also work well when specific jobs vary greatly from week to week, unlike, for example, a factory worker who routinely performs the same weekly tasks.

Work Standards Approach

For certain jobs in which productivity is most important, a work standards approach could be the more effective way of evaluating employees. With this results-focused approach, a minimum level is set and the employee’s performance evaluation is based on this level. For example, if a salesperson does not meet a quota of $1 million, this would be recorded as nonperforming. The downside is that this method does not allow for reasonable deviations. For example, if the quota isn’t made, perhaps the employee just had a bad month but normally performs well. This approach works best in long-term situations, in which a reasonable measure of performance can be taken over a certain period of time. This method is also used in manufacturing situations where production is extremely important. For example, in an automotive assembly line, the focus is on how many cars are built in a specified period, and therefore, employee performance is measured this way, too. Since this approach is centered on production, it doesn’t allow for rating of other factors, such as ability to work on a team or communication skills, which can be an important part of the job, too.

Ranking Methods

In a ranking method system (also called stack ranking), employees in a particular department are ranked based on their value to the manager or supervisor. This system is a comparative method for performance evaluations. The manager will have a list of all employees and will first choose the most valuable employee and put that name at the top. Then he or she will choose the least valuable employee and put that name at the bottom of the list. This process is repeated with the remaining employees. Obviously, there is room for bias with this method, and it may not work well in a larger organization, where managers may not interact with each employee on a day-to-day basis.
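The alternating procedure described above (pick the most valuable remaining employee, then the least valuable, and repeat) can be sketched as a short routine. The numeric "value" scores standing in for the manager's judgment, and the names, are hypothetical.

```python
# Sketch of the alternating ranking procedure: the most valuable
# remaining employee fills the next slot from the top, the least
# valuable fills the next slot from the bottom, and so on.
def alternation_rank(value: dict) -> list:
    """Rank employees from most to least valuable.

    `value` maps each name to a hypothetical numeric score that
    stands in for the manager's judgment of value."""
    remaining = dict(value)
    top, bottom = [], []
    pick_top = True
    while remaining:
        if pick_top:
            name = max(remaining, key=remaining.get)
            top.append(name)
        else:
            name = min(remaining, key=remaining.get)
            bottom.append(name)
        del remaining[name]
        pick_top = not pick_top
    # The bottom picks were collected worst-first, so reverse them.
    return top + bottom[::-1]

print(alternation_rank({"Ann": 9, "Ben": 5, "Cai": 7, "Dee": 2}))
# ['Ann', 'Cai', 'Ben', 'Dee']
```

In practice no such numeric score exists, which is precisely where the bias the text warns about enters: each pick is a judgment call.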

To make this type of evaluation most valuable (and legal), each supervisor should use the same criteria to rank each individual. Otherwise, if criteria are not clearly developed, validity problems and halo effects could arise. The Roper v. Exxon Corp case illustrates the need for clear guidelines when using a ranking system. At Exxon, the legal department attorneys were evaluated annually and then ranked based on input from attorneys, supervisors, and clients. Based on the feedback, each attorney for Exxon was ranked on relative contribution and performance and given a group percentile rank (i.e., 99 percent was the best-performing attorney). When Roper was in the bottom 10 percent for three years and was informed of his separation from the company, he filed an age discrimination lawsuit. The courts found no correlation between age and the lowest-ranking individuals, and because Exxon had a set of established ranking criteria, it won the case (Grote, 2005).

Another consideration is the effect on employee morale should the rankings be made public. If they are not made public, morale issues may still exist, as the perception might be that management has “secret” documents.

Fortune 500 Focus

Critics have long said that a forced ranking system can be detrimental to morale; it focuses too much on individual performance as opposed to team performance. Some say a forced ranking system promotes too much competition in the workplace. However, many Fortune 500 companies use this system and have found it works for their culture. General Electric (GE) used perhaps the most well-known forced ranking system. In this system, every year managers placed their employees into one of three categories: “A” employees are the top 20 percent, “B” employees are the middle 70 percent, and “C” performers are the bottom 10 percent. In GE’s system, the bottom 10 percent are usually either let go or put on a performance plan. The top 20 percent are given more responsibility and perhaps even promoted. However, even GE has reinvented this stringent forced ranking system. In 2006, it changed the system to remove references to the 20/70/10 split, and GE now presents the curve as a guideline. This gives managers more freedom to distribute employees in a less stringent manner.1

The advantages of a forced ranking system include creating a high-performance work culture and establishing well-defined consequences for not meeting performance standards. In recent research, a forced ranking system seems to correlate well with return on investment to shareholders. For example, one study (Sprenkel, 2011) shows that companies that use individual criteria (as opposed to overall performance) to measure performance outperform those that measure performance based on overall company success. To make a ranking system work, it is key to ensure managers have a firm grasp of the criteria on which employees will be ranked. Companies using forced rankings without set criteria open themselves to lawsuits, because it would appear the rankings happen based on favoritism rather than quantifiable performance data. For example, Ford in the past used forced ranking systems but eliminated the system after settling class action lawsuits that claimed discrimination (Lowery, 2011). Conoco also has settled lawsuits over its forced ranking systems, as domestic employees claimed the system favored foreign workers (Lowery, 2011). To avoid these issues, the best way to develop and maintain a forced ranking system is to provide each employee with specific and measurable objectives, and also provide management training so the system is executed in a fair, quantifiable manner.

In a forced distribution system, like the one used by GE, employees are ranked in groups based on high performers, average performers, and nonperformers. The trouble with this system is that it does not consider that all employees could be in the top two categories, high or average performers, and requires that some employees be put in the nonperforming category.
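A 20/70/10 forced distribution like GE's reduces to a bucketing step: sort by performance, then cut the ordered list at fixed percentiles. The sketch below uses invented employee names and scores; the 20/70/10 split is the one described above.

```python
# Sketch of a 20/70/10 forced distribution: sort employees by score,
# then label the top 20% "A", the middle 70% "B", and the bottom 10% "C".
def forced_distribution(scores: dict) -> dict:
    """Assign each employee an A/B/C bucket from hypothetical scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    a_cut = round(n * 0.20)
    b_cut = round(n * 0.90)  # A and B together cover the top 90%
    labels = {}
    for i, name in enumerate(ordered):
        labels[name] = "A" if i < a_cut else "B" if i < b_cut else "C"
    return labels

scores = {f"emp{i}": 100 - i for i in range(10)}  # emp0 best ... emp9 worst
labels = forced_distribution(scores)
print(labels["emp0"], labels["emp5"], labels["emp9"])  # A B C
```

Note that the cuts are made regardless of the actual score values, which is exactly the objection raised above: even if every employee performs well, someone still lands in the "C" bucket.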

In a paired comparison system, the manager must compare every employee with every other employee within the department or work group. Each employee is compared with another, and out of the two, the higher performer is given a score of 1. Once all the pairs are compared, the scores are added. This method takes a lot of time and, again, must have specific criteria attached to it when comparing employees.
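The paired comparison tally can be sketched directly. In practice the manager judges each pair; here hypothetical numeric scores stand in for that judgment, and the names are invented.

```python
from itertools import combinations

# Sketch of paired comparison: every employee is compared with every
# other employee; the higher performer in each pair earns 1 point,
# and the points are summed into a final score.
def paired_comparison(score: dict) -> dict:
    """Tally pairwise wins from hypothetical per-employee scores."""
    wins = {name: 0 for name in score}
    for a, b in combinations(score, 2):
        winner = a if score[a] >= score[b] else b
        wins[winner] += 1
    return wins

print(paired_comparison({"Ann": 9, "Ben": 5, "Cai": 7}))
# {'Ann': 2, 'Ben': 0, 'Cai': 1}
```

With n employees this requires n(n−1)/2 comparisons, which is why the text notes that the method takes a lot of time as the group grows.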

Human Resource Recall

How can you make sure the performance appraisal ties into a specific job description?

Management by Objectives (MBO)

Management by objectives (MBO) is a concept developed by Peter Drucker in his 1954 book The Practice of Management (Drucker, 2006). This method is results oriented and similar to the work standards approach, with a few differences. First, the manager and employee sit down together and develop objectives for the time period. Then, when it is time for the performance evaluation, the manager and employee sit down to review the goals that were set and determine whether they were met. The advantage of this is the open communication between the manager and the employee. The employee also has “buy-in,” since he or she helped set the goals, and the evaluation can be used as a method for further skill development. This method is best applied for positions that are not routine and require a higher level of thinking to perform the job. To use MBO effectively, the manager and employee should be able to write strong objectives. Objectives should be SMART (Doran, 1981):

  1. Specific. There should be one key result for each MBO. What is the result that should be achieved?
  2. Measurable. At the end of the time period, it should be clear if the goal was met or not. Usually a number can be attached to an objective to make it measurable, for example “sell $1,000,000 of new business in the third quarter.”
  3. Attainable. The objective should not be impossible to attain. It should be challenging, but not impossible.
  4. Result oriented. The objective should be tied to the company’s mission and values. Once the objective is made, it should make a difference in the organization as a whole.
  5. Time limited. The objective should have a reasonable time to be accomplished, but not too much time.

Setting MBOs with Employees


An example of how to work with an employee to set MBOs.

To make MBOs an effective performance evaluation tool, it is a good idea to train managers and determine which job positions could benefit most from this type of method. You may find that for some more routine positions, such as administrative assistants, another method could work better.

Behaviorally Anchored Rating Scale (BARS)

A BARS method first determines the main performance dimensions of the job, for example, interpersonal relationships. Then the tool utilizes narrative information, such as from a critical incidents file, and assigns quantified ranks to each expected behavior. In this system, there is a specific narrative outlining what exemplifies “good” and “poor” behavior for each category. The advantage of this type of system is that it focuses on the desired behaviors that are important to complete a task or perform a specific job. This method combines a graphic rating scale with a critical incidents system. The US Army Research Institute (Phillips et al., 2006) developed a BARS scale to measure the tactical thinking skills of combat leaders. Figure 11.4 “Example of BARS” provides an example of how the Army measures these skills.

Figure 11.4 Example of BARS

Figure 11.5 More Examples of Performance Appraisal Types

How Would You Handle This?

Playing Favorites

You were just promoted to manager of a high-end retail store. As you are sorting through your responsibilities, you receive an e-mail from HR outlining the process for performance evaluations. You are also notified that you must give two performance evaluations within the next two weeks. This concerns you, because you don’t know any of the employees and their abilities yet. You aren’t sure if you should base their performance on what you see in a short time period or if you should ask other employees for their thoughts on their peers’ performance. As you go through the files on the computer, you find a critical incident file left from the previous manager, and you think this might help. As you look through it, it is obvious the past manager had “favorite” employees and you aren’t sure if you should base the evaluations on this information. How would you handle this?


The author discusses the How Would You Handle This situation in this chapter at: https://api.wistia.com/v1/medias/1360849/embed.

Table 11.3 Advantages and Disadvantages of Each Performance Appraisal Method

Graphic Rating Scale
  • Advantages: Inexpensive to develop; easily understood by employees and managers
  • Disadvantages: Subjectivity; can be difficult to use in making compensation and promotion decisions

Essay
  • Advantages: Can easily provide feedback on the positive abilities of the employee
  • Disadvantages: Subjectivity; writing ability of reviewer impacts validity; time consuming (if not combined with other methods)

Checklist Scale
  • Advantages: Measurable traits can point out specific behavioral expectations
  • Disadvantages: Does not allow for detailed answers or explanations (unless combined with another method)

Critical Incidents
  • Advantages: Provides specific examples
  • Disadvantages: Tendency to report negative incidents; time consuming for manager

Work Standards Approach
  • Advantages: Ability to measure specific components of the job
  • Disadvantages: Does not allow for deviations

Ranking
  • Advantages: Can create a high-performance work culture
  • Disadvantages: Possible bias; validity depends on the amount of interaction between employees and manager; can negatively affect teamwork

MBOs
  • Advantages: Open communication; employee may have more “buy-in”
  • Disadvantages: May only work for some types of jobs

BARS
  • Advantages: Focus is on desired behaviors; scale is specific to each job; desired behaviors are clearly outlined
  • Disadvantages: Time consuming to set up

Key Takeaways

  • When developing performance appraisal criteria, it is important to remember the criteria should be job specific and industry specific.
  • The performance appraisal criteria should be based on the job specifications of each specific job. General performance criteria are not an effective way to evaluate an employee.
  • The rating is the scale that will be used to evaluate each criterion. There are a number of different rating methods, including scales of 1–5, yes or no questions, and essay.
  • In a graphic rating performance evaluation, employees are rated on certain desirable attributes. A variety of rating scales can be used with this method. The disadvantage is possible subjectivity.
  • An essay performance evaluation will ask the manager to provide commentary on specific aspects of the employee’s job performance.
  • A checklist utilizes a yes or no rating selection, and the criteria are focused on components of the employee’s job.
  • Some managers keep a critical incidents file. These incidents serve as specific examples to be written about in a performance appraisal. The downside is the tendency to record only negative incidents and the time it can take to record this.
  • The work standards performance appraisal approach looks at minimum standards of productivity and rates the employee performance based on minimum expectations. This method is often used for sales forces or manufacturing settings where productivity is an important aspect.
  • In a ranking performance evaluation system, the manager ranks each employee from most valuable to least valuable. This can create morale issues within the workplace.
  • An MBO or management by objectives system is where the manager and employee sit down together, determine objectives, then after a period of time, the manager assesses whether those objectives have been met. This can create great development opportunities for the employee and a good working relationship between the employee and manager.
  • An MBO’s objectives should be SMART: specific, measurable, attainable, results oriented, and time limited.
  • A BARS approach uses a rating scale but provides specific narratives on what constitutes good or poor performance.

Exercise

  1. Review each of the appraisal methods and discuss which one you might use for the following types of jobs, and discuss your choices.

    1. Administrative Assistant
    2. Chief Executive Officer
    3. Human Resource Manager
    4. Retail Store Assistant Manager

1“The Struggle to Measure Performance,” BusinessWeek, January 9, 2006, accessed August 15, 2011, http://www.businessweek.com/magazine/content/06_02/b3966060.htm.

References

Doran, G. T., “There’s a S.M.A.R.T. Way to Write Management’s Goals and Objectives,” Management Review 70, no. 11 (1981): 35.

Drucker, P., The Practice of Management (New York: Harper, 2006).

Grote, R., Forced Ranking: Making Performance Management Work (Boston: Harvard Business School Press, 2005).

Lowery, M., “Forcing the Issue,” Human Resource Executive Online, n.d., accessed August 15, 2011, http://www.hrexecutive.com/HRE/story.jsp?storyId=4222111&query=ranks.

Phillips, J., J. Shafter, K. Ross, D. Cox, and S. Shadrick, Behaviorally Anchored Rating Scales for the Assessment of Tactical Thinking Mental Models (Research Report 1854), June 2006, US Army Research Institute for the Behavioral and Social Sciences, accessed August 15, 2011, http://www.hqda.army.mil/ari/pdf/RR1854.pdf.

Sprenkel, L., “Forced Ranking: A Good Thing for Business?” Workforce Management, n.d., accessed August 15, 2011, http://homepages.uwp.edu/crooker/790-iep-pm/Articles/meth-fd-workforce.pdf.

This is a derivative of Human Resource Management by a publisher who has requested that they and the original author not receive attribution, which was originally released and is used under CC BY-NC-SA. This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
