Data
Data Collection
The pretest and posttest instrument of my study was the Fountas & Pinnell Benchmark Assessment System. I examined three different data points from this assessment: accuracy, self-corrections, and comprehension conversations. Accuracy refers to the percentage of words read correctly in a portion of the text and can be calculated with a simple formula: (total words read - total errors) / total words read x 100. The data collected for self-corrections represents the number of times a student makes an error while reading and corrects it. Because they are corrected, self-corrections are not counted as errors and do not affect accuracy. Self-corrections can be a sign of reading comprehension because they indicate that a student realizes what he or she read did not make sense. During the comprehension conversation portion of the assessment, I asked each student a series of within the text, beyond the text, and about the text questions after they had orally read the story. Within the text questions are literal, and their answers can be found directly in the text. Beyond the text questions ask the reader to create new thoughts and connections from what they read. About the text questions ask students to consider information and opinions from themselves and others. Answers for each question type were rated on a scale from zero to three: a score of zero indicated no understanding, one limited understanding, two partial understanding, and three excellent understanding. Students were tested at the same reading level on the pretest and posttest, just with a different genre of text.
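For readers unfamiliar with running-record scoring, a worked example of the accuracy formula may help; the figures below are purely hypothetical and are not drawn from my study.

```latex
% Hypothetical running record: 250 words read with 5 errors
\[
\text{Accuracy} = \frac{\text{total words read} - \text{total errors}}{\text{total words read}} \times 100
                = \frac{250 - 5}{250} \times 100 = 98\%
\]
```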
​
My district’s reading curriculum, Storytown, had weekly assessments that coincided with the weekly story, vocabulary, and language skills taught. My school only tested students on the reading comprehension and vocabulary portions of the assessment, but the curriculum was quite outdated, with a copyright date of 2008. As research on best practices for constructing test items accumulated, it became apparent that many Storytown questions were written in ways that produced results not necessarily indicative of comprehension of the weekly story. Even so, I still gave the weekly Storytown assessments as required by my district, which is why I used them as a data point for my study. I used only the reading comprehension scores, however, because comprehension was the focal point of my study and these scores gave me additional quantitative data about it.
​
I also collected data on the weekly targeted questions that I created to guide my instruction. Each week, I created at least six targeted questions pertaining to each small group's leveled reader and recorded a yes (Y) or a no (N) depending on my perception of the group's understanding of the question. I kept the data in a spreadsheet and reviewed it while creating the following week's targeted questions. The questions I constructed were modeled on Fountas & Pinnell's within the text, beyond the text, and about the text questions so that they mirrored the questions students were asked on the pretest and posttest.
​
Data Collection Examples
How was assessment information utilized to inform instructional decisions?
These data collection methods were best for the population outlined in the study because they were all methods the students had previously encountered in other contexts. Students were leveled frequently throughout the school year by their classroom teacher and reading specialist using the Fountas & Pinnell Benchmark Assessment System, so by the time I conducted my study they were accustomed to how the pretest and posttest worked. Classroom teachers at all grade levels in my building were required to administer the Storytown test on a weekly basis, so many of my students had experience with this testing instrument dating back to kindergarten. Before my study, I asked targeted questions during certain whole group reading lessons. Although students were not used to this instructional method in the small group setting, they had at least occasionally encountered it during whole group reading. Relying on familiar collection methods allowed me to determine the effectiveness of my study and the teaching strategy I implemented, targeted questioning, rather than the extraneous variable of an unfamiliar assessment. When I benchmarked my students, I did so one-on-one in a quiet room. Most of my students needed one-on-one attention for me to get an accurate depiction of their understanding of a concept, but all students benefited from this decision.
As previously explained, I collected data in a spreadsheet on a daily basis from each of my guided reading groups to keep track of their progress. Since I implemented targeted questioning during guided reading groups, it was difficult to ask each individual student every single question I created. I attempted to ask each student at least one question directly every week, which is why I did not use the same scale as the Fountas & Pinnell Benchmark Assessment System. Since there were three to five students per guided reading group, I did not find it fair to rate an entire group's answer to a comprehension question on a scale of zero to three based on one student's answer. For that reason, I marked a yes or a no depending on whether the group, by general consensus, arrived at a correct answer. I used a variety of group sharing strategies, such as thumbs up/thumbs down to determine whether students agreed with one another, turn and talks, and having multiple students share their answers to each question.
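To make the weekly record-keeping concrete, here is a minimal sketch of how yes/no marks like mine could be tallied into the percent-correct figures reported later in the Group Targeted Question Comprehension graph. The group names and marks in the sketch are hypothetical examples, not my actual data.

```python
# Minimal sketch: tally weekly Y/N targeted-question marks into a percent correct.
# Group names and marks are hypothetical examples, not the study's actual data.

weekly_marks = {
    "Group A": ["Y", "Y", "N", "Y", "Y", "N"],  # at least six targeted questions per week
    "Group B": ["Y", "Y", "Y", "Y", "Y", "Y"],
}

for group, marks in weekly_marks.items():
    percent_correct = marks.count("Y") / len(marks) * 100
    print(f"{group}: {percent_correct:.0f}% of targeted questions answered correctly")
```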
I used the weekly targeted questioning data to guide my whole group instruction. For example, if I noticed that my students were struggling with one of the three specific types of questions I was asking during guided reading, or with a specific format or wording of a question, I would construct similar questions about the whole group story to give my students more exposure to that question format and help them become more comfortable with it. When I noticed that a specific standard I was formatively assessing through my targeted questioning was not being met, such as cause and effect or making inferences, I used that information to guide my whole group instruction as well.
Analysis
This bar graph compares my class's average comprehension conversation scores on the pretest and posttest of the Fountas & Pinnell Benchmark Assessment System. Students were tested at the same reading level on the pretest and posttest because the goal was to measure growth within the subsections of the assessment. The data indicates that my students experienced growth when answering beyond the text and about the text questions, but remained the same when answering within the text questions. I believe this occurred because I focused on asking students beyond the text and about the text questions during my study; per my pretest scores, they were already quite successful at answering within the text questions. My students' ability to answer beyond the text questions grew the most, potentially because some of my whole group reading lessons during the study touched on text-to-self, text-to-text, and text-to-world connections.
​
When critically examining the results, I wonder whether the data is skewed by individual genre preferences. Since I tested my students at the same reading level on the pretest and posttest, I had to use different genres because of the resources available at my school. For example, if I tested a student with a nonfiction level U text for the pretest, I tested the student with a fiction level U text for the posttest. If a student really enjoyed nonfiction and did well on the pretest because of it, but then had to read a fiction book on the posttest and scored lower because they did not enjoy fiction, that could have negatively skewed my results. The opposite could be said for a student who started with the genre they were least comfortable with but was posttested on the genre they read on a daily basis; that could have positively skewed the results. All in all, the data indicates that my students ended my study more prepared to answer a range of beyond the text and about the text questions across genres, but could use more practice answering literal, within the text questions to reach optimal comprehension of a text.
Analysis
This bar graph compares accuracy and overall comprehension between the pretest and posttest of the Fountas & Pinnell Benchmark Assessment System. While categorical scores were examined in the Class Comprehension Conversation graph, this graph shows overall comprehension scores in relation to accuracy. Since each student could earn up to three points for each type of question, receiving full points in all three categories would give a student an overall comprehension score of 9/10. A 10/10 was difficult to achieve; the extra point could be awarded by the teacher for additional understanding expressed during the comprehension conversation. The overall comprehension scores were converted into percentages so that they were on the same scale as the accuracy data; for example, 9/10 became 90%.
​
The data shows me there is a relationship between accuracy and overall comprehension. The class's average accuracy went from 97% to 98%, a gain of one percentage point. The class's average overall comprehension went from 72% to 80%, a gain of eight percentage points. The more accurately a student read, the more likely they were to answer comprehension questions correctly, because what they read was what the author intended. I feel confident that my students were reading more accurately as a result of my study and as they prepared to enter sixth grade. The question still lingering for me is which measure drove growth in the other. Did my students comprehend more of what they were reading, which allowed them to read at a higher accuracy rate? Or did my students become better readers, with their accuracy rate leading them to higher levels of understanding? The focus of my study pointed toward the latter, but both were possibilities.
Analysis
This bar graph delves into individual student results on the overall comprehension conversation between the pretest and posttest of the Fountas & Pinnell Benchmark Assessment System. Student A and Student B were from below grade level guided reading groups, Student C and Student D were from on grade level guided reading groups, and Student E and Student F were from above grade level guided reading groups. All but one student showed growth in overall comprehension by at least one point.
​
Student F showed a decline, and I believe it was because of the book he was required to read and discuss during the posttest. His reading level was X, and he read fiction during the pretest and nonfiction during the posttest. Based on the books he selected for himself during silent reading time, I knew this student preferred fiction. His preference for fiction, combined with the fact that the nonfiction level X book was all about the creation of email and the internet and that its questions were quite difficult, could have caused his overall comprehension score to decrease. This student was a high achiever, and I wanted him to give the demanding text a shot. All things considered, I believe scoring 8/10 was quite an accomplishment for a book of this rigor. As the researcher, I wonder whether I should have asked the reading specialist if there were other level X nonfiction titles at her disposal that I could have used instead of such a demanding one.
​
While my study had a positive impact on all but one student shown in this graph, Student D experienced the most growth. I believe this was for several reasons. This student had started attending a small group reading intervention with a paraprofessional who assisted her with test-taking skills for reading tests. The intervention took place two days a week for 30 minutes. This intervention, in combination with my implementation of targeted questioning, exposed her to many different types of questions and strategies and, in turn, prepared her to answer them correctly. This student read nonfiction during the pretest and fiction during the posttest, and she enjoyed fiction more, so that also could have been a factor in her three-point increase.
Despite one exception, my students overall comprehended what they read at a higher level than when my study began. This will benefit my students, because comprehending texts serves readers well across disciplines.
Analysis
The Individual Comprehension Conversations and Individual Self-Corrections graphs depict different data from the same students; for example, Student A in the Individual Comprehension Conversations bar graph is the same student as Student A in the Individual Self-Corrections bar graph. I constructed two graphs representing the same students to show the relationship between comprehension conversation scores and self-corrections. Self-corrections are crucial to reading comprehension because they show that a student understands what they are reading and can catch mistakes by going back and rereading for the correct word or phrase.
​
Student A, Student D, and Student F all had an increasing number of self-corrections. This is interesting, because Student A and Student D both showed comprehension growth. As explained earlier, Student F read a difficult text, and although his overall comprehension score went down, his self-corrections went up, showing me that he was able to recognize when a word or phrase did not make sense, which suggests at least a baseline level of comprehension. Student B and Student C had the same number of self-corrections on the pretest and the posttest, showing me that they were still able to monitor their understanding at the same rate while answering the comprehension questions in more depth. Student E was the only student who had a decreasing number of self-corrections but an increasing comprehension score. I believe this is because, during small group reading instruction, I had been working with him on slowing down his reading and not trying to be the fastest in the room. Due to the slower speed, he was reading more carefully and, therefore, did not have as many mistakes to catch.
​
Taken as a whole, the self-correction data shows that self-corrections generally either remained the same or increased alongside an increase in comprehension.
Analysis
This bar graph shows the percentage of questions answered correctly during each week of targeted questioning that took place during guided reading instruction. Group A and Group B were reading below grade level, Group C and Group D were reading on grade level, and Group E and Group F were reading above grade level. What is interesting about this data is that one group from each level answered 100% of the questions I asked each week correctly. The only group that ended week six with a lower percentage of questions answered correctly than in week one was Group A. This group experienced intermittent periods of success; I was ecstatic about their results from weeks one, three, four, and five, but a bit discouraged by weeks two and six. Group A also saw the reading interventionist for thirty minutes daily. They worked almost entirely on reading comprehension, and their effort and results depended on their mood, interest, and behavior on any given day. Group C's results fluctuated because one student in the group was in the special education program for reading. That student struggled especially with about the text questions, which brought the group's percentage of correct answers down a bit.
​
After examining the data, I wonder whether I could have constructed Group C differently. The student with an IEP in reading was attending a daily small group reading session organized by the special education teacher, but was still not showing much growth in reading comprehension. I could have met with her individually during my guided reading block. Finding the time would have been difficult for me as a teacher, but it could have given her the one-on-one instruction she would have benefited from, and my graph would then have depicted Group C's results more accurately. Overall, judging by this data, by the end of my study students were answering within the text, beyond the text, and about the text comprehension questions during small group reading more successfully or were still answering them with complete accuracy.
Analysis
The Storytown test averages consist of the scores attained on the comprehension portion of the assessment. The class average started at 72% after the first week of the study and ended at 90% at the end of the study. The significant drop in comprehension during week two was most likely because the test covered a biography, a genre the students had not had much hands-on experience with and did not seem very interested in. Although these results supported the conclusion that my study succeeded in increasing my students' reading comprehension, the Storytown test results are somewhat inconclusive, because the testing instrument's validity is undermined by poorly constructed test items. Some questions are valid comprehension questions, whereas others could be interpreted in multiple ways. Though the validity of the test is questionable, the results of the weekly test were consistent with the findings of the other data collection methods, indicating that students improved at answering comprehension questions of all formats.
​
Analyzing my practice as a teacher, I wonder whether I could have reworded the invalid test items to reflect best practices in test item construction. This would have required a conversation with one of my internal stakeholders, the principal of my school, but it would have been a change in favor of the students. Two things could have happened if the principal had approved my alteration of the tests. In the first scenario, the students could have performed very well, and their scores could have been substantially higher than the averages shown in this graph. In the second, students might have become so accustomed to the poorly constructed test items that best practice test items would have confused them, and the scores could have been significantly lower than the averages shown in this graph.
Triangulation
When triangulating data, my graphs confirm and enrich one another. Of course there were outliers, but generally speaking my students demonstrated reading comprehension at a higher level by the time my study concluded. The Storytown Test Average graph enriches the Group Targeted Question Comprehension graph: both showed a decline in comprehension during week two, potentially due to the biography genre. The Class Comprehension Conversation graph and the Class Average Benchmark Data confirm one another in that they show an increase in reading comprehension both categorically and overall. I would imagine that, had I had consecutive five-day weeks with students, their scores would have increased even more. Since I had to condense my instructional content into fewer days because of snow days, the quality of that instruction was not what was originally intended. That being said, my goal of increasing reading comprehension through the use of targeted questioning was met, as shown by my data.