
Research in Network Analysis

Rebekah Manweiler, Erick Oduniyi, Prof. Jon Brumberg, Prof. Nicole Beckage
Fall 2017-Spring 2018
University of Kansas, Lawrence, Kansas
Week Thirty-Six

May 3, 2018

Quite a bit has happened this week! On Saturday, Erick and I participated in the Undergraduate Research Symposium, where we each gave separate presentations on our work this semester. My research talk was titled "Exploring the Network Structure of Child Directed Speech" and was a high-level overview of the project, its goals, and my initial results. Below is a PDF of my presentation slides as well as a picture my mom took during my presentation (Thanks, mom!).

Also this week, I found out that Professor Beckage and I were accepted into the Distributed Research Experience for Undergraduates (DREU) funded by the CRA-W. I will be getting in touch with Professor Beckage soon so we can discuss our plans for the summer. I still plan on working with Erick and Professor Brumberg even if I leave to work with Professor Beckage in Madison, Wisconsin, and I will be updating this blog in conjunction with Erick's over the summer. Hopefully, we will be able to write a joint paper to be published in a journal later this year.

Lastly, we also found out that Erick's and my poster application to the Tapia conference was accepted! So, we will both be traveling to Orlando, Florida this coming September to present our joint research project.

 

This summer is going to be jam-packed with research, and I am so excited to get started! But next week is finals week, which I will need to prepare for; Erick and I still need to write our final report; and the following weeks may be busy with planning and packing for my trip to Madison. So, it may be a while until my next post. Until then, a big thank you to the CRA-W for this amazing opportunity and to our professors Nicole Beckage and Jon Brumberg for all of their help and guidance through this year and the times to come. We will never forget it!

Week Thirty-Four

April 19, 2018

This week I finished my presentation draft and went to the Undergraduate Research Symposium presentation workshop to get feedback. It was very helpful, but unfortunately we did not get a chance to share what we had already worked on; we mainly focused on the logistics of the symposium, how to start working on a presentation, and what goes into a good one. So, this weekend I will give my presentation to some of my friends, and next week to Professor Brumberg, for feedback and corrections.

I also worked on getting more information about the nodes we removed to create the reduced networks. I can now retrieve each removed node's name and its in- and out-degree. I held off on collecting the neighbors at first so I could verify that the maximum in- and out-degree was always 5, which lets me preallocate the right number of entries in the dataframe holding the data. The maximum did turn out to be 5, so I can start running the code that gathers each removed node's neighbors' names. I am worried this will take a while to run: even just getting the names and degrees was slow, roughly 2,000 nodes an hour, and the VoxForge network has over 14,000 nodes.
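
For my own reference, here is roughly what that bookkeeping looks like in R with igraph. The variable names g and removed_names are placeholders for my actual objects, and I am assuming the graph's vertices carry name attributes:

library(igraph)

# g: a directed co-occurrence network; removed_names: names of the nodes
# being removed (both placeholder variable names).
removed_info <- data.frame(
  name    = removed_names,
  in_deg  = degree(g, v = removed_names, mode = "in"),
  out_deg = degree(g, v = removed_names, mode = "out"),
  stringsAsFactors = FALSE
)

# Neighbor names padded to 5 columns, since the max in/out degree was 5.
padded_neighbors <- function(v, mode) {
  nb <- neighbors(g, v, mode = mode)$name
  c(nb, rep(NA, 5 - length(nb)))[1:5]
}
in_nbrs  <- t(sapply(removed_names, padded_neighbors, mode = "in"))
out_nbrs <- t(sapply(removed_names, padded_neighbors, mode = "out"))
removed_info <- cbind(removed_info, in_nbrs, out_nbrs)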

Week Thirty-Five

April 26, 2018

Well, I've gotten a lot of feedback and will run my presentation by Professor Brumberg one more time and present on Saturday! I've been getting kind of stressed and haven't been able to do any more research this week beyond running my code to get the nodes' neighbors and realizing that something is wrong with it. So, I guess we will see what happens next week.

 

Extension: Week One

May 25, 2018

Hello from Madison, Wisconsin! I have just finished my first full week here in Madison, and so far I am enjoying myself. This first week has been pretty slow: I've been getting to know people in the lab, getting acquainted with the town, settling into my living space, and reading an endless list of papers.

There are a couple of things I want to mention first that pertain to the CREU project. First, I heard back from Grace Hopper, and they accepted my poster proposal. Unfortunately, Tapia and Grace Hopper happen at the same time, and we had already accepted the poster presentation at Tapia, so I had to decline the offer from Grace Hopper. Secondly, Erick and I did eventually finish the final report for the CREU project, and it is attached here.

Moving on to this summer: I will continue working with Erick and Professor Brumberg, in conjunction with Professor Beckage, to finish our proposal and hopefully write a joint paper, but I will also be working on a new project with Professor Beckage that she is currently receiving funding for together with Professor Austerweil at the University of Wisconsin. So, I will do my best to schedule my time so that I am working on both projects; Professor Beckage suggested I try a 30-70 split (70 for the new project). I am not sure this will be enough time committed to the CREU extension (I may make it a 40-60 split), but I will do my best to put in enough time for useful progress. One upside of being in Madison is that work on the CREU project should be easier, because I am in direct daily contact with Professor Beckage, which should help the pace and quality of work this summer.

I have not been in touch with Professor Brumberg or Erick since I left, but I am planning on emailing them this week to work out a weekly Skype meeting time, both to keep in contact with them and to keep me accountable for my end of the project.

This coming work week I will be doing some more reading for my DREU project and meeting with Professor Austerweil; I will be catching up with the CREU project and what I need to do next; and I will be meeting several new students and professors who are part of the PREP (Psychology Research Experience Program) summer program. I am not part of this program, but the advisers are allowing me to join in their activities since I am here by myself. So, I think this will give me a chance to really branch out in this new field and meet some amazing new people.

Lastly, one of the requirements of the DREU program is to create a web page for my project that can later be hosted on the CRA-W website, so I have set up a page on GitHub (previously used for a Software Engineering class group project ... oh well!) to hold the content for that project.          is a link to the page, and I am thinking that later I will move the content back here to stay consistent.

Week Thirty-Three

April 12, 2018

This week I finished creating the reduced networks and calculating their respective network measures. Below are the tables of results.

 

Professor Brumberg and I have also decided that I need to get more information about the nodes that were removed, to see what kinds of patterns arise. This should give us a better idea of why the measures change the way they do. So, I will work on getting information about the removed nodes and begin working on the Monte Carlo simulations. I would like to have all of this done and in my presentation for the Undergraduate Research Symposium. (Oh yeah, by the way: I applied to give a talk about this research at the KU Undergraduate Research Symposium and was accepted! So I will be presenting this research on April 28th!)

I will also be working on a first draft of my presentation, which I will present to Professor Brumberg and Erick on Monday, and then I will attend a URS presentation workshop to get more feedback.

Week Thirty-Two

April 5, 2018

Well, last week when I stated that I had my measures calculated for the VoxForge network, I may have jumped the gun a wee bit. I was in the process of running my code to calculate the measures, and later that night, while calculating the geodesic distance, my computer ran out of memory ... again. This is the computer with 16 GB of RAM, so I am kind of at a loss as to what needs to happen. I have every measure but this one, and no matter what I try, I don't have enough space to compute the statistic.

What I decided to do was reduce the network by removing all words that occur fewer than two times in the data set. This dropped the VoxForge network from 14,591 unique words down to 9,195, and I was able to recalculate all of my statistics (including the geodesic distance) just fine. But now I can't compare these measures to my other networks, because they are not reduced. So, I am currently in the process of reducing all of the CHILDES networks and recalculating all of my measures.
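
The reduction itself is only a couple of lines in R with igraph; here is a sketch with placeholder names (word_counts holding the raw per-word occurrence counts):

library(igraph)

# word_counts: a named vector of raw occurrence counts per word, and
# g: the full co-occurrence network (both placeholder names).
rare_words <- names(word_counts)[word_counts < 2]
g_reduced  <- delete_vertices(g, rare_words)   # igraph accepts vertex names here

vcount(g)           # 14591 unique words before the reduction (VoxForge)
vcount(g_reduced)   # 9195 after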

I figured this wasn't a big deal, because I know I won't run into any memory issues with smaller networks, and it was something Professor Brumberg and I had discussed earlier in the semester as a way to see how the networks change. I may also do a similar reduction where I remove nodes with degree less than two and see what happens there. I've also been wondering if I should report which nodes I remove during the reduction, along with information about them (their in-degree, neighborhood, etc.), to see if we can find anything interesting; I have not written any code for that yet and will ask Professor Beckage if it is worth my time.

After I finish the analysis on the reduced CHILDES networks, I will code the Monte Carlo simulation, which should not take very much time. In fact, writing the code for the reduction has given me a really good idea of what needs to be done for the simulation, and it should be very simple to write.

Week Thirty-One

March 29, 2018

 

This week I heard back from MUCSC: I found the email in my KU spam folder on Monday after break. It had been sent a week prior (so Monday, the first day of spring break) and congratulated me on my abstract. They invited me to give a presentation on my project at the conference on April 7 and asked me to reply by March 21. I was so excited, and mad at myself, when I found the email: they were impressed with my abstract, but I didn't even get a chance to accept the invitation. At the same time, I'm not sure the timing would have been good for me anyway. I hadn't requested travel money, I'm still working on getting results, and I am dealing with some personal issues at home. So, I am very thankful that they liked my abstract, and it gives me some assurance about our other proposals.

I ran the analysis code for the VoxForge network and will start working on the Monte Carlo simulations. In these simulations, for each CHILDES network I will make 1,000 random VoxForge networks with the same number of nodes as the current CHILDES network and compute the analysis statistics that I have been using. For each measure, I will build a distribution from the 1,000 statistics calculated on the random networks and compare the measure from the current CHILDES network against that distribution. If the CHILDES measure has a significant p-value under the distribution, then it represents a characteristic found in child-directed speech that is not found in normal adult speech.
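
Here is a rough sketch of that plan in R with igraph. The network objects are placeholders, and I am assuming the random networks come from sampling nodes uniformly at random and taking the induced subgraph; we may end up building them differently:

library(igraph)

# g_vox: the VoxForge network; g_childes: one CHILDES network (placeholders).
n_nodes   <- vcount(g_childes)
null_dist <- replicate(1000, {
  ids    <- sample(vcount(g_vox), n_nodes)   # random size-matched node set
  g_rand <- induced_subgraph(g_vox, ids)
  transitivity(g_rand, type = "global")      # e.g., the clustering coefficient
})

# Empirical (one-sided) p-value for the observed CHILDES measure.
observed <- transitivity(g_childes, type = "global")
p_value  <- mean(null_dist >= observed)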

Week Thirty

March 22, 2018

This week has been fairly relaxing, but I wish that I didn't have to work full time.

Next week I will run my analysis code for the VoxForge network and then begin working on the code for the Monte Carlo simulations that will be used for further analysis of the CHILDES networks.

Week Twenty-Nine

March 15, 2018

 

Our submissions are finally complete! My MUCSC submission went very well, but our Tapia submission was rather stressful. We had a very difficult time fitting all of the information we needed for both of our projects into just two pages, including images and references, but we did it with a lot of help from Professor Beckage and Professor Brumberg. I don't know that we would have been able to submit our proposal without their help and guidance.

With next week being spring break, I think I am going to take a much needed break and update again with my plan for the following week.

My abstract submitted to the Midwestern Undergraduate Cognitive Science Conference 2018:

Child-directed speech has been hypothesized to aid in the process of language acquisition, potentially for a number of reasons. So, what makes child-directed speech different from normal adult speech, and what do these differences mean in the scope of learning language? In this paper, we examine the network structure of child-directed speech. We specifically look at whether or not the richness of child-directed speech as a co-occurrence network increases over time. We hypothesize that as a child matures, the parent's speech will become richer in both the number of words and the structure of those words, resulting in both larger and more connected networks. While we predict these networks will grow over time, we also expect meaningful differences in child-directed speech as compared to adult speech. Our findings suggest that parents respond specifically to a child's knowledge and learning capabilities, and that the differences between speech directed at younger and older children are based not only on the number of words being spoken but also on the complexity of the structure of those words.

Click here to see our completed poster proposal for the Richard Tapia Conference 2018. 

Week Twenty-Eight

March 8, 2018

 

(Status note: still working on conference submissions; submitted to Grace Hopper on the 7th, MUCSC on the 9th, and Tapia on the 12th.)

Well, this week has been hectic to say the least! With Professor Brumberg's help I figured out how to compute the geodesic distance, and my code ran all of my measures on the 0-12 and 12-24 month networks no problem. But when I tried to do the same for the 24-36 month network, my code kept failing while computing the geodesic distance. I finally realized it was because my computer (which I have had since high school) did not have enough available memory: it has 8 GB of RAM, which was not enough to compute the geodesic distance on top of all of the other applications running. So I closed as many background applications as I possibly could, as well as all of my open applications (like my email, Microsoft Word, etc.), and got my available memory up to 6 GB. I also removed all unnecessary variables from my RStudio environment to free up as much memory as possible. I tried running my code again, and there still wasn't enough space. I emailed Professor Brumberg, who quickly responded and tried to help me connect to one of the servers in his lab space, but that also did not work. I ended up having to use my work computer, which has 16 GB of RAM, to finish my computations, with just enough time left to format my proposal, insert my results section, and create my results table.

Because of all this trouble, I didn't even try to compute the statistics for the VoxForge network, since it is larger than the 24-36 mo. network, and just used my CHILDES networks in my proposal.
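
Something I want to try next time, sketched below but not yet tested: computing the average geodesic distance in row blocks, so the full distance matrix never has to sit in memory at once.

library(igraph)

# distances(g) builds the full |V| x |V| matrix in one shot; at 14,591 nodes
# that is roughly 1.7 GB of doubles before R's copying overhead, which is
# likely what kept exhausting my RAM. Averaging over row blocks keeps only
# a batch x |V| slice in memory at a time.
avg_geodesic <- function(g, batch = 500) {
  n     <- vcount(g)
  total <- 0
  pairs <- 0
  for (start in seq(1, n, by = batch)) {
    ids <- start:min(start + batch - 1, n)
    d   <- distances(g, v = ids, mode = "out")   # one block of shortest paths
    ok  <- is.finite(d) & d > 0                  # skip self and unreachable pairs
    total <- total + sum(d[ok])
    pairs <- pairs + sum(ok)
  }
  total / pairs
}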

Click this button to see my completed proposal.

 

Now that this proposal is done, it should make the others easier to complete. My MUCSC abstract is due tomorrow, and my Tapia proposal with Erick is due this Monday.

Week Twenty-Seven

March 1, 2018

 

This week I made more progress on my results. I can compute the maximum degree, the average degree, and the clustering coefficient of a network, and I have done so for the 0-12 mo. network, but I am still working on calculating the average geodesic distance. I have also found a better way of plotting the networks: the plots will not contain labels for the words but will still contain all of the graph's disconnected components. Below is an example from the 0-12 mo. network.
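
For reference, the measures and the label-free plot come down to a few lines of R with igraph (g is a placeholder for the 0-12 mo. network object, and the plotting parameters are just the ones I have been experimenting with):

library(igraph)

# g: the 0-12 mo. co-occurrence network (placeholder name).
max(degree(g))                      # maximum degree
mean(degree(g))                     # average degree
transitivity(g, type = "global")    # clustering coefficient
mean_distance(g, directed = TRUE)   # average geodesic distance (in progress)

# Label-free plot that keeps the disconnected components visible.
plot(g, vertex.label = NA, vertex.size = 2, edge.arrow.size = 0.2)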

As Erick and I continue working on our results this week, I think we will also start talking about what work needs to be done for our individual and joint submissions. For myself, I will be submitting to Grace Hopper, which is due March 7th, and to the Midwestern Undergraduate Cognitive Science Conference, which is due March 9th.

So, at the end of the week I should have all of my results from all of my networks complete so that I can write my poster proposal for Grace Hopper. It will be a three-page proposal that is almost like a mini conference paper: I will talk about the background of my project and how it is unique, my methods, my data, my results, and my future work.

Week Twenty-Six

February 22, 2018

 

I feel like I've made good progress this week. I have my networks, some of my measures, a histogram of the word frequencies, and a plot of the differences between the word frequencies. I am still working on the degree distributions, more measures, and plotting the largest connected component, but hopefully those will be done soon as well. Below I've included some sneak previews of my data.

Here is a plot of the 0-12 month network. It is very hard to see right now, because the nodes are too close together and the edges are not dark enough, but you can see that the network is very disconnected and that most of the connections we want to depict are tightly clustered together in the center.

Next is a histogram of how many words were used a given number of times. The x-axis is the number of times a word was used (for example, at 1 on the x-axis we are looking at words that were used only once in the entire data set). The y-axis is the number of words that were used exactly x times. For example, over 700 words appeared only once in the data set.

Next, based on the histogram, we wanted to find a threshold on the number of times a word was used, below which we remove insignificant data. We can measure the difference between the counts at n uses and n+1 uses to quantify the gaps between our data points. This should give us a plot with a nice 'elbow' where the differences level off, and that is where we can find our removal threshold.

So, in my plot we can see that the 1-2, 2-3, 3-4, and maybe 4-5 gaps are large enough that those words could be insignificant data, but I will have to do some t-tests to find the threshold.
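
For anyone curious, the frequency-of-frequencies table and its gaps are straightforward to compute in R (word_counts is a placeholder for my per-word occurrence counts):

# word_counts: a named vector of occurrence counts per word (placeholder name).
freq_of_freq <- table(word_counts)     # e.g., 700+ words used exactly once

# Histogram of how many words were used x times.
barplot(freq_of_freq,
        xlab = "times a word was used",
        ylab = "number of words")

# Differences between adjacent usage counts; the 'elbow' where these level
# off is a candidate removal threshold.
diffs <- -diff(as.numeric(freq_of_freq))
plot(diffs, type = "b",
     xlab = "gap (n vs. n+1 uses)",
     ylab = "difference in counts")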

Week Twenty-Five

February 15, 2018

 

I heard back from Professor Beckage earlier this week and was able to fix my code with her help. Now I can build my networks and start getting their measures and degree distributions. This should give me some high-level results to work with in our paper, and I should be able to plot the distribution of word usage to clean up the data a bit more. During our meeting with Professor Brumberg, he mentioned that one thing we could do in our papers is discuss what we can learn about the data from all of the cleaning we've had to do. Specifically, one question I can ask is what is special about all of the words that appear only once in the entire CHILDES data set. What are they generally connected to? Are they mostly non-words, or real words that are just less common? Do they change our network statistics when they are removed from the graph? And if so, how?

Also, we were able to discuss the game plan for conference submissions. We will stick to our original goal of at least one individual conference submission each for Erick and myself, and then one joint conference submission later on. For myself, I will submit to Grace Hopper, per the CREU's request, but I will also submit an abstract to the Midwest Cognitive Science Conference, which will be held in Indiana this year. My submission to Grace Hopper will focus more on the application I created to more easily work with the CHILDES data, the algorithms I create to analyze the networks, and any models that I create to represent the data. In contrast, my abstract to the Midwest Cognitive Science Conference will focus on our exploratory analysis and our findings with the network measures and models. Then, for the joint paper, there are a number of linguistics and cognitive science journals we can try to submit to later in the year, or we can submit a joint poster to the Tapia Conference. But Professor Brumberg told us to focus these next few weeks on getting meaningful results and less on where we will submit, so that's what we'll do.

Week Twenty-Four

February 8, 2018

 

I have started working on my network analysis code in RStudio and am running into some trouble. I can read in the csv files I created containing the matrices, but when I try to use the function graph_from_adjacency_matrix to get my network object, I get a weird type error. I've been fiddling with it long enough, so my next step is to get in contact with Professor Beckage and see what needs to change in my code. Hopefully it will be a simple type conversion and won't take long to fix.
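
While I wait to hear back, my best guess (only a guess at this point) is that the error comes from handing graph_from_adjacency_matrix a data.frame, since read.csv does not return a matrix. Something like this is what I plan to try, with a made-up file name:

library(igraph)

# read.csv returns a data.frame, but graph_from_adjacency_matrix expects a
# numeric matrix, which may be the source of the type error.
raw <- read.csv("childes_0_12.csv", row.names = 1)   # hypothetical file name
adj <- as.matrix(raw)
mode(adj) <- "numeric"                               # force numeric storage

g <- graph_from_adjacency_matrix(adj, mode = "directed", weighted = TRUE)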

The other thing I've been working on is organizing a list of conferences for Erick and me to submit to, including all the materials needed for each submission and the date and time each submission is due. We were both disappointed that we couldn't make the CogSci deadline, but there are more opportunities ahead and I don't want us to miss them. So, at our next group meeting I think we will discuss where to try to submit next.

Week Twenty-Three

February 1, 2018

 

Neither of us was able to submit to CogSci. We worked so hard, but we couldn't get our results in time.

I think I need to take a break for a couple of days and catch up on my school work. So, we will start again fresh next week.

Week Twenty-Two

January 25, 2018

 

We are swiftly approaching our deadline, and both Erick and I are working overtime to try to get our results. We met with Professor Brumberg on Friday, and Erick seems to be having a similar problem to mine. We need to be able to distinguish words from non-words and proper names within the CHILDES data, but Professor Brumberg has argued that we don't need to do this right away: Erick can still get phonemic information from the non-words, and I can determine a minimum threshold on the number of times a word is used in the data and ignore any word below it. In order to do this, I must plot the degree distribution of the whole CHILDES data network.
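
The plot itself should only take a couple of lines in R (a sketch, with g standing in for the full CHILDES network and the threshold value still to be determined):

library(igraph)

# g: the co-occurrence network for the whole CHILDES data set (placeholder).
deg <- degree(g, mode = "all")
hist(deg, breaks = 50,
     xlab = "degree", ylab = "number of words",
     main = "CHILDES degree distribution")

# Words below the eventual cutoff become candidates for removal.
threshold <- 2                          # hypothetical value, to be determined
keep <- V(g)[degree(g) >= threshold]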

So, items I need to finish for the paper:

1. Calculate the standard deviation for the age, the number of words, the number of unique words, the mean length utterance, and the mean length word for each data set

2. Edit and add to the introduction section

3. Read in and analyze each network matrix, generating network measures for each

4. Plot the degree distributions for each network

5. Write the results section comparing the network measures from each network

6. Write the discussion section covering the implications of the results and the continuation of the project, including problems we have faced so far.

It's a lot to accomplish in 6 days, but all I can do is try.

Week Twenty-One

January 18, 2018

 

We have not met this week yet, since Professor Beckage is in the process of moving, but we are planning on meeting sometime tomorrow. I have been very frustrated this week trying to finish getting the co-occurrence network adjacency matrices to work, because the data is still fairly messy. Many utterances contain strings that are either not words or are made-up words, and I don't currently have a way of detecting what is and isn't a word that can be used in the matrix. The best I have managed so far is to get a list of all the strings included in the scraped data, go through them by hand to pick out what is and isn't a word, and then rerun the scrape so it doesn't include the non-words. There is also the issue of whether or not to include made-up words and names in the matrices. I am leaning towards no for both at the moment, but I feel that if I leave them out of the scraped data then my co-occurrence data may be off a bit. I also do not have a way to stem or lemmatize the words I get from the scraped data.

I think what may have to happen is this: finish going through the complete list of words used in the CHILDES data, make a list of strings that I will not allow in the scraped data (non-words, made-up words, and names), and not worry about stemming or lemmatizing the words that are left. It is going to take a while to go through all 15,000 words, but it is the only answer I can think of.

Week Twenty

January 11, 2018

 

We met this week on Wednesday instead of Tuesday (because of the weather) and talked about how Erick and I need to edit our papers, what needs to get done in the next week, and when we will meet next.

As for my paper, I am a couple of pages under the requirement and need to bolster my introduction with more information about networks and their measures. Professor Beckage also mentioned that since these are separate papers, I do not need to mention Erick's work or compare any of my results to his. That makes the paper a bit simpler and easier to follow, but it means I will lose a large chunk of my introduction. So, I am planning on rewriting the introduction section altogether to talk more about what we are trying to accomplish specifically in this experiment, why we are focusing on child-directed speech, and how network analysis methods can help us achieve our goal.

Over the next week I need to finish my code and create the following co-occurrence network adjacency matrices: one matrix from all of the CHILDES files, matrices for each of the age groups (excluding 0-12 because there isn't enough data), and a matrix from all of the VoxForge data, to be used to make random networks controlling for size. Once we have the matrices we need, data analysis should be fairly straightforward and hopefully won't take too much time. Then Professor Beckage and I will be able to write the results section together quickly and get a couple of rounds of edits from Professor Brumberg.

Our next meeting is currently scheduled for next Tuesday.

Week Nineteen

January 1, 2018

 

Merry Christmas and Happy New Year! I am taking a break this week to spend time with friends and family and will be back on the fourth for a two-day training program at work and my birthday on the fifth!

Week Eighteen

December 28, 2017

 

This week I have all of the data scraped down to just the parent speech and time stamps, and I have included low-level data, like the sex of the target child, the number of words used, the number of unique words used, the mean length utterance (MLU), and the mean length word (MLW), at the beginning of each text file. I have completed the first draft of my individual submission to CogSci and will continue editing it, and I will also continue working on the software that builds the co-occurrence networks. Our group did not meet this week, but we have kept in touch and will continue to do so after the holidays.
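
For the record, those low-level statistics are simple to compute once the parent lines are scraped; here is a sketch in R (lines is a placeholder, and I am counting MLU in words rather than morphemes):

# lines: the parent utterances scraped from one transcript (placeholder name).
tokens   <- strsplit(lines, " ")
n_words  <- length(unlist(tokens))             # number of words used
n_unique <- length(unique(unlist(tokens)))     # number of unique words
mlu      <- mean(lengths(tokens))              # mean length utterance, in words
mlw      <- mean(nchar(unlist(tokens)))        # mean length word, in characters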

This coming week will be very busy, and unfortunately I do not anticipate being able to work on this project very much, but I am hopeful that I can finish everything I need to by the end of January and submit the paper before February.

Week Seventeen

December 21, 2017

 

We met on Tuesday for our last meeting of 2017. We began by presenting our code and results, as requested last week. Erick's code goes through all of our current transcription files and determines whether each is usable data, based on the child's age, the speaker tags (the file must have MOT, FAT, or PAR), and the presence (or absence) of time stamps. My code goes through the files considered usable, scrapes the lines and time stamps from the speakers we care about, and pipes them to a new file. Together, we have code that can identify and format our usable data. Below is a portion of my project (the code that scrapes the files) and a sample output file.

public void Scrape()
{
    // Requires: using System.IO;
    DirectoryInfo data = new DirectoryInfo(InputPath);

    // Each corpus directory holds the transcript files for its children.
    foreach (DirectoryInfo corpus in data.GetDirectories())
    {
        foreach (FileInfo childFile in corpus.GetFiles())
        {
            string curoutpath = Path.Combine(OutputPath, corpus.Name + "_" + childFile.Name);
            string filepath = Path.Combine(InputPath, corpus.Name, childFile.Name);

            // StreamWriter(path, false) creates or overwrites the output file,
            // and the using blocks guarantee both streams are flushed and closed.
            using (StreamWriter writer = new StreamWriter(curoutpath, false))
            using (StreamReader reader = new StreamReader(filepath))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // Keep only parent speech: mother, father, or parent tags.
                    if (line.Contains("*MOT:") || line.Contains("*FAT:") || line.Contains("*PAR:"))
                    {
                        // Drop the six-character speaker prefix, e.g. "*MOT:\t".
                        writer.WriteLine(line.Substring(6));
                    }
                }
            }
        }
    }
}

what's the matter ? 3146391_3147224
that's a much better idea xxx . 3147224_3160166
he talks to himself . 3353635_3356973
what are you looking at [!!] ? 3534418_3537573
at ? 3537573_3538593
what's that Peter ? 3538593_3549079
what does she have in her hand ? 3549079_3550229
frisbee [>] . 3552151_3553168
no more pennies (.) you don't have any in your pocket (.) all_gone
where's the lady ? 3710723_3711473
is she gone ? 3711473_3712392
<what happened to the lady> [<] ? 3713487_3715173
are you playing with my feet ? 3747550_3748550
seat ? 4135129_4136162
get out ? 4203381_4204031
down ? 4204031_4205067
down ? 4206436_4207069
should we wipe your mouth first ? 4207069_4209325
not your tongue (.) your mouth (.) you gonna stick out your tongue
wash this out . 4217406_4221594
what ? 4222614_4223280
there's no more milk . 4223280_4224700
why don't you let me wash out the cup (.) do you want to wash out
cup (.) wash out the cup ? 4231997_4234869
here (.) let me wash out the cup . 4234869_4245389
there is no more (.) want some juice ? 4255558_4257510
milk is yyy . 4258227_4259413
do you want some juice now ? 4267230_4292959
xxx . 4294259_4305014
yeah [>] . 4305747_4306313
what ? 4307616_4308132
what ? 4308132_4309568
(.) <you finished> [>] ? 4309568_4311520
you want more ? 4312636_4313939
don't spill it (..) you want more ? 4317111_4324171
put your cup down (.) do you want more ? 4325104_4327806

do you want more ? 4327806_4328456
put your cup down . 4328856_4329972
I'm not gonna pour it there (be)cause I'll spill it . 4329972_4333414
put your cup down . 4333414_4337402
I'll pour you some juice (.) put it down . 4337402_4339621
oh (.) you want (.) to put that back in (.) oh I see . 4339621_4342860
what ? 4345746_4346549

The code is just a portion of the GUI I am creating to make it easier to work with the CHILDES data, and the sample is part of one transcript from one session with a child. The lines are everything spoken by the child's parents: our child-directed speech. These data will become a semantic network, a phonological network, and part of a training set for Erick's speech recognition model.

Now that we have our usable data, we can start designing models and creating algorithms that will serve as statistical tests to analyze our data. Our first step will be to create our models: for Erick, that means training his system with the data sets, and for me, that means creating networks from the data sets. The next step will be creating our null model (or starting our null simulation).

The null model will be built with the following steps:

1) Take an observation from our data (a network / network measure).

2) Draw a comparison sample from a normalized source (our VOX data, representing normal adult speech), and repeat this 1,000 times to build a null distribution.

3) Ask whether our observation is unusual compared to that distribution.

We will be able to do this multiple times and focus on how we can change our normalized model to better represent our observations. This simulation will most likely be the bulk of our papers, but later we can also separate our observations into the three age ranges (0-12, 12-24, 24-36) to see how the model changes over time. So, the goals for the rest of the year and January are to finish the parser GUI so it can build any network we may need, then to use R to analyze the networks and our null model, and finally to write and submit a paper to CogSci.

Week Sixteen

December 14, 2017

 

This week has been especially busy and stressful. We get closer every day to the end of the semester and thankfully tomorrow is the end for me! But, the research continues and now I will have more time to get some real work done for this project and hopefully make next semester a bit easier. 

Our group still met this Tuesday in Professor Brumberg's lab. We discussed availability over the break and our game plan for the next couple of months. We started a list of tasks that need to be completed for our CogSci submissions, which we will refine next week before we part ways. One thing is certain: we must have all the usable data formatted and available before Christmas, so we can begin our analysis over the break. To that end, Erick and I have each been assigned separate portions of the 'data formatting' goal, to be completed before our next meeting this coming Tuesday. We must also each start our own LaTeX file using the CogSci style sheet, with the goal of having our first drafts done and sent to our professors by December 26.

Even though the semester will be over soon, I don't think Erick and I will be slowing our momentum towards our goal.

Week Fifteen

December 7, 2017

 

This week we met with Professor Brumberg on Tuesday and gave our outline presentations. Professor Beckage unfortunately could not make our meeting this week, so I think we will be meeting with her next Tuesday.

The meeting went well, and both Erick and I got great feedback from Professor Brumberg. I feel like these presentations will make it easier to write our papers, because we already have a decent outline and a bit of content to start with. Over the break, the plan is to complete several drafts and rounds of edits with as many people as possible, to get as much feedback as possible. We also decided it would be beneficial to describe our overall project in each of our papers, to give more context for our individual parts.

We were also able to figure out how to extract time stamp information from the transcription files!!! We were fiddling with the CLAN program during the meeting and found a setting on some of the files that 'expanded bullets', which gave us line-by-line start and end time stamps in milliseconds. It was crazy that we had never noticed it before, and it was definitely a relief to finally get that information. So, I will have to do some more digging to see if I can get the bullets on all files just by syncing the transcriptions with their audio files. If I can't, then I will have to add more parameters to my code that searches the data, to determine file by file whether we can get the time stamps using the CLAN software.

Week Fourteen

November 30, 2017

 

This week Erick and I met and discussed what is left to do this semester before we go on break. In our meeting with Professor Beckage and Professor Brumberg, we were asked to give 15-minute presentations as a first outline of our individual papers, so Erick and I first discussed how we would go about structuring and preparing our papers, and then shared our progress and got feedback on our presentations for this Tuesday.

Then we discussed how we would go about separating our data so we can begin working with it. Professor Beckage has asked us to identify 100 hours of adult speech, preferably parent speech, for each of three age ranges: 0-12, 12-24, and 24-36 months. At the moment I can search our data set for transcriptions with parent speech within a specific age range, but I have no way of telling Erick when a parent speaks just from the transcription text file. We know that the CLAN software we are using can track where we are in a transcription while its audio file is playing, but we don't know how to extract that information so we can pull only the parent speech from the audio file. We spent most of our time today searching Google for an answer and came up short, so right now I think we just have to wait until Tuesday to ask our professors how to deal with this problem.

Once we are able to find our 300 hours of audio data, I believe early results will come easily and we will be able to start the results and discussion sections of our individual papers.

Week Thirteen

November 23, 2017

 

Happy Thanksgiving!! This week Erick and I will be taking a much needed break from our classes to spend time with friends and family, and hopefully catch up on some work. But, next week we will be up and running again with some new progress on our project!

Week Twelve

November 16, 2017

 

This week Erick and I met and talked more about our last meeting, to make sure we understood what we needed to do, and we came up with a short list of questions for our professors to better understand how the project will eventually come together. We also talked about how we are going to store all of the mp3 and mp4 files, since the computer he got at ITTC doesn't have enough space. He also mentioned that he was fiddling with MATLAB and figured out an easy way to find how many hours of data we have, so now I don't have to finish my script in C#. So, I will continue working on my parser.

Week Eleven

November 9, 2017

 

This week, when Erick and I met with Professor Beckage and Professor Brumberg, I realized that what Erick and I had discussed two weeks ago, and what I most recently posted, was kind of wrong. I say kind of because we do eventually want to work with child language, but that is not what we can do now. For our project we will be focusing on ADULT speech, trying to understand the difference between normal adult speech and the impoverished adult speech that is directed towards children. Our meeting focused more on the machine learning model this week, so I will do my best to report what we talked about accurately. Erick will be training multiple models. The first he will train on his VOX data, which is normal adult speech, with a certain number of passes (learning stages) through his algorithms. He will do the same for the adult speech in the CHILDES data (our child-directed speech), with the same number of passes. Then we will be able to compare how many words each model has learned and when they learned them. We could also compare a model created from the VOX data with fewer passes to the impoverished data in the same way.

Then Professor Beckage said that I could compare semantic networks representing the same types of adult speech. I can create co-occurrence networks for the VOX data and the CHILDES data, and after I create those, I can find the network that is the union of the two. Then I can also create reduced VOX and CHILDES networks that only include words and connections from the union network, and compare all of those: the VOX, CHILDES, reduced VOX, and reduced CHILDES networks.
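
I am still wrapping my head around the reduction step, but here is my current reading of it as an R sketch (placeholder names throughout; I am interpreting 'reduced' as restricting each network to the vocabulary the two share):

library(igraph)

# g_vox and g_childes: placeholder names for the two co-occurrence networks.
g_union <- union(g_vox, g_childes)      # union of words and connections

# My reading of the 'reduced' networks: restrict each network to the
# vocabulary the two share, so their structure can be compared directly.
shared        <- intersect(V(g_vox)$name, V(g_childes)$name)
g_vox_red     <- induced_subgraph(g_vox, shared)
g_childes_red <- induced_subgraph(g_childes, shared)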

One thing I'm not completely sure of at the moment is how we will put what we have together, or at least compare what we have.

Week Ten

November 2, 2017

 

Last week (and briefly this week) Erick and I met to talk about all of the options we have for a focus for our project. We spent most of the time discussing the different options just to rehash our mutual understanding of them. What we decided on was a staged process.

    Step 0: Finish the things we need to do to be able to use, view, and manipulate our data. Because if we can't do that then we have bigger problems.

    Step 1: Summarize our data and track how our respective environments change over time. Erick will track changes in the sound signal, trying to find measures that capture the richness of individual children's phonetic environments. Similarly, I will track changes in the semantic networks, finding measures that capture the richness of individual children's semantic environments. Then we will be able to look at both of these environments as they change together. There are some things that we already know will happen. For instance, we know that a child will be more dependent on its phonemic environment when it is very young; then gradually its semantic environment will grow and change; and at some point (maybe once all of the phonemes of the child's language are known and can be produced easily) the semantic environment will become more important than the phonemic environment. What we don't know is how both environments grow while interacting. When neither environment is more favorable than the other, how do they interact? Do they grow in richness at the same rate? Do they feed off of each other in order to maximize the child's learning?

    Step 2: Based on the findings from step 1, we can decide to explore additional questions: How does the adult language differ from the child language over time? Do parents adapt their language based on the age or learning level of their child? How does the vowel space change over time? Does it correlate with changes in the phonemic space overall? How quickly does it change to match that of the adults?

 

All of these questions and more can be added to our research once we have a firm understanding of step 1. 

It is all very exciting to me and I can't wait to start seeing results and trying to understand what exactly is going on, but there is much more work to be done before then.

Week Nine

October 26, 2017

 

In this week's meeting I showed Professor Beckage and Professor Brumberg what I learned about the CLAN program using the                   , and gave them a small demo. We also talked a lot about what we can do with our data and how we can narrow our focus from our proposal into a paper to submit to CogSci in February. For the paper, we could focus on the difference between the adult language and the child language for every child, how children's vowels change over time, how adult language changes to best accommodate the child over time, or whether the richness of the sound signal increases at the same rate as the semantics. So, Erick's and my task this week is to come up with more ideas and then decide what our focus will be. In addition to choosing our focus, we will be summarizing our data, including summary statistics and information about changes in the number of words used, the semantic variance, and the number of phonemes in the sound signal as a function of gender and age.

So, our main goals this week are to decide our focus and continue updating our literature review spreadsheet. My other goals for this week are to continue working on my parser to separate the adult language from the child language in the transcripts, and to use the CLAN program to get a high-level view of our data and see if there is a correlation in mean length utterance (MLU) between the children and parents, and maybe even between the child and siblings if I can find it in one of the studies.

Week Eight

October 19, 2017

 

I realized this week that it's been two months since the official start of this project, and part of me feels like we haven't really done anything! Erick was kind of feeling the same way, like we aren't making any progress, so we brought it up to Professor Beckage. She told us that research can be slow, that all of our preliminary work and reading will allow us to do better research, and that we shouldn't rush it. At the same time, I don't want to get too comfortable and fall into the habit of putting things off like I feel we have been, because then the end of the project will jump on top of us and we'll have to scramble to finish, ending up with a project of lesser quality.

I guess we just have to find a balance: continue pushing through and doing good work, but don't stress or compromise the finer details. So, I'll keep reading, talking with Erick and Professor Beckage, and doing what I can. *And this has been Life Lessons During A Stressful Week with Rebekah.*

Week Seven

October 10, 2017

 

This week's meeting was a bit shorter than normal. Erick and I showed Professor Beckage and Professor Brumberg all of the data we found in the CHILDES database with audio and video, and they helped us get rid of a couple of sets we wouldn't need. So, we have our data set, and now we can start to work with it. I will try to write a small program to determine how much audio we have, and then try to create a parser in C# or C++ to go through the data transcriptions and separate the adult speech from the target child speech. We also did a demo of the CLAN system that is used by the CHILDES database to view information. The one on their website is much more user friendly, so I need to figure out how to get CLAN on my computer to work the same way; it will be very helpful. I also need to keep reading and keep updating the spreadsheet I created to organize our literature review papers. Hopefully, the next two weeks won't be too terribly stressful as midterms begin, but we will see how it goes.

Week Six

October 5, 2017

 

This week I was tasked with creating an IPython notebook with R capabilities so that I can more easily share the code I write to build our networks. I thought it would be a trivial process, but I was mistaken and had to take to the internet to figure out exactly what software I needed, which versions worked best together, and what packages I needed to include. Not something I would consider fun, but I was able to find some help and get it done.        is the link to the page that was most helpful. I don't have any code in there yet, but I'm sure it will be helpful once I start making networks and want to attach a link to my work. The only other thing Erick and I were asked to do was find 'style files' for CogSci papers and submissions. I couldn't find anything like what Professor Beckage was talking about, but I'm sure I'm just not understanding completely what she wants and am looking in the wrong place.

The Richard Tapia Celebration of Diversity

September 29, 2017

    Last week I attended the Richard Tapia Celebration of Diversity in Computing, hosted in Atlanta, Georgia. The conference focuses on fostering inclusion within the field of computing and giving underrepresented groups the chance to find resources and network within the community. This was my first time at the conference and my first time traveling outside of Kansas for a large conference, so there were new experiences all around! The conference took place over four days, from the 20th to the 23rd of September, and was filled with workshops, panels, keynote speakers, a career fair, and tons of networking. There was no way I could attend everything they had planned, but what I did participate in was always interesting and full of information, so I will highlight a couple of the activities I enjoyed most during the conference.
    The first activity was a panel on Artificial Intelligence and Social Responsibility. The part of this panel that I appreciated most was when the panelists talked about how the public views AI compared to how we view it as engineers and computer scientists: the two views differ much more than we realize. So, it is our job to create better communication between us and the public in order to reduce the misuse of AI software and the data it produces. The next activity I want to mention was a talk given by the keynote speaker Avani Wildani. Her speech, titled “New Interfaces in Neural Computing”, was about the connection between computers and our brains. She is currently working on learning about the brain’s structure and inner workings to make computers that are more efficient, faster, and more powerful. Her talk really resonated with me because of my background in Linguistics and my interest in learning about the brain. I was able to talk with her after her speech, and she really encouraged me to pursue learning about the brain as a serious part of my future graduate studies.
    I was so lucky to have gotten the chance to attend this conference, and I have recommended it to several people since returning to Kansas. This conference has made me want to attend more conferences and do more research so I can share my knowledge with others and build upon the knowledge of others. Hopefully I will get the chance to go again next year.

Week Five

September 27, 2017

 

Yesterday, my team and I held our third group meeting. It felt much more relaxed and productive than our first two meetings, and Erick and I decided that we will meet more regularly to stay on track with each other's progress. We discussed many things in our meeting, which I will outline below as they pertain to me; as I finish tasks, I will add links and images to document my progress.

    1. We decided to start with the CHILDES data set. Erick will work mainly with the audio and video recordings, and I will work with the data transcriptions. We will both work on a catalog of the data that will include how much video and audio we have, how many children and parents we have, and the ages of the children. We will also need to narrow down which researchers' data sets we use, since there are several in the database.

    2. I will begin coding a parser for the transcription files, and a program to create adjacency lists that will represent co-occurrence networks for the parents and the children (see the sketch after this list). The adjacency lists will then be handed to either an R program or to          to compute statistics and visualize the networks. Since I will have to separate the parent's speech from the child's speech in this program anyway, I will also keep track of how many words each person uses. Later, we will be able to see how much the parent speaks compared to the child in any given file and how the parent's speech influences the child's speech.

    3. One of our goals is to submit a paper to CogSci in January. It will most likely be a literature review of our field, where we highlight the interaction between my high-level network representations and Erick's low-level speech recognition and machine learning. Hopefully we will be able to show what each brings to the table and how the two sub-disciplines complement each other. We plan on having an outline of the paper finished by the end of October.

    4. A personal goal of mine is to start reading research papers more regularly and to create a repository of summaries and findings from these papers. I will probably use a Google spreadsheet to organize the papers and later use it to write our literature review. My plan is to read a paper every Monday, Wednesday, and Friday, for a total of 3 papers a week, which should give me a substantial number of resources to pull from for the conference paper.
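
To make item 2 concrete, here is a minimal sketch of how the co-occurrence counts could be built in R once the utterances are cleaned (I have not written this code yet, and all the names are placeholders; I am also assuming adjacent words co-occur, though we have not settled on a window size):

# utterances: a character vector with one cleaned utterance per element
# (placeholder name); words co-occur here when they are adjacent.
tokens <- strsplit(utterances, " ")
words  <- unique(unlist(tokens))
adj    <- matrix(0, nrow = length(words), ncol = length(words),
                 dimnames = list(words, words))

for (u in tokens) {
  if (length(u) < 2) next
  for (i in seq_len(length(u) - 1)) {
    adj[u[i], u[i + 1]] <- adj[u[i], u[i + 1]] + 1   # directed co-occurrence count
  }
}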

So, those were the main takeaways and tasks from our meeting! I hope I can get most of it done in the next two weeks before our next meeting. I also plan on posting a short essay I wrote for the Professional Development portion of my Computer Science Senior Design class about my experience at the Tapia 2017 conference. And lastly, here is a link to Erick's blog.

Week Four

September 20, 2017

 

So, this is our fourth week on the project, but I am going to talk about our second meeting from last week, on September 12. We are still looking at all of our options for data, but I could tell that Professor Beckage was leaning towards just using the CHILDES data set. Along those lines, we also need to figure out what kinds of networks we want to build and what they will tell us about the data. The most popular option from our discussion has been co-occurrence networks, which give us both syntactic and semantic information about an individual's language system. So, at some point I will need to decide how I am going to go through the data to create these networks. Professor Beckage also mentioned that we should start with the children's networks, since they will be smaller, and organize them into age categories. She also suggested that I do this in R or in Python, and I will probably build the networks using one of those languages. The last thing she told us was to make sure that our blog posts are detailed and specific: so far, so good.

Week Three - Team Picture!

September 15, 2017

Erick Oduniyi, Jon Brumberg, Rebekah Manweiler, Nicole Beckage

Week Two

September 7, 2017

 

Second week in, and it's been pretty slow. The four of us didn't meet this week, but Professor Brumberg gave Erick and me a paper to read about LENA, which will help Erick understand what is ahead and the methods used in this type of research. I actually understood more of the paper than I initially expected, and it was surprisingly interesting. Hopefully the other papers our professors give us, and the papers we find ourselves, will be the same. On that note, I am starting to compile a list of papers to read on my own, in addition to anything I am given, and to find a way to track our progress.

Week One

September 2, 2017

 

This is the first step into my first real research project: a team project with Erick Oduniyi, Professor Jonathan Brumberg, and Professor Nicole Beckage, supported by the Computing Research Association. The main theme of our research is understanding the organization and development of language in young children using computer science methods. I will be narrowing my research question over the next couple of weeks and will be using graph theory and network analysis to study it. We hope that this project will allow Erick and me to build on each other's questions, data, methods, and results to produce larger findings.

We held our first meeting on August 29 to begin our journey. We discussed possible research topics inside our main theme and the available data sets we could utilize. To prepare for our next meeting, I have started a reading list that will let me explore the semantic development of first language acquisition as well as learn more about the network analysis methods I will be using in the project. By the end of the month, my goal is to have a solid research question and the beginnings of a literature review for the fields it encompasses.
