Sitting in my second packed room of the Grace Hopper conference! Considering we're still before "official" launch time, I can't believe how many women are here and how packed every session is! Here in my first session in the PhD series, I'm excited to see three PhD students present their research.
An n-gram Based Approach to the Classification of Web Pages by Genre: Jane E Mason, Dalhousie University:
Mason is looking for a novel approach to classifying web pages by actual genre - not just by keywords. For example, a search for a health condition could show you information pages instead of pages from drug manufacturers trying to sell you something.
Mason chose to use n-grams because they are relatively insensitive to spelling errors, are language independent, and are relatively easy to program. She builds frequency profiles from these n-grams and then compares them with the Keselj Distance Function, which is apparently "simple", but it has been a while since I've been in Differential Equations :-)
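For the curious, here's a minimal sketch of how I understand the n-gram profile comparison. The Keselj distance compares the relative frequencies of the most common character n-grams in two texts; the parameter choices below (n=3, top 100 n-grams) are my own illustration, not necessarily Mason's.

```python
from collections import Counter

def ngram_profile(text, n=3, top=100):
    """Relative frequencies of the most common character n-grams in a text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.most_common(top)}

def keselj_distance(p1, p2):
    """Keselj-style profile dissimilarity: for every n-gram appearing in
    either profile, square the frequency difference normalized by the
    mean frequency, and sum the results."""
    dist = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        dist += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return dist
```

A classifier would then build one profile per genre from training pages and assign a new page to the genre whose profile is nearest.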
Mason and her team have been looking at how to let some web pages have multiple genres, which means that some pages end up with no genre - noise! While it's easy for a human to identify a nonsense/useless web page, I think it's pretty cool to get a computer to do this for you, so you won't even see it in the search results!
Ant Colony Optimization: Theory, Algorithms and Applications: Sameena Shah, Indian Institute of Technology Delhi:
I've never heard of this type of optimization, so this was very interesting for me. Shah chose to study this area of optimization because ants don't have centralized coordination and they make great decisions based only on local information. She sees this as a great method to apply to distributed computing. Now, how do we get computers to leave pheromones on the path of least resistance?
Besides the lack of pheromones, another problem she had to solve is that ants don't always find the shortest path - if enough ants have taken a longer path before the short path is discovered, all of the ants in the colony will use the longer path and ignore the short one. Obviously, she doesn't want that shortcoming in her algorithm :-)
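For readers who, like me, hadn't seen ant colony optimization before, here's a toy two-path simulation of the core idea (a textbook illustration, not Shah's actual algorithm): ants pick a path in proportion to its pheromone level, and shorter paths get reinforced faster because ants deposit more pheromone per trip.

```python
import random

def ant_colony_two_paths(short_len=1.0, long_len=2.0,
                         ants=100, rounds=50, evaporation=0.5, seed=0):
    """Toy ant colony on two parallel paths between nest and food.
    Each round, every ant picks a path with probability proportional to
    its pheromone level, then deposits pheromone inversely proportional
    to the path's length. Evaporation keeps old trails from dominating."""
    random.seed(seed)
    pheromone = [1.0, 1.0]          # start with no bias toward either path
    lengths = [short_len, long_len]
    for _ in range(rounds):
        deposits = [0.0, 0.0]
        for _ in range(ants):
            p_short = pheromone[0] / (pheromone[0] + pheromone[1])
            path = 0 if random.random() < p_short else 1
            deposits[path] += 1.0 / lengths[path]   # shorter trip, more pheromone
        pheromone = [(1 - evaporation) * ph + d
                     for ph, d in zip(pheromone, deposits)]
    return pheromone
```

This also shows the stagnation problem she mentioned: if you start the simulation with pheromone heavily biased toward the long path, the colony can lock onto it before the short path gets a chance.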
Shah does have a slide in her presentation which shows the statistical "solution", but it's a much more complicated formula than I ever saw in my intro to statistics course at Purdue. :)
Using Layout Information to Enhance Security on the Web: Terri Oda, Carleton University:
Ms Oda is a woman after my own heart, starting her presentation with an xkcd comic :-)
She opens her talk with different types of security, like secure networks between companies. Oda tells us how the threat models are no longer obvious: those seemingly innocuous applications in Facebook that have access to your private chats on the site and private emails, websites that don't properly protect passwords, and malicious users on the same forums. Her talk then moves on to the types of threats she's actually trying to protect you against: cross-site scripting and previously good sites that have gone bad.
She makes an excellent point that most (all?) web pages are built by web designers (aka artists), NOT web security experts, and with all their deadlines and basic functionality bugs, there is no time to even think about security. Is it any wonder we have so many attacks and vulnerabilities out there?
But how can we solve this? Schedules will never have enough padding, and most people designing web sites did not receive a BS degree from Purdue (where we were told over & over again that security must be designed in from the beginning, not as an add-on).
She's looking at using heuristics to correctly identify different elements on a page so that it's visually evident which components on the page are from the site you're visiting or being served from an external site (like an ad). I can't wait to see how her research turns out, and how much she can protect the user with a simple browser add-on!
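To make the same-site vs external distinction concrete, here's a crude sketch of my own (Oda's heuristics use layout information, which is much richer than this): flag a resource as external when its host differs from the page's host.

```python
from urllib.parse import urlparse

def is_external(resource_url, page_url):
    """Crude origin check: treat a resource as external (e.g. an ad)
    when its hostname differs from the hosting page's hostname.
    This is my illustration, NOT Oda's actual heuristics."""
    page_host = urlparse(page_url).hostname
    res_host = urlparse(resource_url).hostname
    # Relative URLs have no hostname; they come from the page's own site.
    return res_host is not None and res_host != page_host
```

A browser add-on along these lines could then visually mark the external components, which is exactly the kind of user-facing cue her research is after.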