data mining
I'm ignorant of the topic, but I read of its relevance for Wurmser/Maloof (Doug Feith, OSP, pipeline to Cheney) in finding links between Saddam and Al Qaeda.
I wonder if GW was using warrantless searches in this context, since technology for filtering millions of communications makes a blanket permission slip seem more efficient.
Listening to GW this morning, I heard him comment on how "the enemy" has benefited from the leak in the NYT. Protection of methods and sources.
Ian McColgin
12-19-2005, 10:31 AM
It all seems like six degrees of separation from Kevin Bacon.
Ian, I think that folks unfamiliar with new technologies will tend to use them no matter how effective they are, because it's easy to project one's desires onto ignorance.
Reagan wanted an umbrella to protect us from ICBMs.
CPES instead of removing wood.
Predators armed with a couple of small missiles, piloted from 1,000 miles away.
In some situations the technology is appropriate, in some it's not.
But if your desire and need are great, then it's easy to make decisions based on the advice of the 'experts'.
Oyvind Snibsoer
12-19-2005, 10:46 AM
Data mining is, simply put, the task of extracting meaningful information from large amounts of data. It's usually used to gather statistical information and look for trends.
Data mining would be useful, say, if you wanted to analyze the periodic fluctuations in email traffic between countries, but would be useless for extracting any specific email.
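To make that distinction concrete, here is a minimal sketch using made-up records. The `Message` fields (`src`, `dst`, `month`, `body`) are hypothetical, not any real log format. It aggregates traffic volume between countries per month, which is the kind of trend data mining is good at; the summary keeps only counts, so no individual email can be read back out of it.

```python
from collections import Counter
from typing import NamedTuple


class Message(NamedTuple):
    # Hypothetical record layout for illustration only.
    src: str    # sending country code
    dst: str    # receiving country code
    month: str  # "YYYY-MM"
    body: str   # message text (ignored by the aggregation below)


def traffic_by_month(messages):
    """Count messages per (src, dst, month); message bodies are discarded."""
    return Counter((m.src, m.dst, m.month) for m in messages)


msgs = [
    Message("US", "NO", "2005-11", "hello"),
    Message("US", "NO", "2005-11", "see you"),
    Message("US", "NO", "2005-12", "merry christmas"),
    Message("NO", "US", "2005-12", "takk"),
]
counts = traffic_by_month(msgs)
print(counts[("US", "NO", "2005-11")])  # 2
```

Looking at how these counts rise and fall over many months is trend analysis; answering "what did this particular email say?" needs a completely different tool.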
What's the connection? Do you have a link?
Oyvind, if you had access to the contents of 10,000,000 emails wouldn't analytical tools for finding key words, phrases or types of dialog be useful?
I'm wondering if this is an area where we can be susceptible to group-think, where theoretical rewards are the basis for action.
Where Rumsfeld's Team B ratchets up the fear of the unknown so much that possibilities become probabilities.
And civil rights take second priority to survival. If we're at war and the enemy is ready to strike at any time, it would make one feel that our survival is at risk.
[ 12-19-2005, 12:00 PM: Message edited by: LeeG ]
Meerkat
12-19-2005, 11:01 AM
Lee: according to the limited bit of information available, that's how the FBI's "Carnivore" and the NSA's "Echelon" work: by scanning for keywords and phrases.
Words like president, dirty bomb, plutonium, flight, assassination, port, uranium, etc.
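How Carnivore and Echelon actually work isn't public, so this is only a toy sketch of keyword flagging under that assumption. It uses the words listed above as a hypothetical watch list. It also shows why naive keyword scanning is noisy: an innocent sentence about a flight or a port gets flagged as readily as anything sinister.

```python
import re

# Hypothetical watch list, taken from the words mentioned above.
WATCH_LIST = {"president", "dirty bomb", "plutonium", "flight",
              "assassination", "port", "uranium"}


def flag_terms(text, watch_list=WATCH_LIST):
    """Return the watch-list terms that appear as whole words or phrases in text."""
    lowered = text.lower()
    hits = set()
    for term in watch_list:
        # \b word boundaries keep "port" from matching inside "support" or "report".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            hits.add(term)
    return hits


print(sorted(flag_terms("My flight lands at the port of Seattle.")))
```

A harmless travel note trips two terms, which hints at how many false alarms a keyword filter produces when it is run over millions of messages.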
PatCox
12-19-2005, 11:03 AM
Well, Meer, you just guaranteed that your post tripped some of those monitors.
You have no privacy rights on the internet, anyway. I have no complaint with that; it's just like being in the public square. If you are talking in the public square, people, and even the government, are allowed to listen.
[ 12-19-2005, 12:04 PM: Message edited by: PatCox ]
If a leader was told by a general, "We've got a new weapon; it's terrible, awful, and will kill 10,000's of thousands at once"...
We'd make a lot of them. Even if it took 'only' 1,000 to decimate most of the civilized urban centers in the world, we'd make 10,000, and we did.
OK, now we're in a war of ideologies, asymmetric warfare, and it's about intel. We don't have good intel; they hate us because we like McDonald's and don't speak Arabic.
And a general says, "We've got a new weapon in the GWOT. It'll analyze all the communications everywhere and we can get the bad guys." And the lawyers say, "Yes, but it will violate our laws." And a leader says, "We're at war; whatever it takes."
I'm just wondering if the administration's group-think will tear out the wiring in the White House looking for a rat because the president is too f*****g allied to the ideology of fear.
He's been told that rats are Satan's work, and if the house has to be rebuilt, that's the price for survival.
Meerkat
12-19-2005, 11:24 AM
Originally posted by PatCox:
You have no privacy rights on the internet, anyway.
Whose idea was that?
If the government isn't lying (HAH!), it's entirely legal to use strong encryption that the government can't break (so they say!) to protect emails and other internet communications.
GnuPG is one example of freely available software for the purpose: http://www.gnupg.org/
Oyvind Snibsoer
12-19-2005, 11:34 AM
Originally posted by LeeG:
Oyvind, if you had access to the contents of 10,000,000 emails wouldn't analytical tools for finding key words, phrases or types of dialog be useful?
...
Certainly, insofar as it would be impossible to extract ANY useful information from such a large population without such tools.
OTOH, it is inherently more difficult to extract meaningful patterns from free-text documents than it is to, say, extract information about spending trends from Wal-Mart's databases.
I could go on and on about this, but Wikipedia (http://en.wikipedia.org/wiki/Data_mining) really says it better than I can. Read especially the part about data dredging:
Data dredging
Used in the technical context of data warehousing and analysis, the term "data mining" is neutral. However, it sometimes has a more pejorative usage that implies imposing patterns (and particularly causal relationships) on data where none exist. This imposition of irrelevant, misleading or trivial attribute correlation is more properly criticized as "data dredging" in the statistical literature. Another term for this misuse of statistics is data fishing.
Used in this latter sense, data dredging implies scanning the data for any relationships, and then when one is found coming up with an interesting explanation. (This is also referred to as "overfitting the model".) The problem is that large data sets invariably happen to have some exciting relationships peculiar to that data. Therefore any conclusions reached are likely to be highly suspect. In spite of this, some exploratory data work is always required in any applied statistical analysis to get a feel for the data, so sometimes the line between good statistical practice and data dredging is less than clear. The common approach, in data mining, to overcoming the problem of overfitting is to separate the data into two or three separate data sets (called the training set, validation set, and testing set). The model is built using the training and validation set, and is then tested using the testing set; the procedure can be repeated many times by resampling the data sets, in order to be more certain that a real pattern has been found and that the model is not merely capitalizing on random chance (i.e. overfitting).
A more significant danger is finding correlations that do not really exist. Investment analysts appear to be particularly vulnerable to this. "There have always been a considerable number of pathetic people who busy themselves examining the last thousand numbers which have appeared on a roulette wheel, in search of some repeating pattern. Sadly enough, they have usually found it." 3. However, when properly done, determining correlations in investment analysis has proven to be very profitable for statistical arbitrage operations (such as pairs trading strategies), and furthermore correlation analysis has shown to be very useful in risk management. Indeed, finding correlations in the financial markets, when done properly, is not the same as finding false patterns in roulette wheels.
Most data mining efforts are focused on developing a finely-grained, highly detailed model of some large data set. Other researchers have described an alternate method that involves finding the minimal differences between elements in a data set, with the goal of developing simpler models that represent relevant data. 4
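The data dredging trap in that quote is easy to reproduce. The sketch below uses pure random noise, with sizes chosen arbitrarily for illustration. It "dredges" 1,000 random features for the one that best correlates with a random target on a training half, then checks that same feature on a held-out test half. For simplicity it uses a single train/test split rather than the three-way training/validation/testing split the quote describes.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def dredge(n_rows=200, n_features=1000, seed=0):
    """Search pure noise for the feature most correlated with a noise target.

    The first half of the rows is the training set, the second half the test set.
    Returns (training correlation, held-out correlation) for the "best" feature.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    features = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_features)]
    half = n_rows // 2
    train_y, test_y = target[:half], target[half:]
    # The dredging step: keep whichever feature looks strongest on the training half.
    best = max(features, key=lambda f: abs(pearson(f[:half], train_y)))
    return pearson(best[:half], train_y), pearson(best[half:], test_y)


train_r, test_r = dredge()
print(f"training r = {train_r:+.2f}, held-out r = {test_r:+.2f}")
```

Since every feature is noise, the "winning" correlation on the training half is an artifact of searching many candidates, and it typically shrinks toward zero on the held-out half. That collapse is what the quote means by a model "merely capitalizing on random chance."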
I am wondering if GW's desire for secrecy is to protect other "methods and practices" used recently.
http://www.sourcewatch.org/index.php?title=Counter_Terrorism_Evaluation_Group
About a year after the "Group"'s formation, in the October 24, 2002, New York Times, Eric Schmitt and Thom Shanker wrote (http://archives.econ.utah.edu/archives/a-list/2002w43/msg00137.htm) that
"Defense Secretary Donald H. Rumsfeld and his senior advisers have assigned a small intelligence unit to search for information on Iraq's hostile intentions or links to terrorists that the nation's spy agencies may have overlooked, Pentagon officials said today."
"The four- to five-person intelligence team was established by Douglas J. Feith, the under secretary of defense for policy and another strong advocate for military action against Mr. Hussein. It was formed not long after the Sept. 11 attacks to take on special assignments in the global war on terror.
"The team's specialty is using powerful computers and new software to scan and sort documents and reports from the Central Intelligence Agency, the Defense Intelligence Agency and other intelligence agencies.
"The team's current task, described by one official as 'data mining' is to glean individual details that may collectively point to Iraq's wider connections to terrorism, but which may have been obscured by formal assessments that play down the overall Iraqi threat.
"In an interview tonight, Paul Wolfowitz said the members of the special intelligence team 'are helping us sift through enormous amounts of incredibly valuable data that our many intelligence resources have vacuumed up.' He emphasized, 'They are not making independent intelligence assessments.'"
"Although the team was created one year ago, its existence is only now becoming known outside of Mr. Rumsfeld's inner circle as the debate over the administration's Iraq policy intensifies.
"The new team is the latest example of an often contentious relationship between Mr. Rumsfeld and his top policy makers on one side, and intelligence agencies on the other."
[ 12-19-2005, 12:46 PM: Message edited by: LeeG ]
Bruce Hooke
12-19-2005, 11:46 AM
Originally posted by PatCox:
You have no privacy rights on the internet, anyway. I have no complaint with that; it's just like being in the public square. If you are talking in the public square, people, and even the government, are allowed to listen.
It seems to me that you are mixing up the Internet and email, and oversimplifying things to boot. Of course posts to a public forum such as this one are not private. That's the whole point, after all. On the other hand, regular emails are, if not strictly private and secure, at least supposed to be reasonably private. As with a telephone call, there is a reasonable expectation that others will not listen in on our conversation without the proper permission. Of course, anyone who is paying attention knows that emails are not very secure and therefore should not be used to send sensitive information without some form of encryption, but there is a big difference between knowing that emails are easy to "steal" and knowing that the government might be looking at everything you send.
Furthermore, it is quite possible to set up secure websites that are only accessible to selected individuals. This happens all the time and is an essential part of the way many companies operate, not to mention an essential part of e-commerce.
So, saying you have no privacy rights on the Internet is at best a gross oversimplification and at worst flat out wrong IMOOP.
High C
12-19-2005, 11:53 AM
The Internet is a wide open space, uncontrollable, and not under the jurisdiction of any single body or nation. Communicating on the Internet is like yelling across the fence to your neighbor. If you have anything important to say, you'd better do it in code, because anyone can overhear.
Meerkat
12-19-2005, 11:55 AM
Originally posted by High C:
The Internet is a wide open space, uncontrollable, and not under the jurisdiction of any single body or nation. Communicating on the Internet is like yelling across the fence to your neighbor. If you have anything important to say, you'd better do it in code, because anyone can overhear.
And to think our president doesn't think the "terrorists" are smart enough to figure that out long before anyone mentioned it in the press?
When the satellite clue phone started ringing, they probably got the general idea. :rolleyes:
Yeeha! It's the wild west out there on the Internet Range, no rules except the ones you can enforce with your trusty firearms, yeeeha!