Tuesday, August 14, 2018

Garbage collection tips in Java

The OverOps blog has a nice overview of Java garbage collection, followed by some tips on writing GC-efficient code.

https://blog.takipi.com/improve-your-application-performance-with-garbage-collection-optimization/
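To give a flavor of what "GC-efficient" code can look like, here is a minimal sketch of three commonly cited habits: build strings with a single StringBuilder, presize collections when the size is known, and keep accumulators primitive rather than boxed. The class name, method names, and the 16-characters-per-item capacity guess are my own illustrative assumptions and are not taken from the linked post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GcFriendlyExamples {

    // Joins items with commas using one StringBuilder, rather than
    // creating a new intermediate String on every iteration with +=.
    static String joinWithCommas(List<String> items) {
        // Capacity is a rough guess (assumed 16 chars per item) to limit resizing.
        StringBuilder sb = new StringBuilder(Math.max(16, items.size() * 16));
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(items.get(i));
        }
        return sb.toString();
    }

    // Presizes the list so its backing array does not have to grow
    // (and be copied) while it is being filled. Assumes n >= 0.
    static List<Integer> squares(int n) {
        List<Integer> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            result.add(i * i);
        }
        return result;
    }

    // Uses a primitive long accumulator; a boxed Long here could
    // allocate a new object on each addition.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(joinWithCommas(Arrays.asList("a", "b", "c")));
        System.out.println(squares(4));
        System.out.println(sumOfSquares(4));
    }
}
```

Whether any of this matters for a given application depends on the workload, so it is worth measuring before and after.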

Wednesday, March 22, 2017

Dictionary of Algorithms and Data Structures

The National Institute of Standards and Technology hosts a Dictionary of Algorithms and Data Structures on its website.   This is a great resource for students as well as experienced professionals who want to refresh their memory about the exact definition of a red-black tree.

The dictionary is edited and maintained by Paul Black and Vreda Pieterse.  It's quite extensive, with references, bibliography, and a guide for how to cite.

The URL is https://xlinux.nist.gov/dads/ .





Thursday, August 6, 2015

National Consortium for Data Science "DataBytes" Webinars

Check out the National Consortium for Data Science "DataBytes" Webinars, available at:

http://data2discovery.org/databytes-webinars/

The webinars are held on the first Wednesday of every month, excluding July.

According to the NCDS site, DataBytes is "an NCDS-sponsored lunchtime webinar series that gives our members and the larger data community the chance to discuss the most pressing and interesting issues in data science."


Saturday, April 11, 2015

About APIs

A recent article in JavaWorld talks about the wide variety of APIs now available for use.  While the concept of APIs is not new, recent "advances in infrastructure, type and variety of programming languages, and abundant computing resources have created new opportunities."

The post links out to several other interesting articles about the challenges and opportunities of API development.

View the article here:  http://www.javaworld.com/article/2899704/enterprise-application-integration/all-eyes-on-the-api.html

Wednesday, September 24, 2014

Watson Analytics Tool

IBM has released a new tool for performing analytics on big data.  The tool is called "Watson Analytics," after Watson, the famous IBM computer that won on Jeopardy.  Essentially, this tool allows users to pose natural language questions and get answers derived from large data sets.  An example given in the New York Times article linked below is "What high-value customers am I most likely to close sales with in the next 30 days?"  The idea is that a business user can use the tool without having to talk to a data scientist first.  The author of the New York Times article ponders whether this is the "killer app" for Big Data.

Here is the link:

http://nyti.ms/1qbtCnt

Saturday, January 11, 2014

New ways of representing data

The New York Times Bits blog this week has an article titled "A Makeover for Maps" that talks about cutting-edge visualizations of data.  The article cites some interesting new depictions of data, such as the one created by Eric Rodenbeck of Stamen Design that creates "...a representation of how photos spread on Facebook that looks like ice crystals forming on a car window."

The article goes on to talk about the idea of creating visualizations that can be used by different audiences -- like a dynamic chart of Nasdaq financial data that can be read one way by a pension fund manager looking for trends and another way by an SEC investigator looking into an unusual spike in trading during a given day.

Check out the full article with multiple visualization examples at

http://bits.blogs.nytimes.com/2014/01/06/a-makeover-for-maps/?smid=pl-share

Monday, October 14, 2013

Looking at the problems on the healthcare.gov site

You may have heard the news about people in the U.S. experiencing problems trying to sign up for healthcare on the new healthcare.gov site.  This site is perhaps the most visible manifestation of the Affordable Care Act, aka "Obamacare."  On healthcare.gov, users can search the "Health Insurance Marketplace" for coverage.  However, the site has been plagued by reports of slowness and poor user experience. 

Several online articles attempt to surmise what is going wrong with the site from a technical perspective.  Regardless of your personal views on the Affordable Care Act, I think these are relevant for us to read because they provide us a chance to avoid making the same (apparent) technical mistakes in our projects.  I'd like to highlight some general points that I have read so far:

  • Load test both the interface _and_ the back end functionality.
  • User experience (UX) should not be an afterthought.
  • Poor validation logic (e.g., requiring numbers as part of a username/password but not informing the user of this beforehand or in error messages) leads to extremely poor UX.
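
To make the validation point concrete, here is a minimal sketch of a validator that states its rules up front and returns a specific message for each rule that fails. The rules themselves (at least 8 characters, at least one digit) and all of the names are hypothetical, chosen only for illustration; this is not how healthcare.gov actually validates input.

```java
import java.util.ArrayList;
import java.util.List;

class PasswordRules {

    // Hypothetical rules, for illustration only.
    static final int MIN_LENGTH = 8;

    // Shown to the user next to the form field, before anything is submitted.
    static final String REQUIREMENTS =
            "Password must be at least " + MIN_LENGTH
            + " characters long and contain at least one digit.";

    // Returns one message per failed rule; an empty list means the password is valid.
    static List<String> validate(String password) {
        List<String> problems = new ArrayList<>();
        if (password == null || password.length() < MIN_LENGTH) {
            problems.add("Password must be at least " + MIN_LENGTH + " characters long.");
        }
        if (password == null || password.chars().noneMatch(Character::isDigit)) {
            problems.add("Password must contain at least one digit.");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(REQUIREMENTS);
        for (String problem : validate("secret")) {
            System.out.println(problem);
        }
    }
}
```

The design choice here is to collect every failed rule rather than stopping at the first one, so the user can fix everything in a single pass.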

For more on the technical issues with healthcare.gov, see the following articles:




And here is a contrarian opinion, stating that the architecture of healthcare.gov is sound, but that the project was rushed:

Friday, September 27, 2013

Google updates its search algorithm

Google has updated its search algorithm for the first time since 2010.  According to this post in the Bits Blog on the New York Times site, this is the most extensive overhaul of the algorithm since 2000.

Although Google is not revealing specific details of the new algorithm, code-named "Hummingbird," the main thrust of the changes is toward understanding longer and more complex queries.  To that end, Google is employing its Knowledge Graph, which is a map of semantic relationships.

One of the drivers for understanding longer queries is that more people are speaking queries into their phones using natural language, which is by its nature more complex than a simple keyword query typed into a search box.

For more information, see:

Forbes article:
http://www.forbes.com/sites/roberthof/2013/09/26/google-just-revamped-search-to-handle-your-long-questions/

Search Engine Land FAQ on Hummingbird:
http://searchengineland.com/google-hummingbird-172816

Google "15th anniversary" blog post (includes a great "Google Search Timeline" graphic):
http://insidesearch.blogspot.com/2013/09/fifteen-years-onand-were-just-getting.html

Google post on the Knowledge graph (2012):
http://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html



Saturday, May 11, 2013

YouTube trends map

YouTube has a new feature called "Trends Map" that allows you to see what people in different parts of the country are watching.

Here is the YouTube trendsmap site:
http://www.youtube.com/trendsmap

And here is an article describing the new feature:
http://philadelphia.cbslocal.com/2013/05/11/youtubes-trends-map-shows-popular-videos-by-geographic-location/


Monday, April 8, 2013

"Predatory journals" in science

Interesting NYTimes article today about a disturbing phenomenon in academic research:  "predatory" journals and conferences that charge (unadvertised) fees for publication.  

http://www.nytimes.com/2013/04/08/health/for-scientists-an-exploding-world-of-pseudo-academia.html?smid=pl-share


Tuesday, February 5, 2013

Friday, November 16, 2012

The Role of Data Analysis in "Get Out The Vote" Operations

In this article in The Atlantic, Ben Jacobs describes the failures of the Romney campaign's "ORCA" software platform.  ORCA was supposed to help the campaign identify which likely Romney voters had already cast their ballots.  Thus, they could focus get-out-the-vote operation resources on voters who hadn't gone to the polls yet.

It turns out that the Obama campaign had tried something similar in 2008, and that system had also crashed.  The Obama campaign learned its lesson from this experience. They came to the conclusion that such an election day system was not feasible.  For 2012 they went with a system called "Narwhal" that targeted likely voters before election day.

So in a sense the Romney campaign was one cycle behind from a technical perspective -- they learned a lesson that the Obama campaign had learned 4 years earlier.

Thursday, November 8, 2012

Nate Silver and Big Data

There are a host of post-election stories on Nate Silver's success in using statistical analyses to predict the outcome of the election overall as well as the outcome of voting on a state-by-state basis.  For example, in this article from InformationWeek, Eric Lundquist ruminates on how to apply Silver's success to business.

If you are interested in the nuts and bolts of how Silver uses data to make predictions, here are a few links from Silver's blog on the New York Times site.  One is an explanation of his methodology, the other a glossary of terms:



Or, you can go right to the source and read Silver's book The Signal and the Noise:

Thursday, October 11, 2012

Two Congressional Investigations Into Activities of Data Companies


There are now two Congressional investigations being held regarding the activities of data gathering companies.  One investigation is targeting so-called "data sellers" while the other is targeting similarly termed "data brokers."  Both investigations are targeting some of the same companies.

The "data sellers" investigation is being run by Representative Edward J. Markey, Democrat of Massachusetts, and Representative Joe L. Barton, Republican of Texas.  The investigation was prompted by this article on data gathering company Acxiom in the New York Times and targets companies such as Acxiom and Experian.

The "data brokers" investigation was opened this week by Senator John D. Rockefeller IV of West Virginia, and targets some of the same companies as the other investigation.

You can read more about the investigations at the links below:

 http://www.nytimes.com/2012/10/11/technology/senator-opens-investigation-of-data-brokers.html?smid=pl-share

http://www.nytimes.com/2012/07/25/technology/congress-opens-inquiry-into-data-brokers.html?_r=1

http://commerce.senate.gov/public/index.cfm?p=PressReleases&ContentRecord_id=a42a865a-be30-4171-8278-86ee0a8c76fb



Wednesday, October 10, 2012

Visualization of what AT&T's network looked like on 9/11/01


CNN has a slideshow of graphs that show what AT&T network traffic looked like on days when events triggered a high volume of calls and texts. Included are visualizations of network traffic on 9/11/01 as well as the days on which the East Coast Earthquake and Hurricane Katrina occurred.

See the article on the CNN site:

http://money.cnn.com/gallery/technology/mobile/2012/10/09/mobile-network-traffic/index.html




Friday, September 28, 2012

How Facebook uses Big Data

To get an idea of where Big Data is headed, "there are worse places to look than Facebook," says Derrick Harris, the author of a GigaOM article on Facebook's use of Big Data technologies.

In the article, Facebook's VP of Infrastructure Engineering Jay Parikh talks about the technologies that Facebook is using to harness its mountains of data.  These technologies include the usual suspects like Hadoop and Hive in addition to MySQL and some home-brewed backend and interface pieces.

Read the article at:
http://gigaom.com/data/for-the-future-of-big-data-startups-look-to-facebook/


Tuesday, September 11, 2012

Dark side of big data: "terms of service" agreements

Time Magazine has an article on how the way big data is collected may affect us all.  In order for companies to analyze all that data, they have to collect it first -- from you, when you are browsing and clicking.

http://moneyland.time.com/2012/09/10/big-data-which-major-websites-respect-your-privacy-rights-the-least/

--Mark

Friday, September 7, 2012

Harnessing the body's data

The New York Times profiles several current and upcoming devices that promise to harvest biometric data from the body for the benefit of the wearer and perhaps the populace in general.

See the story here:

http://bits.blogs.nytimes.com/2012/09/07/big-data-in-your-blood/?smid=pl-share


Thursday, May 31, 2012

Today's Big Data links

Here are a few posts of interest from the last few days regarding Big Data.  In the Columbia Journalism Review, "When Big Data is Bad Data" concerns the controversy over public posting of teacher rankings based on questionable metrics.  From the Wall Street Journal's "CIO Report" blog, "Taking Small Steps to Big Data" presents views on Big Data that were aired during the recent MIT Sloan CIO Symposium.

"When Big Data is Bad Data"
http://www.cjr.org/behind_the_news/the_press_and_standardized_tes.php?page=all

"Taking Small Steps to Big Data"
http://blogs.wsj.com/cio/2012/05/24/taking-small-steps-to-big-data/



Thursday, May 24, 2012

White House Launches New Digital Initiative

The White House recently announced a new digital strategy, which addresses a broad array of topics such as open government, data quality, and attracting talent, among others.

This strategy is the work of Steven VanRoekel, CIO of the Federal government, and Todd Park, the CTO.

VanRoekel assumed the role of Federal CIO after Vivek Kundra, the first person to hold the position newly created under the Obama administration, left last year.  Similarly, Park assumed the CTO role after Aneesh Chopra departed.

Here's the blog entry on O'Reilly's "radar" site (btw as a longtime M*A*S*H fan, I love that the url is radar.oreilly.com):

http://radar.oreilly.com/2012/05/white-house-launches-new-digit.html