Paritosh Joshi: Statistical Doppelganger

07 Dec, 2012

By Paritosh Joshi

 

You know the columnist must be facing a serious case of writer’s block if he has to resort to strange German words in the heading itself. Either that or, if you are in a more indulgent mood, maybe you’ll allow for the possibility that there is no other way to express it in the lingua.

 

Merriam-Webster defines ‘doppelganger’ as ‘a ghostly counterpart of a living person’. As best I can tell, the only time this word, or idea, made its appearance in popular culture was in a 1993 film starring Drew Barrymore as Holly Gooding, “who moves from New York to Los Angeles after being implicated in a murder, pursued by what is apparently her evil twin”. (Source: Wikipedia)

 

By now, you are used to this columnist’s propensity for stream-of-consciousness meanderings but, incredibly enough, this is not one of them.

 

One of the biggest problems that confound market researchers is respondent fatigue associated with questionnaire duration. It is generally accepted wisdom that questionnaires that run for much longer than 30 minutes almost always suffer from this problem. The respondent is not alone. Interviewers too suffer from fatigue, indeed even more so, considering that they have to keep administering the same instrument to respondent after respondent. However, the answer cannot always lie in forcing questionnaire length down by cutting off further questions once the 30-minute Rubicon has been reached. Syndicated research studies of all sorts, readership studies for example, have a wide scope of discovery. The Indian Readership Survey picks up demographic and socio-economic variables, product and category linkage and other media consumption behaviour in considerable detail, in addition to its primary task: determining print readership for several hundred publications in over a dozen languages. The implications for fatigue all around are easily imagined.

 

When researchers started thinking about this problem, they realized that in any set of responses to an instrument, there were many that bore uncanny similarities to each other. A deeper exploration began to reveal systematic correlations, if not causal relationships, between ‘independent’ variables such as basic demographics and ‘dependent’ variables such as consumption of a particular product or media vehicle. By applying this analysis to large data sets, researchers found responses that were near doppelgangers (that word again) of one another.

 

Contemporary syndicated studies involving large discovery areas (implying long questionnaires) have operationalized this learning. Questionnaires are divided into multiple parts. The first part, which picks up all the classificatory variables (typically demo-, socio- and psychographic variables), is administered uniformly to all respondents. The other sections are each administered to a subset of the overall sample. For instance, if there were two segments beyond the classificatory unit, the sample would be divided randomly into equal-sized halves. After all data are in, the following process is undertaken:

 

Questions in the Classification segment are administered to both respondents, and on the basis of their responses the two become a matched pair. Respondent 1 is then administered Segment A but skips Segment B, while Respondent 2 skips Segment A and is administered Segment B. Finally, Respondent 1 ‘donates’ his responses on Segment A to Respondent 2, who ‘receives’ them, and Respondent 2 ‘donates’ his responses on Segment B to Respondent 1, who ‘receives’ them. This donor-recipient process is called Ascription. Missing data are ascribed, filling in the blanks, in a manner of speaking.
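For readers who think in code, the donor-recipient swap can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: respondents are paired only when their classification answers are identical, and the field names (`age_band`, `sec`, `segment_a`, `segment_b`) and the function `ascribe` are hypothetical, not taken from any actual study. Real ascription algorithms use far more sophisticated matching.

```python
from collections import defaultdict

def ascribe(respondents, class_keys):
    """Pair respondents with identical classification answers and swap
    their segment responses (a simplified, hypothetical matching rule)."""
    # Group respondents by classification profile, split by the segment they answered.
    groups = defaultdict(lambda: {"A": [], "B": []})
    for r in respondents:
        profile = tuple(r[k] for k in class_keys)
        half = "A" if r["segment_a"] is not None else "B"
        groups[profile][half].append(r)

    completed = []
    for halves in groups.values():
        # zip pairs respondents one-to-one; anyone without a partner is left out.
        for r1, r2 in zip(halves["A"], halves["B"]):
            # Respondent 1 receives Segment B from Respondent 2, and vice versa.
            completed.append({**r1, "segment_b": r2["segment_b"]})
            completed.append({**r2, "segment_a": r1["segment_a"]})
    return completed

# Illustrative, made-up records: 1 and 2 share a profile, 3 has no partner.
respondents = [
    {"id": 1, "age_band": "25-34", "sec": "A",
     "segment_a": {"reads_daily": True}, "segment_b": None},
    {"id": 2, "age_band": "25-34", "sec": "A",
     "segment_a": None, "segment_b": {"owns_tv": True}},
    {"id": 3, "age_band": "35-44", "sec": "B",
     "segment_a": {"reads_daily": False}, "segment_b": None},
]
completed = ascribe(respondents, ["age_band", "sec"])
```

In this toy run, Respondents 1 and 2 end up with complete records, while Respondent 3, who has no doppelganger in the other half, is simply dropped. A production system would instead look for the nearest available match.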

 

Testing of the ascription model involves administering the entire questionnaire to both respondents, then checking the extent to which the actual and ascribed responses vary from one another. A well-selected match would have a high statistical fit. Research agencies around the world have been spending a lot of time developing such ascription algorithms.
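One simple way to picture such a check is the share of questions on which the ascribed answer agrees with the answer the respondent actually gave. The sketch below assumes that measure purely for illustration; the function name `agreement_rate` and the sample answers are hypothetical, and agencies may well rely on other measures of fit.

```python
def agreement_rate(actual, ascribed):
    """Share of questions where the ascribed answer equals the actual one."""
    if len(actual) != len(ascribed):
        raise ValueError("both answer lists must cover the same questions")
    if not actual:
        raise ValueError("at least one question is required")
    matches = sum(a == b for a, b in zip(actual, ascribed))
    return matches / len(actual)

# Made-up answers from one respondent who took the full questionnaire,
# set against what a matched donor would have 'donated' to him.
actual = ["yes", "no", "yes", "yes"]
ascribed = ["yes", "no", "no", "yes"]
rate = agreement_rate(actual, ascribed)  # 3 of 4 answers agree
```

The closer this share is to 1 across many test pairs, the better the matching rule is doing its job.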

 

Finally, whole data sets are married to one another using a similar process of respondent-level matching and ascription. This kind of large-scale merging, technically called Data Fusion, is being employed in various markets to stitch Readership, Listenership, Viewership and Digital Media consumption habits together to deliver a comprehensive view of the manner in which multiple media collide and coalesce in the lives of consumers.

 

Here in India, we are on the cusp of a lot of exciting development on Ascription and Fusion. Expect this column to tell you more as it happens.

 

 
