Mining E-Customer Behavior
Every day, your e-business customers leave tell-tale footprints all over your Web site. Do you know how to turn their tracks into valuable business information?
By Jesus Mena
Winter 1999

Increasingly, the first point of contact a company has with its customers is at its Web site, where a staggering amount of consumer data can be aggregated for analysis and mining. The Web provides companies with an unprecedented opportunity to analyze customer behavior and preferences. Every visit to a Web site generates important consumer behavioral data, regardless of whether or not a sale is made. Every visitor action is a digital gesture exhibiting habits, preferences, and tendencies. These interactions reveal important trends and patterns that can help a company design a Web site that effectively communicates and markets its products and services. Companies can aggregate, enhance, and mine Web data to learn what sells, what works and what doesn't, and who is or isn't buying.

Figure 1: Web data utilization in large U.S. corporations

Web Data Applications
Marketing 18%
Customer Service 16%
Don't Use Web Data 72%

However, according to a recent survey by Forrester Research, few companies are listening: Of 50 of the largest U.S. corporations, only 18 percent are using their Web data (see Figure 1, above). Why are so few companies taking advantage of this resource? There seem to be two reasons:

  1. In the frenzy to become the next Amazon.com, companies of all sizes and types are scrambling to set up e-commerce sites. They often concentrate on the mechanics of transactional processing, setting up inventory and shopping carts, but fail to plan how to use the vast amount of customer data their sites will generate. Most companies fail to see that e-commerce success will depend on how this Web data is leveraged to convert visitors into customers and customers into loyal clients. The Web data generated with a single sale is of more value than the sale itself, because it can lead to a long and profitable relationship with that customer. The goal of marketers today is not to capture market share but to capture a share of a customer over a long period of time. The Web provides an ideal marketplace for doing this.
  2. The process of mining Web data is complicated because of the diversity of the data collected. A single visit to a site can be captured not only on server log files but also in cookies with ad networks or databases created by CGI scripts from registration and purchase forms. One of the challenges to mining Web data is organizing it into a cohesive view of visitors and customers. Most of today's log analyzers and ad networks report on TCP/IP activity and not consumer demographics, lifestyle, values, behavior, and attributes. They are limited to reporting the activity of browsers, not individuals.

GATHERING WEB DATA
Let's take a look at how to collect data on visitors to your e-business. The main sources for Web site data are log files, cookies, and forms:

Log Files. Server log files provide domain types, time of access, keywords, and search engines used by visitors. Figure 2 (below) illustrates the amount of information gathered in a log file. The referer section of a log file provides valuable information about where visitors are coming from. It can tell you what your visitors were looking for when they came to your site by identifying the keywords they used in their search (assuming they found you through a search engine) and what search engine or banner ad they were referred from.
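
As a rough illustration of how the referer field yields search keywords, the sketch below pulls the search phrase out of a referer URL with Python's standard library. The function name search_keywords and the list of query parameter names are assumptions made for this example; real search engines use a variety of parameter names.

```python
from urllib.parse import urlparse, parse_qs

# Query-string parameters that often carry the search phrase.
# This short list is an assumption for illustration, not a complete registry.
SEARCH_PARAMS = ("p", "q", "query")


def search_keywords(referer):
    """Return (search host, list of keywords) for a referer URL, or None."""
    parts = urlparse(referer)
    query = parse_qs(parts.query)  # decodes '+' and %xx escapes
    for name in SEARCH_PARAMS:
        if name in query:
            return parts.netloc, query[name][0].split()
    return None  # not a recognizable search referral


result = search_keywords("http://search.yahoo.com/bin/search?p=data+mining")
print(result)
```

Referers that carry no recognizable search parameter simply return None, so the same function can be run over every line of a log.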

Figure 2: Information included in a typical log file.

Anatomy of a Log File

  1. Internet provider IP address: This can be either webminer.com or 204.58.155.58
  2. Identification field: This usually appears as a dash, "-"
  3. AuthUser: This is an ID or password for accessing a protected area
  4. Date, time, and GMT (Greenwich Mean Time): Thu July 17 12:38:09 1999
  5. Transaction: Usually "GET" filename such as /index.html/products.htm
  6. Status or error code of transaction: Usually 200 (success)
  7. Size in bytes of transaction (file size): 3234
  8. Additional Fields in the Extended Log Format
  9. Referer: search engine and keyword used to find your Web site, such as http://search.yahoo.com/bin/search?p=data+mining -> /index.html
  10. Agent: browser used by your visitor, such as Mozilla/2.0 (Win95; I)
  11. Cookie: .snap.com TRUE / FALSE 946684799 u_vid_0_0 00ed7085
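
The fields listed above can be split apart with a regular expression. The following is a minimal sketch that assumes the widely used NCSA combined layout (host, ident, authuser, bracketed timestamp, quoted request, status, size, quoted referer, quoted agent); the sample line is assembled from the example values in the figure, and your server's actual format, including where it writes the cookie field, may differ.

```python
import re

# Assumed layout: NCSA common log format plus optional referer and agent.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A dash in the size field means no bytes were returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Illustrative line built from the example values in the figure.
sample = (
    '204.58.155.58 - - [17/Jul/1999:12:38:09 +0000] '
    '"GET /index.html HTTP/1.0" 200 3234 '
    '"http://search.yahoo.com/bin/search?p=data+mining" '
    '"Mozilla/2.0 (Win95; I)"'
)
record = parse_line(sample)
print(record["host"], record["status"], record["size"], record["agent"])
```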

Cookies. Cookies dispensed from the server can track browser visits and pages viewed and can provide some insights into how often a visitor has been to your site and what sections they wander into. Cookies are special HTTP headers that servers pass to a browser. They reside in small text files on the visitor's hard disk. You can find the cookie value in the last field of the extended log format file. A retail Web site can issue cookies to:

  • First-time visitors to introduce products and services
  • Returning visitors to acknowledge their preferences
  • All visitors at the point of registration in order to associate a cookie with a customer's personal information from online forms.

Cookies are standard components for tracking customer activity in most e-commerce sites. They are used as counters and unique identification values that tell retailers who is a first-time visitor and where returning visitors have been within a site.
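
One way to picture the first-time versus returning distinction is the sketch below, which uses Python's standard http.cookies module. The cookie name visitor_id, the one-year lifetime, and the function identify_visitor are all assumptions for illustration; they are not taken from any particular e-commerce product.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def identify_visitor(cookie_header):
    """Return (visitor_id, is_first_visit, set_cookie_header_or_None)."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        # Returning visitor: reuse the ID the browser sent back.
        return jar[COOKIE_NAME].value, False, None

    # First-time visitor: mint a new ID and build the Set-Cookie header.
    visitor_id = uuid.uuid4().hex
    new_jar = SimpleCookie()
    new_jar[COOKIE_NAME] = visitor_id
    new_jar[COOKIE_NAME]["path"] = "/"
    new_jar[COOKIE_NAME]["max-age"] = 365 * 24 * 3600  # assumed lifetime
    return visitor_id, True, new_jar.output(header="Set-Cookie:")


first_id, is_new, header = identify_visitor(None)
print(is_new, header)
again_id, is_new_again, _ = identify_visitor(COOKIE_NAME + "=" + first_id)
print(is_new_again, again_id == first_id)
```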

Forms. By far the most effective method of gathering Web site visitor and customer information is via registration and purchase forms (see Figure 3, below). Forms can provide important personal information about visitors, such as gender, age, and ZIP code. Form submissions can launch a CGI program that returns a response to the Web site visitor. Forms are simple browser-to-server mechanisms that can lead to a complex array of customer interaction from which relationships can evolve. These customer relationships can evolve into direct feedback systems through which consumers can communicate with a retailer and servers can continue to gather information from browsers.

Figure 3: A Web registration form for collecting visitor information.

Using CGI forms, you can create either relational tables or comma-delimited flat files recording the entries from your forms. These customer-provided information files can be analyzed directly or imported into a relational database such as DB2. It's a good idea to import the files into a relational database as your file volume grows. The database engine not only makes data management easier, but it also handles such issues as integrity, security, backup, and restoration. Having the data in a relational database environment also gives you access to enterprise-strength analysis tools, such as IBM's Intelligent Miner, which can turn your Web data into valuable business insights (I'll say more on this later).
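
To make the flat-file-to-database step concrete, here is a small sketch that loads a comma-delimited file into a relational table and runs one query against it. It uses Python's built-in sqlite3 module purely as a stand-in for a database such as DB2; the column names and the two sample rows are invented for the example.

```python
import csv
import io
import sqlite3

# Stand-in for a comma-delimited file written by a form handler.
# Column names and rows are made up for illustration.
FLAT_FILE = io.StringIO(
    "visitor_id,gender,age,zip\n"
    "00ed7085,F,34,94105\n"
    "00ed7086,M,27,10001\n"
)

conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
conn.execute(
    "CREATE TABLE registrations ("
    " visitor_id TEXT PRIMARY KEY, gender TEXT, age INTEGER, zip TEXT)"
)

rows = [
    (r["visitor_id"], r["gender"], int(r["age"]), r["zip"])
    for r in csv.DictReader(FLAT_FILE)
]
with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT INTO registrations VALUES (?, ?, ?, ?)", rows)

row_count = conn.execute("SELECT COUNT(*) FROM registrations").fetchone()[0]
average_age = conn.execute("SELECT AVG(age) FROM registrations").fetchone()[0]
print(row_count, average_age)
```

Once the data sits in a table with a primary key on the visitor ID, the database enforces uniqueness on that key, which a flat file cannot do.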

As a Web site retailer, you want to place menus, links, and contests in your home page in order to capture visitor preferences via forms and cookies. The more you interact with your customers, the more information you should be collecting about their needs, values, choices, and preferences. Take care, however, to ask for only the most essential information. No one likes lengthy and intrusive questionnaires. Keep in mind that there are methods and sources for gathering demographic information that don't involve asking for it directly. A ZIP code captured from a contest registration form can provide some demographic data, while a physical address culled from a purchase form in your store can provide valuable household information for subsequent data mining.

Your home page should quickly solicit information about your visitors' needs and offer information about your various products and services. By taking the time to consider the overall design of your site, such as what prompts and links you position in your home page, you can direct the movements of your visitors. In addition, a quick and short registration at the onset of a visit, inquiry, or purchase can capture important personal information that you can later enhance and mine. Focus on interacting with your customers to learn what their needs are so you can service them better over time and retain them.

One key to compiling and capturing this shopper information is a unique identifier: a visitor ID number. A proven strategy for collecting key visitor data is to entice new visitors to register at your site with a special service or incentive. Offer access to a special section of your site or have contests and door prizes. The point is that you need them to register in order to set a cookie, which can be used as the unique ID number. From that point, the unique key can enable you to track every interaction with that visitor. This unique key will allow your site to link its log files and forms database with your company's data warehouse and with third-party demographic and household information, ad server networks, or collaborative filtering engines.
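
The linking step can be sketched as a simple join on the visitor ID. In the example below, the record layouts, the field names, and the function build_profiles are hypothetical; the point is only that the cookie value found in the log and the key stored with the registration form are the same identifier.

```python
from collections import defaultdict

# Hypothetical parsed log records, keyed by the cookie value.
log_records = [
    {"cookie": "00ed7085", "path": "/index.html"},
    {"cookie": "00ed7085", "path": "/products.htm"},
    {"cookie": "1a2b3c4d", "path": "/index.html"},
]

# Hypothetical registration data captured from a form, same key.
registrations = {"00ed7085": {"zip": "94105", "age": 34}}


def build_profiles(log_records, registrations):
    """Merge page views and form data into one profile per visitor ID."""
    pages = defaultdict(list)
    for rec in log_records:
        pages[rec["cookie"]].append(rec["path"])

    profiles = {}
    for visitor_id, paths in pages.items():
        profile = {"pages": paths, "registered": visitor_id in registrations}
        profile.update(registrations.get(visitor_id, {}))
        profiles[visitor_id] = profile
    return profiles


profiles = build_profiles(log_records, registrations)
print(profiles["00ed7085"])
```

Visitors who never registered still get a profile, but it contains only browsing activity until a form submission supplies the rest.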

ENHANCING YOUR WEB DATA
Of the three data sources I've mentioned - log files, cookies, and forms - forms provide the most important customer view because they contain information that can be used to append additional data such as from a data warehouse or a third-party provider.

The kinds of additional data you may want to append include such demographic and household data as a visitor's probable income, the type of car they drive, and the number of children they have. By linking this external information to your Web-site database, you gain additional insight into the identity, attributes, lifestyle, and behavior of your visitors and customers. For example, a ZIP code allows you to provide visitors with local news, coupons, services, and weather while enabling you to discover the demographics of your visitors (see Figure 4, below). Median income, age, presence of children, type of automobile, age of home, and other factors are available when a physical address is known. Various data providers make this information available, and some are beginning to provide their information via the Web. Experian and Acxiom can today match and append the consumer information you capture in your registration or purchase forms in real time. Other vendors of this type of demographic information include CACI and Polk. There is an entire industry devoted to segmenting, classifying, and reselling consumer behavior information.

Figure 4: ZIP code collection as a way to gather user demographics.

In addition, new providers of webographics - details on browsing activity, such as length of visits, number of return visits, preferences exhibited by clickthroughs in banner ads - have recently emerged, selling software or services, and sometimes both, for collaborative filtering, relational marketing, and visitor profiling. These new data providers - including Andromedia, DoubleClick, Engage Technologies, Firefly, Manna, Net Perceptions, and Personify - represent a whole new genre of Web companies seeking to capture and generate information about Internet users' behavior and preferences. They use a myriad of solutions to track and profile visitors - everything from proprietary software and databases to commingling cookies via server networks.

Collaborative filtering software such as Andromedia's LikeMinds uses individual purchase history or preferences to find people with similar tastes and make suggestions to shoppers. LikeMinds can help Web sites make personal recommendations and offer direct marketing based on visitors' past behavior. Its Preference Server delivers personalized recommendations based on preferences either explicitly stated by the visitor or customer via forms or implicitly determined by sales records, clickthroughs, or other interactions within the site. Collaborative filtering networks like Firefly provide the same matching functionality over multiple Web sites.
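
The general idea behind collaborative filtering can be shown in a few lines. The sketch below is a generic user-to-user approach based on overlap between purchase sets (Jaccard similarity); it is not the algorithm used by LikeMinds or Firefly, and the shoppers and products are invented.

```python
def jaccard(a, b):
    """Overlap between two sets of purchased items, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, purchases, top_n=3):
    """Suggest items bought by shoppers whose purchases resemble the target's."""
    mine = purchases[target]
    scores = {}
    for other, items in purchases.items():
        if other == target:
            continue
        weight = jaccard(mine, items)
        for item in items - mine:  # only items the target has not bought
            scores[item] = scores.get(item, 0.0) + weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


# Invented purchase histories for illustration.
purchases = {
    "ann": {"sql book", "db2 guide", "modem"},
    "bob": {"sql book", "db2 guide", "data mining book"},
    "cat": {"garden hose", "modem"},
}
suggestions = recommend("ann", purchases)
print(suggestions)
```

Because bob shares two of ann's three purchases and cat shares only one, bob's other item ranks ahead of cat's.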

Ad networks such as DoubleClick and Flycast also capture and store webographics. The DoubleClick system tracks user movements among more than 170 sites that commingle their cookies in order to serve the appropriate ad to each visitor. DoubleClick targets ads based on a user's interests as expressed via their selections in the member Web sites in the ad network. DoubleClick recently purchased Abacus Direct, which manages a database of more than 80 million households and 1,100 consumer mail catalogs. The mix of online webographics and offline demographics will give DoubleClick an enhanced view of consumer behavior.

Webographics are also being captured in proprietary databases from such companies as Engage Technologies. Engage provides member clients with access to its proprietary database of 30 million anonymous behavior-based consumer profiles. Engage tracks the interest and preferences of Web site visitors without tracking their identity. Profiles are based on the content viewed, the time spent viewing, and the frequency of visits. Profiles include identification number, interest category code, and interest score, but no identity.

Another company, Aptex Software, uses both proprietary real-time content analysis techniques and neural networks to predict Web user behavior. Its two main products are SelectCast and SelectResponse. The Aptex profiling technology doesn't store personal customer information; instead, it uses a neural network to profile users based exclusively on their real-time actions and observed user behavior.

Other webographic players include MatchLogic, which collects profiles from interactive sites that track where users go after viewing online ads; Net Perceptions, which offers real-time ad targeting via the use of neural networks, fuzzy logic, and genetic algorithms; Personify, which tracks clickstreams, registration, and transaction data for segmentation and anonymous profiling; and Primary Knowledge, which also collects clickstream information from large consumer sites to identify the paths buyers navigate to goods and sells these vital statistics to online retailers.

All this internal and external demographic and webographic information can be written to a relational table or a flat file, which can then be linked or imported into a data mining tool. These include automated tools, which have principally been used in data warehouses to extract patterns, trends, and relationships, and new-generation data mining tools with GUI interfaces that are designed for business and marketing personnel. These data mining analyses can provide actionable solutions in many formats, which can be shared with those individuals responsible for the design, maintenance, and marketing of e-commerce and content-providing Web sites.

MINING DYNAMIC DATABASES
Most analysis of Web data until now has involved log traffic reports, which mainly provide cumulative accounts of server activity but do not provide any true business insight about customer demographics and online behavior. Most current traffic analysis software, including WebManage Technologies' NetIntellect, Marketwave's HitList, Sane Solutions' NetTracker, Netrics.com's Surf Report, and WebTrends Corp.'s WebTrends, offers predefined reports about server activity based on the analysis of log files. One of the best log analyzers is Marketwave's HitList, which uses cookies as part of its report and allows log files to be compressed and prepared for Web mining. These tools, however, deal exclusively with domain names, IP addresses, cookies, browsers, and other TCP/IP-specific machine-to-machine activity.

On the other hand, mining Web data for an e-commerce site yields insight into visitor behavior and profiles, rather than server statistics. Your e-commerce site needs to know about the preferences and lifestyles of its visitors. Data mining in this context enables you to address such business questions as, "Who is buying what items and at what rates?"

You should also know what is selling so you can adjust your inventory and plan your orders and shipping. You need to know how to sell, what incentives, offers, and ads work, and how you should design your site to optimize your profits. Data mining algorithms can search for relationships in Web data to determine if patterns exist that can yield actionable business and marketing intelligence. Data mining solutions come in many types, such as association, segmentation, clustering, classification (prediction), and visualization:

  • Association: using affinity market basket analysis to determine which products tend to sell together (see Figure 5, below)
  • Segmentation: determining distinguishing features of your most profitable customers (see Figure 6, below)
  • Clustering: profiling customers to identify the characteristics of your visitors (see Figure 7, below)
  • Classification/Prediction: anticipating consumer behavior to discover who is likely to make multiple purchases (see Figure 8, below)
  • Visualization: viewing distributions and relationships to reveal what your visitors are purchasing (see Figure 9, below).

Figure 5: Data association

Figure 6: Data segmentation

Figure 7: Data clustering

Figure 8: Data prediction

Figure 9: Data visualization
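
Of the techniques listed above, association is the easiest to show in miniature. The sketch below counts how often pairs of products appear in the same basket and reports support (the share of all baskets containing the pair) and confidence (the share of baskets with the first item that also contain the second). The baskets and the 0.5 support threshold are invented, and commercial tools use far more scalable algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (if_item, then_item, support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Invented transactions for illustration.
baskets = [
    ["modem", "cable"],
    ["modem", "cable", "book"],
    ["modem", "book"],
    ["cable", "modem"],
]
rules = pair_rules(baskets)
for rule in rules:
    print(rule)
```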

Using a data mining tool that incorporates these algorithms, you can segment a Web site database into unique groups of visitors, each with specific behavioral characteristics. These tools perform statistical tests on the data and partition it into multiple market segments independent of the analyst or marketer and can identify key intervals and ranges in the data that distinguish good prospects from bad ones.

If you're in a DB2 environment and using Intelligent Miner as your data mining tool, you have access to all of these processes. Intelligent Miner performs clustering, classification, and prediction - a form of classification into the future. For prediction, Intelligent Miner uses either a tree induction algorithm or a neural network to predict a field, such as the number of purchases a customer is likely to make. Using a self-organizing map, also known as a Kohonen neural network, Intelligent Miner can be used to segment a population of similar customer accounts. In addition to conducting association analysis to identify items frequently sold in the same transactions, Intelligent Miner can also perform a more powerful sequential pattern analysis to match different transactions from the same customer over time.

Most data mining tools incorporate versions of such algorithms as CART (classification and regression trees), CHAID (chi-squared automatic interaction detection), and ID3 (Iterative Dichotomiser 3), or its successors C4.5 and C5.0. They segment a database into statistically significant clusters based on a desired output. They generate decision trees, which provide a graphical breakdown of a data set in the form of a map of significant clusters. These tools produce rules that can point out important ranges and characteristics. For example, this rule might point out a higher-than-average propensity to make an online purchase when a particular category exists in combination with a certain number of visits:

	IF   Last Sale Category is "Computer Book"
	     and Number of Visits is 8.00 (average = 5.94)

	THEN Number of Total Sales is more than 3.76
	     Rule's probability: 0.879
	     The rule exists in 2900 records.
	     Significance Level: Error probability < 0.01
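
To see where figures such as the rule's probability and record count come from, the sketch below checks a rule of this shape against a handful of records. It assumes that the record count means the number of records satisfying the IF part and that the probability is the share of those records that also satisfy the THEN part; the field names and the six sample records are invented, so the output does not reproduce the numbers in the rule above.

```python
def evaluate_rule(records, condition, conclusion):
    """Return (records matching IF, share of those also matching THEN)."""
    matching = [r for r in records if condition(r)]
    if not matching:
        return 0, None
    hits = sum(1 for r in matching if conclusion(r))
    return len(matching), hits / len(matching)


# Invented records for illustration.
records = [
    {"last_sale_category": "Computer Book", "visits": 8, "total_sales": 5},
    {"last_sale_category": "Computer Book", "visits": 8, "total_sales": 4},
    {"last_sale_category": "Computer Book", "visits": 8, "total_sales": 2},
    {"last_sale_category": "Computer Book", "visits": 8, "total_sales": 6},
    {"last_sale_category": "Music", "visits": 8, "total_sales": 7},
    {"last_sale_category": "Computer Book", "visits": 3, "total_sales": 1},
]

count, probability = evaluate_rule(
    records,
    condition=lambda r: r["last_sale_category"] == "Computer Book"
    and r["visits"] == 8,
    conclusion=lambda r: r["total_sales"] > 3.76,
)
print(count, probability)
```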

The process of stratification is automated by data mining algorithms on the basis of the data. For example, a Web-site database created from registration or purchase forms can be segmented by these algorithms to discover the key attributes (domain, referred engine, age, gender, and so forth) that distinguish profitable from nonprofitable visitors.
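
A single step of this kind of automated segmentation can be sketched as a search for the cut point on one attribute that best separates buyers from non-buyers. The example below scores each candidate threshold with Gini impurity, the measure CART is generally described as using; it handles one numeric attribute and one split only, and the visit counts and purchase flags are invented.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= t' split."""
    n = len(values)
    best = (None, gini(labels))  # baseline: no split at all
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2  # midpoint between neighboring values
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Invented data: number of visits and whether the visitor bought (1) or not (0).
visits = [1, 2, 3, 4, 6, 8, 9, 10]
bought = [0, 0, 0, 0, 1, 0, 1, 1]
threshold, impurity = best_split(visits, bought)
print(threshold, impurity)
```

A full decision tree repeats this search over every attribute and then recursively within each resulting segment.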

RECOGNIZING OPPORTUNITIES
Web data mining goes beyond log analysis and ad clickstreams; it focuses on identifying customer attributes and consumer behavior. The goals are generally to find out who is likely to buy your products and services and identify the features of your most loyal and profitable customers so that you can find more like them. Today, sites inundated with data face the challenge of recognizing the patterns of opportunities.

One of the common traits of firms that have traditionally used data mining, such as cellular phone and credit card companies, is that they have mountains of transactional data and compete for customer loyalty and dollars in crowded markets where it costs little for customers to switch to another company. The same description applies to the evolving e-commerce landscape.

The Web is a fast, competitive marketplace in which millions of online transactions are generated (and captured) in log files and registration forms every hour of every day - and that marketplace doubles every 100 days. Online shoppers browse retail sites with fingers poised over their mouse, ready to buy or move on should they not find what they are looking for or should the content, wording, incentive, promotion, product, or service of that site not meet their preferences. Browsers are retained based on how well the retailer remembers their needs and whims. The goal is to know and serve every customer, one at a time, and build long-term, mutually beneficial relationships.

Data mining is the key to customer knowledge and intimacy in this type of competitive and crowded marketplace. In hyper-competitive markets, the strategic use of customer information is critical to survival. In a networked electronic environment, the margins and profits go to the fast, responsive players who are able to leverage predictive models to anticipate customer behavior and preferences.

Retailing on the Web is an interactive process that allows consumers to negotiate, exchange information, and specify and customize the product and services they want from the retailer. For the electronic retailer, it is essential to analyze what consumers are doing and asking for.

As billions of business interactions evolve and organize themselves into revenue streams, subtle transformations occur in the relationships between consumers and retailers in this dynamic marketplace. Mining Web site data with data mining tools such as neural networks and machine-learning and genetic algorithms is an attempt to recognize, anticipate, and understand customer buying habits and preferences in a constantly evolving business environment.

4 Key Web Data Strategies

1. Leverage your data mining findings into your overall approach to communicating with your customers.
• Use segmentation analysis to stratify your email offers to prospects you have identified via your mining analysis.
• Use targeted email to provide incentives only to those individuals likely to be interested in your products or services. Remember that email to individuals about whom you know little will be little more than spam.
• Automatically reply, route, manage, and segment email so you can efficiently and effectively respond to your customers through email via direct marketing.

2. Manage your customer contacts as you interact with them online and offline.
• Provide prompt customer service via auto or segmented email.
• Pool together data about customer behavior and transactions as customers interact with you online and offline — through sales calls, meetings, phone, and email inquiries — as they buy your product and services.

3. Track your marketing ad efforts to identify what works and why.
• Monitor which ads are getting click-throughs and which actually lead to sales.
• Develop profiles that include demographics, tastes, and email addresses of your best prospects.

4. Manage your back-end logistics effectively via your supply chain. Close the supply chain in the inventory loop and translate the knowledge of your customer’s tastes and purchases into a quick turnaround by customizing your products and services for them.
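
For the third strategy, the distinction between ads that get click-throughs and ads that lead to sales comes down to two ratios. The sketch below computes both from a stream of (ad, event) pairs; the event names, ad identifiers, and counts are invented for illustration.

```python
def ad_performance(events):
    """Return click-through and click-to-sale rates for each ad."""
    stats = {}
    for ad_id, kind in events:
        counts = stats.setdefault(
            ad_id, {"impression": 0, "click": 0, "sale": 0}
        )
        counts[kind] += 1

    report = {}
    for ad_id, c in stats.items():
        ctr = c["click"] / c["impression"] if c["impression"] else 0.0
        conversion = c["sale"] / c["click"] if c["click"] else 0.0
        report[ad_id] = {"ctr": ctr, "conversion": conversion}
    return report


# Invented event stream: each tuple is (ad identifier, event type).
events = (
    [("banner_a", "impression")] * 4
    + [("banner_a", "click")] * 2
    + [("banner_a", "sale")]
    + [("banner_b", "impression")] * 4
    + [("banner_b", "click")]
)
report = ad_performance(events)
print(report)
```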


Jesus Mena is CEO of webminer.com, a Web data mining company, and is the author of Data Mining Your Website (Digital Press, 1999), a book on how to mine Web data for e-commerce and relational marketing. You can reach him at jmena@webminer.com.


:: IBM Database Magazine ::