Tuesday, 18 August 2015

Caught by the numbers

Data analytics winnows out possible fraudster



March/April 2014

The bank comptroller used internal statistics to point the finger at an over-performing loan officer. Learn how you can use powerful data analytics tools to narrow your list of suspects for fraud examinations.

The case in this article, a composite of several similar cases involving data analytics and statistical analysis applied to fraud examinations, is designed to be a tutorial for CFEs. — ed.


Jeff Baker, controller for a large regional bank, entered the room hoping to get an admission of guilt from his suspect. He came prepared. Baker had spent considerable time carefully preparing his questions and planning his interview tactics, and he had the documentation to back him up. Above all, he was proud that he had identified a sizeable straw-purchase and kickback fraud scheme, which probably involved the bank employee in the interview room. Baker credited the identification of this scheme to effective analytical procedures that included some basic statistical methods.

Weeks before, when Baker reported a significant spike in defaults on mortgage loans through the second quarter of 2013, several of the bank’s board members expressed concern over a recently adopted growth strategy. In late 2012, the bank had eased underwriting requirements in an effort to increase market share of residential mortgage lending; a major component of these new underwriting practices was an across-the-board reduction in debt-to-income (DTI) requirements for borrowers.

Board members were worried about the impact on the balance sheet. They wanted assurances that toxic, defaulted assets wouldn’t erode shareholders’ equity. Baker, however, wasn’t convinced that changes in underwriting guidelines were the root cause of the uptick in loan nonperformance. He was aware that the stalling economic climate within the bank’s operating footprint had exacerbated the moral hazard for mortgage fraud, so he had been following several high-profile prosecutions of straw-purchase schemes at other financial institutions throughout the region.
Despite his employer’s sterling reputation, built on its lending history, business practices and community involvement, Baker worried that the organization’s reluctance to break from the traditional risk management model had created a blind spot for internal threats, weak controls and susceptibility to fraud schemes.

CORRELATION ANALYSIS: COMPARING TWO VARIABLES

[Figure 1: The CORREL function: r = Σ(x - mean(x))(y - mean(y)) / √( Σ(x - mean(x))² × Σ(y - mean(y))² )]
After buying some time from the board members by voicing his suspicions, Baker tested his theory that the easing of DTI requirements was unrelated to the spike in nonperforming loans. He calculated the correlation coefficient of nonperforming loans to DTI requirements. Baker obtained aggregated default rate data for loans made at various DTI requirements from internal management reports and performed his calculations in Microsoft Excel using the CORREL function. (CORREL is based on the mathematical formula shown in Figure 1 above.)

In this case, he compared two variables: DTI requirements and default rate. If the assumption is that default rate is correlated with DTI, then DTI is the independent variable and default rate is the dependent variable. A correlation coefficient ranges between -1 and 1; values close to -1 indicate a strong negative correlation, and values close to 1 indicate a strong positive correlation. In a negative correlation, the default rate would decrease as DTI requirements increase. In this particular case, the correlation coefficient was very close to 0, which indicates little or no correlation between required DTI and default rate. Baker was correct in his first assumption: the new growth initiatives weren't a significant driver of loan nonperformance.
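For CFEs who want to reproduce this step outside of Excel, the following is a minimal sketch of the same calculation in Python. The DTI requirements and default rates in the listing are invented, illustrative figures, not the bank’s data; in practice they would come from the aggregated management reports Baker used.

    # Pearson correlation coefficient, the same statistic Excel's CORREL returns.
    from math import sqrt

    def correl(x, y):
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        var_x = sum((a - mean_x) ** 2 for a in x)
        var_y = sum((b - mean_y) ** 2 for b in y)
        return cov / sqrt(var_x * var_y)

    # Hypothetical inputs: DTI requirement (independent variable) and
    # aggregate default rate (dependent variable) for each underwriting tier.
    dti_requirement = [0.28, 0.33, 0.36, 0.41, 0.45]
    default_rate = [0.021, 0.019, 0.023, 0.018, 0.022]

    print(round(correl(dti_requirement, default_rate), 3))  # about 0.02 here: little or no correlation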

IDENTIFYING WHAT SHOULD OCCUR: PROBABILITY DISTRIBUTIONS
In statistical terms, a probability distribution refers to a graph, table or formula that gives the probability of each value of a random variable, such as household income, IQ or a set of test scores. The normal probability distribution is perhaps the most widely known and is commonly referred to as a “bell curve” because of its appearance; the top of the bell curve marks the mean of the distribution, which is the expected value. The bell curve is symmetric, and a key related concept is variation from the mean, measured by the standard deviation. In a normal distribution, nearly all of the possible values (about 95 percent, in fact) fall within two standard deviations of the mean. The greater the standard deviation, the wider the range of values that can occur naturally.
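As a quick check of the two-standard-deviation rule of thumb, the Python standard library can compute the exact share of a normal distribution that falls within that band; this generic sketch isn’t tied to the bank’s data:

    # Share of a normal distribution that lies within two standard deviations of the mean.
    from statistics import NormalDist

    std_normal = NormalDist(mu=0, sigma=1)
    print(round(std_normal.cdf(2) - std_normal.cdf(-2), 4))  # 0.9545, i.e., roughly 95 percent

The commonly quoted 95 percent figure actually corresponds to about 1.96 standard deviations, which is why 1.96 reappears later as the critical value for 95 percent confidence.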

Baker determined the probability distribution of original loan values of the bank’s outstanding mortgage loans based on the mean and standard deviation values obtained from aggregated internal data. The distribution he identified is represented in Figure 2 below.

[Figure 2: Probability distribution - original loan amounts]

This chart indicates that the original amounts of the mortgages the bank originated are most likely to fall close to the mean value of $213,157. Because 95 percent of all original loans issued by the bank fall within two standard deviations of the mean in either direction, almost all of the mortgage loans made by the bank have an original loan balance between $76,767 and $349,547. If nonperformance of loans occurred randomly and wasn’t tied to any particular underwriting characteristic, Baker would expect to observe a similar distribution for loans currently in default. However, a statistical analysis of data on the nonperforming loans that the bank originated reveals significantly different characteristics, as presented in Figure 3 below.
[Figure 3: Probability distribution - original loan amounts (nonperforming)]

The population presented in this distribution is those loans within the bank’s portfolio that are in default. The statistical mean original loan amount on these nonperforming loans is $95,132, significantly lower than the mean original loan amount of the bank’s entire mortgage portfolio. The observed standard deviation of this population is $26,538; based on a normal probability distribution, Baker concluded that virtually all nonperforming loans were originated at amounts between $42,056 and $148,208. The disparity is telling: loans originated by the bank that are in default exhibit a much lower mean original loan amount and much less variability than the mortgage portfolio as a whole.
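The bands quoted in this article follow directly from the stated means and standard deviations. A minimal sketch of that arithmetic appears below; note that the standard deviation for the overall portfolio isn’t stated in the case and is inferred here from the quoted $76,767 to $349,547 band.

    # Two-standard-deviation bands around the means quoted in the article.
    portfolio_mean = 213_157
    portfolio_sd = 68_195        # inferred from the quoted $76,767-$349,547 band; not stated in the case
    nonperforming_mean = 95_132
    nonperforming_sd = 26_538    # stated in the case

    for label, mean, sd in [("all mortgage loans", portfolio_mean, portfolio_sd),
                            ("nonperforming loans", nonperforming_mean, nonperforming_sd)]:
        print(label, mean - 2 * sd, mean + 2 * sd)
    # all mortgage loans 76767 349547
    # nonperforming loans 42056 148208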
Baker was aware that there were many possible fraud and non-fraud scenarios that could explain this disparity. Borrowers with original mortgage loan amounts between $42,056 and $148,208 may present a greater credit risk based on volatile employment situations or adverse credit histories. Mortgage loans in this range typically require a small down payment, which increases the borrowers’ incentive to “walk away” when situations become dire. Baker also noted that the dispersion of the nonperforming loans is very narrow: the coefficient of variation, or the ratio of standard deviation to the mean, is less than one-third ($26,538/$95,132 = .27). In other words, the majority of his employer’s defaulted loans fall within a very narrow range of original loan amounts above and below the statistical mean.
Baker focused on one particular characteristic based on his knowledge of his organization’s internal control structure: The mean value of the original loan amount on nonperforming loans is slightly below $100,000, and the bank requires secondary approval on those mortgages with original loan amounts above that threshold. This secondary approval serves as a check against unauthorized (and potentially fraudulent) loan origination. Baker’s analysis of the probability distribution of nonperforming mortgage loan data indicated the secondary approval control might have been circumvented in a mortgage-fraud scenario.
Baker’s probability distribution of original loan amounts represents a target-rich environment of data for fraud analytics. The nonperforming loans with original values between two standard deviations below the mean ($42,056) and the $100,000 threshold are depicted graphically in Figure 4 below.
[Figure 4: Probability distribution - nonperforming loans without secondary approval]
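Under the normality assumption Baker was working with, it’s also possible to estimate how much of the defaulted population sits in that target-rich band between $42,056 and the $100,000 secondary-approval threshold. The following sketch uses only the mean and standard deviation quoted above; the resulting share is an estimate that holds only if the normal model fits.

    # Estimated share of nonperforming loans with original amounts between $42,056
    # and the $100,000 secondary-approval threshold, assuming a normal distribution.
    from statistics import NormalDist

    nonperforming = NormalDist(mu=95_132, sigma=26_538)
    share = nonperforming.cdf(100_000) - nonperforming.cdf(42_056)
    print(round(share, 2))  # about 0.55 under these assumptions

In other words, under these assumptions a little over half of the defaulted loans would be expected to fall just below the secondary-approval threshold, which is what makes the band target-rich.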

USING SAMPLES TO CHARACTERIZE THE POPULATION

Just as circumvention of controls is only one possible explanation for the observed distribution of original loan amounts on the bank’s bad mortgages, there are numerous statistical characteristics of nonperforming loans that Baker could have analyzed for indications of mortgage fraud, including:
  • Number of payments made prior to default.
  • Ratio of appraised value to fair market value at time of origination.
  • Refinance loan requests for excessive amounts.
  • Mortgage applications in the names of holding companies.
  • Multiple mortgage loans requested in the name of the same buyer (straw-purchaser scheme).
To this point in his analysis, Baker had worked with aggregated financial information about his employer’s mortgage portfolio, including metrics on underwriting and loan performance. The bank’s control environment includes information systems that capture numerous data points for management reporting purposes, but Baker needed to examine individual loan files to analyze fraud indicators. Because it would have been impractical to review every loan file within the target-rich range he had identified, Baker used sampling methods to test for fraud. Knowing that mortgage fraud often involves falsified appraisal information, he decided to analyze two key attributes: the approving loan officer and the appraisal associated with each loan.
The concept behind sampling is intuitive. Baker’s intent was to arrive at some conclusions about the entire population of mortgage files within his target-rich environment based on observations of a sample that he selected. Sampling has a wide array of applications and is so effective that a sample of only 1,000 people allows political pollsters to arrive at reasonable conclusions about how 300 million Americans will vote.
Sample size is positively correlated with precision, but it is also subject to diminishing returns; an efficient sample should be large enough to be meaningful yet not so large that reviewing it becomes impractical. Baker selected 200 files in the target range and identified the distribution of approving loan officers on nonperforming loans with original loan amounts between $42,056 and $100,000, presented in Figure 5 below.
[Figure 5: Distribution table - approving loan officer on nonperforming loans with original loan values between $42,056 and $100,000]
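The following is one way such a sample could be drawn and tallied in code. The file identifiers, officer names and weights are invented placeholders; a real engagement would pull the population from the bank’s loan-origination system.

    # Draw a simple random sample of loan files and tally the approving officer on each.
    import random
    from collections import Counter

    # Hypothetical population: (file id, approving officer) pairs for every
    # nonperforming loan in the $42,056-$100,000 target range.
    officers = ["Matthews", "Jenkins", "Lee", "Ortiz"]
    population = [(f"LN{i:05d}", random.choices(officers, weights=[40, 15, 25, 20])[0])
                  for i in range(1, 1201)]

    sample = random.sample(population, k=200)              # 200 files, as in the case
    officer_counts = Counter(officer for _, officer in sample)

    for officer, count in officer_counts.most_common():
        print(officer, count, f"{count / len(sample):.0%}")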


Baker needed to determine how precise his sample was before drawing further conclusions; with a sample of only 200 files, how confident could he be in his data? To determine how closely a sample matches the total population, statisticians calculate the margin of error using this formula:
margin of error = z* × √( p(1 - p) / n )
in which z* (pronounced z-star) is the confidence interval critical value, p is the observed proportion within the sample, and n is the number of files sampled. The confidence interval critical value measures the number of standard errors (or deviations) to be added or subtracted to achieve the desired confidence level in sample results. Most statisticians rely on a confidence level of 95 percent in calculating margin of error, and z* for that level is 1.96. The margin of error for the selected sample size is then applied to the identified distribution as a plus/minus range.
In this case, Baker used the formula to determine, with 95 percent confidence, that based on the selected sample, Mike Matthews was the approving loan officer in 35 percent to 49 percent of all mortgages in the target range as presented in Figure 6 below.
[Figure 6: Confidence intervals for the selected sample of nonperforming mortgages]
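A minimal sketch of that margin-of-error calculation follows. The 42 percent observed proportion is inferred from the midpoint of the 35 percent to 49 percent interval reported above rather than stated directly in the case.

    # Margin of error for a sample proportion: z* x sqrt(p(1 - p) / n).
    from math import sqrt

    def margin_of_error(p, n, z_star=1.96):      # 1.96 is the critical value for 95 percent confidence
        return z_star * sqrt(p * (1 - p) / n)

    p_matthews = 0.42                            # inferred share of sampled files approved by Matthews
    moe = margin_of_error(p_matthews, 200)
    print(f"{p_matthews - moe:.0%} to {p_matthews + moe:.0%}")  # 35% to 49%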

The number of files in the sample is in the denominator of the margin of error formula; consequently, an increase in sample size reduces the margin of error. For example, if Baker had sampled 400 files and found that the percentage of mortgages that Matthews had approved remained the same, the margin of error would have been 5 percent, and the confidence interval would have been between 37 percent and 47 percent. If Baker had wanted to increase his confidence level, he would’ve selected a higher z* value; to obtain a 99 percent confidence level he would’ve used a confidence interval critical value of 2.58, because 99 percent of all sample means fall within 2.58 standard errors (or deviations) of a normal distribution population mean.
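The critical values themselves come straight from the standard normal distribution, and the effect of a larger sample can be verified the same way; this short sketch again uses the inferred 42 percent proportion.

    # Where the z* critical values come from, and the effect of a larger sample.
    from math import sqrt
    from statistics import NormalDist

    # Two-sided critical values: 97.5th and 99.5th percentiles of the standard normal.
    print(round(NormalDist().inv_cdf(0.975), 2))   # 1.96 -> 95 percent confidence
    print(round(NormalDist().inv_cdf(0.995), 2))   # 2.58 -> 99 percent confidence

    # Doubling the sample from 200 to 400 files shrinks the margin of error.
    p = 0.42
    for n in (200, 400):
        print(n, round(1.96 * sqrt(p * (1 - p) / n), 3))   # 0.068, then 0.048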
At this point in his analysis, Baker took inventory of the facts: nonperforming loans exhibited characteristics (original loan amount) atypical of the bank’s entire loan portfolio. Internal controls required secondary approval on all mortgage loans above a $100,000 threshold, and a majority of the nonperforming loans fell below this threshold. Loan officer Matthews had approved more of the nonperforming loans than any other employee in the bank’s mortgage division, but this didn’t necessarily prove fraud. Matthews was a highly successful mortgage loan originator for the bank; Baker knew he needed to compare his data with divisional sales figures to identify correlations.

DIGGING DEEPER: USING THE SAMPLE CHARACTERIZATIONS TO MAKE CONCLUSIONS
Sales performance reports indicated that Matthews had originated approximately 35 percent of the bank’s total mortgage loan portfolio in the past year, while loan officer Larry Jenkins had struggled with his sales numbers over the same period. Jenkins originated approximately 3 percent of the bank’s mortgage portfolio but between 10 percent and 20 percent of the nonperforming loans below the secondary approval control threshold. In addition to examining why Jenkins was significantly outperforming his overall average in this particular subset of loans, Baker analyzed characteristics of the appraisal reports contained in the loan files.
One of the most common mortgage fraud schemes is illegal property flipping, in which a falsified appraisal increases the sale price — and loan amount — for a straw buyer. In some cases, mortgage loan officers, appraisers and other parties act in collusion to execute the scheme in exchange for kickbacks. Baker analyzed the appraisers on nonperforming loan files originated by Jenkins and compared the appraised value of the properties to historic sales prices and property tax assessments. Within the 30 files on nonperforming loans that Jenkins had approved in the target range, Baker noted the following key statistics:
  • Southern Appraisal Services performed the appraisals on 24 of the 30 loans in the sample. Records on file with the state division of corporations revealed that Southern Appraisal Services is an LLC owned by a holding company named Precision Real Estate Professionals.
  • Public records indicated that the registered agent for Precision Real Estate Professionals shared an address with a relative of Jenkins.
  • Appraisal values on the 30 properties exceeded the most recent property tax appraisals by a factor of 10. Baker wasn’t surprised to learn that the appraisal values in the loan files also significantly exceeded comparable sales data obtained from the Multiple Listing Service database. (A simple version of this ratio check is sketched below.)
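One simple way to put the appraisal comparison described above into code is to compute, for each loan file, the ratio of the appraised value to the most recent tax assessment (or a comparable sale) and flag the outliers. The sketch below is purely illustrative; the file numbers and dollar amounts are invented, and the 1.5 flag threshold is an arbitrary example, not a standard.

    # Flag loan files whose appraised value is far above the latest tax assessment.
    loan_files = [
        {"file": "LN00117", "appraised": 92_000, "tax_assessment": 9_400},
        {"file": "LN00233", "appraised": 88_500, "tax_assessment": 61_000},
        {"file": "LN00408", "appraised": 97_250, "tax_assessment": 9_900},
    ]

    FLAG_RATIO = 1.5   # arbitrary illustrative threshold

    for loan in loan_files:
        ratio = loan["appraised"] / loan["tax_assessment"]
        if ratio >= FLAG_RATIO:
            print(f'{loan["file"]}: appraisal is {ratio:.1f}x the tax assessment')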
These statistical methods provided Baker with the proper predication to expand his examination. He determined that the increase in defaults wasn’t a result of aggressive marketing and that he had sufficient evidence to suspect a mortgage fraud scheme. After he allayed the board of directors’ concerns, he initiated a full-scale investigation of Jenkins’ activities. Armed with his analysis and some additional background investigation, Baker was able to obtain a quick admission of guilt from Jenkins, who was promptly terminated and handed over to the authorities.

FRAUD EXAMINERS' SABERMETRICS
Baseball fans like to discuss the merits of the “Sabermetrics” approach to analyzing the game and characterizing the skill of players and teams through advanced statistical analysis. (The term is derived from SABR, the acronym of the Society for American Baseball Research.) The same statistical methods allow fraud examiners to make characterizations about large volumes of data even when only pieces of it are available. Statistical analysis isn’t absolute proof of fraud, and it certainly is no substitute for a thorough, sound fraud examination. Jenkins might initially have had valid reasons for originating a significant percentage of nonperforming loans; only Baker’s complete investigation confirmed the fraud. The value of the statistical analysis was that it laid a strong foundation for his fraud examination and detection efforts.

John Giardino, CFE, CPA, is a senior fraud analyst working in the telecommunications industry.

