By Richard Satran, U.S. News & World Report
The Internal Revenue Service is
collecting a lot more than taxes this year–it’s also acquiring a huge
volume of personal information on taxpayers’ digital activities, from
eBay auctions to Facebook posts and, for the first time ever, credit
card and e-payment transaction records, as it expands its search for
tax cheats to places it’s never gone before.
The IRS, under heavy pressure to help Washington out of its
budget quagmire by chasing down an estimated $300 billion in revenue
lost to evasions and errors each year, will start using “robo-audits”
of tax forms and third-party data the IRS hopes will help close this
so-called “tax gap.” But the agency reveals little about how it will
employ its vast, new network scanning powers.
Tax lawyers and watchdogs are concerned about the sweeping
changes being implemented with little public discussion or clear
guidelines, and Congressional staff sources say the IRS's use of “big
data” will be a key issue when the next IRS chief comes to the Senate
for approval. Acting commissioner Steven T. Miller replaced Douglas
Shulman last November.
“It’s well-known in the tax community, but not many people
outside of it are aware of this big expansion of data and computer
use,” says Edward Zelinsky, a tax law expert and professor at Benjamin
N. Cardozo School of Law and Yale Law School. “I am sure people will be
concerned about the use of personal information on databases in
government, and those concerns are well-taken. It’s appropriate to
watch it carefully. There should be safeguards.” He adds that taxpayers
should know that whatever people do and say electronically can and
will be used against them in IRS enforcement.
IRS’s big data tracking. Consumers are already
familiar with Internet “cookies” that track their movements and send
them targeted ads that follow them to different websites. The IRS has
brought in private industry experts to employ similar digital
tracking–but with the added advantage of access to Social Security
numbers, health records, credit card transactions and many other
privileged forms of information that marketers don’t see.
“Private industry would be envious if they knew what our models
are,” boasted Dean Silverman, the agency’s high-tech top gun who heads a
group recruited from the private sector to update the IRS, in a
comment reported in trade publications. The IRS did not respond to a
request for an interview.
In trade presentations and public documents, the agency has said
it will use a massively parallel computer system that can analyze data
from different networks to find irregularities and suspicious
Much of the work already has been automated to process and
analyze electronic tax returns in current “robo-audits” that flag
unusual behavior patterns. With IRS audit staff reduced by budget cuts
this year, the agency will be forced to rely on computer-generated
audits more than ever.
The agency declined to comment on how it will use its new
technology. But agency officials have been outlining plans at industry
conferences, working with IBM, EMC and other private-sector
specialists. In presentations, officials have said they may use the big
— Charting and analyzing social media such as Facebook
— Targeting audits by matching tax filings to social media or electronic payments
— Tracking individual Internet addresses and emailing patterns
— Sorting data in 32,000 categories of metadata and 1 million unique “attributes”
— Machine learning across “neural” networks
— Statistical and agent-based modeling
— Relationship analysis based on Social Security numbers and other personal identifiers
Officials have said much of the data will be used only for
research. The agency’s economic forecasts and data are a key part of
Washington’s budget infrastructure. Former commissioner Douglas Shulman
said in an IRS statement that the technology will employ “billions of
pieces of data” to target enforcement and to “detect and combat
U.S. Tax Court records show that information gathered from
Facebook and eBay postings has been used by the IRS in defending tax
challenges. Under a Freedom of Information Act disclosure obtained by
privacy advocates at the Electronic Frontier Foundation, the group
published the IRS’s 38-page manual used to train auditors to search
Internet addresses, Facebook postings and other social media to back
In practice, the third-party data has been used only if the
irregular returns merit more attention. In one much-cited example, IRS
officials talk about prisoners who were filing false claims for energy
tax credits for window replacements.
The agency, wary of public opinion about invasive audit
practices, has pulled back from using so-called “social audits,” which,
for example, might single out horse-racing enthusiasts or sailboaters
for special attention. But by screening existing data for one million
unique attributes, the agency can quietly create a DNA-like code to
understand the economic behavior of any individual.
The IRS last year used a profiling test model to study 1,500 tax
preparers with histories of reporting deficiencies and managed to
recover $200 million. It cited the experience as proof that its data
analysis works. Early this year, however, a new set of rules it
developed for tax preparers was thrown out by a federal court, which said
the agency had overstepped its mandate. The IRS would not comment on
whether the rules were based on its new screening tools.
Lots of computing power, for what? The agency’s
computers can now load all U.S. tax returns in just 10 hours, compared
with the four months it took just eight years ago, Jeff Butler, IRS
director of research databases, told the IBM TechAmerica conference last
November. That leaves a lot of time for other uses. The IRS says it
expects 80 percent of its tax returns to be filed electronically this
year. That makes a total of 250 million returns filed, with $2 trillion
But processing those returns uses only a fraction of the
agency’s computing power. An entire year of tax returns amounts to 15
terabytes, or just 1.5 percent of the IRS's storage of 1.2 petabytes (a
petabyte is one quadrillion bytes of information), based on public data from IRS
presentations. The agency has expanded its data capacity by 1,000
percent in the past six years.
It also recently assembled $350 million in high-tech tools to
audit, track and analyze what people do on the
Internet. The agency has used social media and other third-party
sources in the past, but it has now increased its capability to do so from
its own growing database of networks.
Congressional staffers on the House Ways and Means Committee and
the Joint Committee on Taxation, both of which oversee the IRS, say
they have been occupied by more pressing issues related to the budget
crisis, and Congress gave the tax officials leeway to use technology to
solve the growing problem of identity theft. But they said they will
look at the possibility of errors in robo-audits as well as the storage
of data on millions of taxpayers.
The IRS is guarded about how its audits are triggered, tax
experts say, because too much information on what they do might help
tax cheats. Major accounting firms have been given little information
on the changes and were reluctant to comment, although some said
privately that they are aware of the new IRS tools but it is too early
to tell how they will be used. Taxpayer advocacy groups also say they
are waiting to see how the IRS manages its technology upgrade, and are
holding out hope that it will make taxes fairer and more efficient and
force tax evaders to pay their share of the overall burden.
While many applaud the effort to update government technology
with private-sector tools, they say the agency needs to conform to
“I don’t really see strong legal regulation in place to manage
something of this magnitude,” says Paul Schwartz, University of
California law professor and co-director of the Berkeley Center for Law
& Technology. The IRS is working with the same kind of oversight
and rules that were developed in the paper tax-return era, says
Schwartz. But with the technology it now has, the agency can “see into
people’s lives” as never before.
Tax returns are like narratives of how people spent their money,
and tax audits have been guided by “reasonable” interpretations of
allowable credits and deductions by the IRS agents who manage audits.
“Social media can make people testify against themselves,” Schwartz
says. “They provide a counter-narrative.” He cites as an example a
businessperson going to Florida for five meetings over a week who also
visits family in Miami. A casual Google+ posting to friends online
about “visiting my mother in Florida” could paint a different picture
than the deduction taken on the tax form.
“It will be interesting to
see what the IRS does with all of their new tools. They will have to be
very careful,” says Schwartz. So, too, will taxpayers.