
ICE Tells North Side Alderperson To Back Off As She Warns Neighbors About Immigration Agents


This is part of our series of daily recaps of ICE activity in the Chicago region. Have a tip we should check out? Email newsroom@blockclubchi.org.

ALBANY PARK — Ald. Rossana Rodriguez-Sanchez (33rd) is speaking out after federal immigration authorities led her into an alley and warned her that she was “impeding” them in Albany Park Tuesday morning, she told Block Club. 

Rodriguez-Sanchez was following agents driving a black SUV near the intersection of Avers and Leland avenues around 10:35 a.m. Tuesday. The alderwoman and her chief of staff, Veronica Tirado-Mercado, were in her car honking the horn and blowing whistles to alert neighbors to the presence of federal agents, she said.

“They went into an alley because they wanted to get us secluded,” Rodriguez-Sanchez said. “And I followed them in, and that’s when they came out of the car and gave me a warning. Afterwards, we kept following them. But we lost them at some point going south on Pulaski.”

Rodriguez-Sanchez, whose ward includes parts of Albany Park, Irving Park, Avondale and Ravenswood Manor, said three federal agents wearing camouflage uniforms and masks approached her. The incident was also recorded.

One of the federal agents tells Rodriguez-Sanchez that “911 is going to be notified,” according to the video.

“I’m giving you your warning, okay?” the agent said, according to the video.

When Rodriguez-Sanchez asks what the warning is for, the agent says it is for “impeding in our operation, ma’am.”

The federal agent doesn’t give a name, but leans slightly forward to briefly show a patch on the left shoulder of his uniform that appears to identify him as “E2-20.”

Rodriguez-Sanchez and Tirado-Mercado continue to ask the federal agent his name, according to the video.

Instead of sharing his name, the agent reiterates that he is giving Rodriguez-Sanchez a “warning” before he and the two other agents walk back to their SUV, according to the video. 

It was the first of two incidents Tuesday where elected officials were approached by federal agents while trailing them to alert neighbors.

Around 11:30 a.m. Tuesday, state Rep. Hoan Huynh said he and a staff member were following an ICE car near Montrose and Kimball avenues when agents blocked his car and one approached with a gun drawn. The agent pressed the gun against the car window and tried to smash the window, Huynh said in a statement.

“If they can pull a gun on an elected official and try to bash in my window, there’s no end to the terror they will continue reigning on our communities,” Huynh said in a statement sent by his Congressional campaign. “We must fight back against this fascist regime that has no place in America.”

A Homeland Security spokesperson did not immediately answer questions about the interactions.

A Chevy Tahoe with a California license plate is driven by federal agents on Whipple Street in Albany Park after detaining a person on Oct. 21, 2025. Credit: Provided

Earlier, when Rodriguez-Sanchez began following the federal agents in the SUV, she saw them reach for their masks before they pulled into the alley, she said. 

“It was kind of nerve-racking. Because these are heavily armed men, who are in masks and in military uniform. I am there alone with my chief of staff. We were not armed. We don’t have any way to defend ourselves from them,” Rodriguez-Sanchez said. “But I think that this is a moment where we’re just gonna have to do what we have to do to keep people safe.”

Tuesday afternoon, the alderwoman was also notified of another incident in which two Polish contractors were detained by federal agents near the intersection of Whipple Street and Montrose Avenue, she said.

“Fear cannot paralyze us, right? Yes, we’re going to be scared. There are going to be moments when we’re scared. But we cannot allow that to freeze us, or lead us to despair. Or to lead us to inaction. This is the moment when we need all hands on deck,” Rodriguez-Sanchez said.

Ald. Rossana Rodriguez-Sanchez (33rd) takes a call while watching for federal agents outside of Hibbard Elementary School in Albany Park on Oct. 21, 2025.

Agents Active On North Side Tuesday

Rodriguez-Sanchez was slated to be at City Hall Tuesday, but as soon as she heard reports of federal immigration agents in Albany Park, Edgewater and Rogers Park, she decided to head to her ward to try to warn neighbors, she said. Earlier this month, federal immigration agents deployed tear gas in Albany Park to chase off neighbors who successfully stopped them from detaining a neighbor.

“They are definitely hitting the North Side today, and particularly neighborhoods that are heavily immigrant populated,” Rodriguez-Sanchez said. 

Block Club reached out to several Albany Park organizers, who said they were aware of agents in the area and were out monitoring school drop-offs and dismissals with whistles, know-your-rights information and cellphones in hand to help protect students, parents and other community members from federal agents.

Rodriguez-Sanchez urged local businesses to post visible signs letting federal immigration agents know their spaces are private property and that they are not welcome there, she said. 

“We have those available at our office, 4747 N. Sawyer Ave. We also did a push today and try to get those out so business can have them,” Rodriguez-Sanchez said. “We want everybody to stay vigilant and to do whatever they can to protect their neighbors.”

Members of the Texas National Guard walk around at the Joliet Local Training Area in Elwood on Tuesday, Oct. 7, 2025. Credit: Talia Sprague/Block Club Chicago

National Guard Block Extended As Trump Lawyers Eye Supreme Court

Trump administration attorneys agreed Tuesday to a 30-day extension on a temporary restraining order that has so far blocked the National Guard from being deployed to Chicago.

The extension through Nov. 24 takes some wind out of the sails of a Wednesday hearing where U.S. District Judge April Perry was expected to rule on whether the two-week restraining order would continue as the lawsuit brought by the state of Illinois against the Trump administration plays out.

Both parties spoke this week and mutually agreed to the extension on the condition the agreement would not influence appeals to higher courts, according to court records.

Last week, the Trump administration appealed the case to the U.S. Supreme Court, setting the stage for a potential watershed ruling on the reaches of presidential power.

State attorneys filed a response to the Supreme Court Monday, arguing there is “no rebellion or danger of rebellion” giving Trump grounds to federalize the National Guard to quell protesters of his immigration “blitz” in Chicago.

Mother Of ‘Face of Operation Midway Blitz’ Speaks Out

Last month, the Department of Homeland Security announced it would ramp up immigration enforcement in Chicago under “Operation Midway Blitz.” The operation would be “in honor” of Katie Abraham, a 20-year-old from suburban Glenview who was killed in a January hit-and-run crash by a Guatemalan man also charged with faking personal documents, officials announced.

Abraham’s father and step-mom appeared in an August video produced by the Department of Homeland Security, blaming Democrats for sanctuary policies and people illegally entering the country.

But in an op-ed published by the Tribune on Tuesday, Abraham’s mother, Denise Lorence, wrote her daughter “would not have wanted” the federal operation as it has played out.

“Katie would not want to be associated with an operation in which kids witness their parents being taken into custody on their way to or from school. She wouldn’t support scaring kids with the use of military efforts in their neighborhoods or in their apartment buildings,” Lorence wrote.

She added her daughter was not outwardly political, “did not choose to be thrust into this political spotlight” or be used as a “political pawn.”

“A complex factor is that Katie’s father and his wife agreed to use Katie’s name in support of Operation Midway Blitz,” Lorence wrote. “I want to acknowledge the depths of her dad’s grief. I will never fault or question someone in the way they grieve.”

Scenes from a standoff at 105th Street and Avenue N on Chicago’s Southeast Side between ICE and other federal agents and furious residents, after a vehicle chase left two cars blocking the intersection as residents gathered. Credit: Matthew Kaplan/Block Club Chicago

Fundraiser For Woman Wrongfully Detained By Feds Raises Over $10,000

A Chicago family is raising funds for legal and medical needs after ICE agents hit Dayanne Figueroa’s car in the River West area earlier this month, aggressively pulled her from the vehicle and detained her without explanation, the family wrote on a GoFundMe.

The Oct. 10 incident, caught on video by several witnesses, has left Figueroa — a U.S. citizen — without a car and suffering from mental and physical trauma, her family said.

She was recovering from kidney surgery and had to make another hospital visit because of the excessive force used during her detention, her family said.

Figueroa is “unable to work for the time being, and is burdened with growing medical expenses,” her family wrote. “The good news is that she is home now and able to begin recovering with the support of family.”

The GoFundMe had raised over $10,000 of its $50,000 goal by Tuesday afternoon.

Happening In Chicago

  • ICE agents detained at least one person near Balmoral Avenue and Clark Street and at least one other near Kenmore and Glenlake avenues in Edgewater Tuesday morning, Ald. Leni Manaa-Hoppenworth (48th) said in a statement. “ICE agents have been active throughout the ward today,” she said.
  • Around 11 a.m. Tuesday, ICE detained at least one person near Lincoln and Warner avenues, Ald. Matt Martin (47th) said in a statement. “There have been multiple other sightings of ICE agents driving Jeep Wagoneers and other SUVs in the area this morning,” Martin said. “The vast majority of these sightings have occurred north of Irving Park [Road].”
  • There were multiple ICE sightings around Albany Park, with reports of federal agents attempting to arrest landscapers near Montrose and California avenues around 9:15 a.m. Tuesday, according to the Northwest Side Rapid Response team.
  • Federal agents in six cars were seen near Lawrence Avenue and Sheridan Road in Uptown around 11 a.m. Tuesday, according to the local nonprofit Asian Americans Advancing Justice.
  • “The Northside of Chicago is getting hit hard today, Albany Park and Uptown,” advocates with Organized Communities Against Deportations wrote on social media Tuesday. “Stay alert, use whistles if you have any. Alert your neighbors to stay inside if possible!”

-Mack Liederman and Ariel Parrella-Aureli contributed.


Support Local News!

Subscribe to Block Club Chicago, an independent, 501(c)(3), journalist-run newsroom. Every dime we make funds reporting from Chicago’s neighborhoods. Already subscribe? Click here to gift a subscription, or you can support Block Club with a tax-deductible donation.



AI Isn't Always Intelligent, Like These 24 AI Fails


The use of artificial intelligence is a conversation happening all over the world. People either love or hate AI, but most can agree there are some things it is not ready for. You would expect AI to be capable of answering simple, straightforward questions quickly and correctly. But these programs are still learning, and they very commonly get extremely straightforward questions completely wrong, often hilariously so.

From simple math to well-known trivia to questions about TV shows, AI has been caught making some pretty big blunders across many subjects. Who knows if AI will ever be totally reliable, but for now, people should use caution. Everyone can find humor in the mistakes AI makes, especially given how smart it is claimed to be.

Here, we have collected 24 times someone asked an AI program a question only to have it pop out something so shocking, incorrect, or just plain strange in return. Enjoy these "smart" computers making dumb mistakes.

[The post’s 24 screenshots of AI answers did not survive extraction; only their captions remain, e.g. “Killing it,” “Is Google evil?,” “Totally safe” and “Where is mama?”]


Inside the AI Prompts DOGE Used to “Munch” Contracts Related to Veterans’ Health


ProPublica is a nonprofit newsroom that investigates abuses of power. Sign up to receive our biggest stories as soon as they’re published.

When an AI script written by a Department of Government Efficiency employee came across a contract for internet service, it flagged it as cancelable. Not because it was waste, fraud or abuse — the Department of Veterans Affairs needs internet connectivity after all — but because the model was given unclear and conflicting instructions.

Sahil Lavingia, who wrote the code, told it to cancel, or in his words “munch,” anything that wasn’t “directly supporting patient care.” Unfortunately, neither Lavingia nor the model had the knowledge required to make such determinations.

Sahil Lavingia at his office in Brooklyn (Ben Sklar for ProPublica)

“I think that mistakes were made,” said Lavingia, who worked at DOGE for nearly two months, in an interview with ProPublica. “I’m sure mistakes were made. Mistakes are always made.”

It turns out, a lot of mistakes were made as DOGE and the VA rushed to implement President Donald Trump’s February executive order mandating all of the VA’s contracts be reviewed within 30 days.

ProPublica obtained the code and prompts — the instructions given to the AI model — used to review the contracts and interviewed Lavingia and experts in both AI and government procurement. We are publishing an analysis of those prompts to help the public understand how this technology is being deployed in the federal government.

The experts found numerous and troubling flaws: the code relied on older, general-purpose models not suited for the task; the model hallucinated contract amounts, deciding around 1,100 of the agreements were each worth $34 million when they were sometimes worth thousands; and the AI did not analyze the entire text of contracts. Most experts said that, in addition to the technical issues, using off-the-shelf AI models for the task — with little context on how the VA works — should have been a nonstarter.

Lavingia, a software engineer enlisted by DOGE, acknowledged there were flaws in what he created and blamed, in part, a lack of time and proper tools. He also stressed that he knew his list of what he called “MUNCHABLE” contracts would be vetted by others before a final decision was made.

Portions of the prompt are pasted below along with commentary from experts we interviewed. Lavingia published a complete version of it on his personal GitHub account.

Problems with how the model was constructed can be detected from the very opening lines of code, where the DOGE employee instructs the model how to behave:

You are an AI assistant that analyzes government contracts. Always provide comprehensive few-sentence descriptions that explain WHO the contract is with, WHAT specific services/products are provided, and WHO benefits from these services. Remember that contracts for EMR systems and healthcare IT infrastructure directly supporting patient care should be classified as NOT munchable. Contracts related to diversity, equity, and inclusion (DEI) initiatives or services that could be easily handled by in-house W2 employees should be classified as MUNCHABLE. Consider 'soft services' like healthcare technology management, data management, administrative consulting, portfolio management, case management, and product catalog management as MUNCHABLE. For contract modifications, mark the munchable status as 'N/A'. For IDIQ contracts, be more aggressive about termination unless they are for core medical services or benefits processing.

This part of the prompt, known as a system prompt, is intended to shape the overall behavior of the large language model, or LLM, the technology behind AI bots like ChatGPT. In this case, it was used before both steps of the process: first, before Lavingia used it to obtain information like contract amounts; then, before determining if a contract should be canceled.

Including information not related to the task at hand can confuse AI. At this point, it’s only being asked to gather information from the text of the contract. Everything related to “munchable status,” “soft-services” or “DEI” is irrelevant. Experts told ProPublica that trying to fix issues by adding more instructions can actually have the opposite effect — especially if they’re irrelevant.
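For readers unfamiliar with the mechanics, a system prompt in a chat-style LLM API is simply the first message sent with every request, so everything in it rides along on every call. A minimal sketch of the pattern (the payload shape follows OpenAI-style chat APIs; the model name and prompt text are placeholders, not taken from the DOGE script):

```python
# Sketch of how a system prompt is attached to every request.
# Model name and prompt strings here are illustrative placeholders.
def build_request(system_prompt: str, contract_text: str) -> dict:
    """Assemble a chat-completions-style payload. The same system
    prompt travels with every call, so rules irrelevant to one step
    (e.g., 'munchable' criteria during plain field extraction) are
    still seen by the model each time."""
    return {
        "model": "gpt-4",  # placeholder
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": contract_text},
        ],
    }

req = build_request("You are an AI assistant that analyzes government contracts.",
                    "CONTRACT TEXT: ...")
print([m["role"] for m in req["messages"]])  # -> ['system', 'user']
```

Because the system message is resent verbatim on both passes, trimming it to only the instructions a given pass needs is the usual way to avoid the confusion the experts describe.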

Analyze the following contract text and extract the basic information below. If you can't find specific information, write "Not found".

CONTRACT TEXT: {text[:10000]} # Using first 10000 chars to stay within token limits

The models were only shown the first 10,000 characters from each document, or approximately 2,500 words. Experts were confused by this, noting that OpenAI models support inputs over 50 times that size. Lavingia said that he had to use an older AI model that the VA had already signed a contract for.
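The `{text[:10000]}` slice in the prompt is plain character truncation. A small sketch of what that costs on a long document (the contract length is illustrative):

```python
def truncate_for_model(text: str, max_chars: int = 10000) -> str:
    """Naive character-based truncation like the script's text[:10000];
    everything past max_chars is silently dropped, so clauses deep in
    a contract never reach the model."""
    return text[:max_chars]

contract = "x" * 250000  # stand-in for a 250,000-character contract
kept = truncate_for_model(contract)
print(len(kept), f"({len(kept) / len(contract):.0%} of the document)")
# -> 10000 (4% of the document)
```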

Please extract the following information:
1. Contract Number/PIID
2. Parent Contract Number (if this is a child contract)
3. Contract Description - IMPORTANT: Provide a DETAILED 1-2 sentence description that clearly explains what the contract is for. Include WHO the vendor is, WHAT specific products or services they provide, and WHO the end recipients or beneficiaries are. For example, instead of "Custom powered wheelchair", write "Contract with XYZ Medical Equipment Provider to supply custom-powered wheelchairs and related maintenance services to veteran patients at VA medical centers."
4. Vendor Name
5. Total Contract Value (in USD)
6. FY 25 Value (in USD)
7. Remaining Obligations (in USD)
8. Contracting Officer Name
9. Is this an IDIQ contract? (true/false)
10. Is this a modification? (true/false)

This portion of the prompt instructs the AI to extract the contract number and other key details of a contract, such as the “total contract value.”

This was error-prone and not necessary, as accurate contract information can already be found in publicly available databases like USASpending. In some cases, this led to the AI system being given an outdated version of a contract, which led to it reporting a misleadingly large contract amount. In other cases, the model mistakenly pulled an irrelevant number from the page instead of the contract value.

“They are looking for information where it’s easy to get, rather than where it’s correct,” said Waldo Jaquith, a former Obama appointee who oversaw IT contracting at the Treasury Department. “This is the lazy approach to gathering the information that they want. It’s faster, but it’s less accurate.”

Lavingia acknowledged that this approach led to errors but said that those errors were later corrected by VA staff.
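One common safeguard, implied by the experts’ critique, is to cross-check each model-extracted dollar figure against an authoritative source such as a procurement database before acting on it. A hedged sketch (the tolerance and figures are illustrative, not from the DOGE code):

```python
def value_is_suspicious(extracted: float, authoritative: float,
                        tol: float = 0.01) -> bool:
    """Flag a model-extracted contract value that disagrees with an
    authoritative figure by more than tol (relative difference)."""
    if authoritative == 0:
        return extracted != 0
    return abs(extracted - authoritative) / abs(authoritative) > tol

# The $34 million hallucination pattern described above, against a
# contract actually worth tens of thousands:
print(value_is_suspicious(34_000_000, 35_000))  # -> True
print(value_is_suspicious(35_000, 35_000))      # -> False
```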

Once the program extracted this information, it ran a second pass to determine if the contract was “munchable.”
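That two-pass flow can be sketched as follows; `call_model` is a stand-in for whatever LLM client the script actually used, and the prompt strings are abbreviated paraphrases, not the real prompts:

```python
from typing import Callable

def review_contract(text: str,
                    call_model: Callable[[str, str], str]) -> dict:
    """Pass 1 extracts fields; pass 2 decides 'munchable'. Both
    passes see only the first 10,000 characters and reuse the same
    system prompt, mirroring the structure described above."""
    system = "You are an AI assistant that analyzes government contracts."
    snippet = text[:10000]
    facts = call_model(system, "Extract contract number, vendor, value:\n" + snippet)
    verdict = call_model(system, "Is this contract munchable?\n" + snippet)
    return {"facts": facts, "munchable": verdict}

# Demo with a fake model that just echoes the first prompt line:
result = review_contract("CONTRACT TEXT ...",
                         lambda system, user: user.split("\n")[0])
print(result["munchable"])  # -> Is this contract munchable?
```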

Based on the following contract information, determine if this contract is "munchable" based on these criteria:

CONTRACT INFORMATION: {text[:10000]} # Using first 10000 chars to stay within token limits

Again, only the first 10,000 characters were shown to the model. As a result, the munchable determination was based purely on the first few pages of the contract document.

Then, evaluate if this contract is "munchable" based on these criteria:
- If this is a contract modification, mark it as "N/A" for munchable status
- If this is an IDIQ contract:
  * For medical devices/equipment: NOT MUNCHABLE
  * For recruiting/staffing: MUNCHABLE
  * For other services: Consider termination if not core medical/benefits
- Level 0: Direct patient care (e.g., bedside nurse) - NOT MUNCHABLE
- Level 1: Necessary consultants that can't be insourced - NOT MUNCHABLE

The above prompt section is the first set of instructions telling the AI how to flag contracts. The prompt provides little explanation of what it’s looking for, failing to define what qualifies as “core medical/benefits” and lacking information about what a “necessary consultant” is.

For the types of models the DOGE analysis used, including all the necessary information to make an accurate determination is critical.

Cary Coglianese, a University of Pennsylvania professor who studies the governmental use of artificial intelligence, said that knowing which jobs could be done in-house “calls for a very sophisticated understanding of medical care, of institutional management, of availability of human resources” that the model does not have.

- Contracts related to "diversity, equity, and inclusion" (DEI) initiatives - MUNCHABLE

The prompt above tries to implement a fundamental policy of the Trump administration: killing all DEI programs. But the prompt fails to include a definition of what DEI is, leaving the model to decide.

Despite the instruction to cancel DEI-related contracts, very few were flagged for this reason. Procurement experts noted that it’s very unlikely for information like this to be found in the first few pages of a contract.

- Level 2+: Multiple layers removed from veterans care - MUNCHABLE
- Services that could easily be replaced by in-house W2 employees - MUNCHABLE

These two lines — which experts say were poorly defined — carried the most weight in the DOGE analysis. The response from the AI frequently cited these reasons as the justification for munchability. Nearly every justification included a form of the phrase “direct patient care,” and in a third of cases the model flagged contracts because it stated the services could be handled in-house.
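An audit like the one described, counting which canned phrases the model leaned on, can be done with a simple tally. The sample justifications below are paraphrases, not real model output:

```python
from collections import Counter

def tally_reasons(justifications: list[str], phrases: list[str]) -> Counter:
    """Count how many justifications contain each phrase, to see
    which prompt lines carried the most weight in the output."""
    counts = Counter()
    for j in justifications:
        lowered = j.lower()
        for p in phrases:
            if p in lowered:
                counts[p] += 1
    return counts

sample = [
    "Multiple layers removed from direct patient care; munchable.",
    "Could likely be performed in-house, making it munchable.",
    "Not direct patient care, so it meets the munchable criteria.",
]
print(tally_reasons(sample, ["direct patient care", "in-house"]))
```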

The poorly defined requirements led to several contracts for VA office internet services being flagged for cancellation. In one justification, the model had this to say:

The contract provides data services for internet connectivity, which is an IT infrastructure service that is multiple layers removed from direct clinical patient care and could likely be performed in-house, making it classified as munchable.

IMPORTANT EXCEPTIONS - These are NOT MUNCHABLE:
- Third-party financial audits and compliance reviews
- Medical equipment audits and certifications (e.g., MRI, CT scan, nuclear medicine equipment)
- Nuclear physics and radiation safety audits for medical equipment
- Medical device safety and compliance audits
- Healthcare facility accreditation reviews
- Clinical trial audits and monitoring
- Medical billing and coding compliance audits
- Healthcare fraud and abuse investigations
- Medical records privacy and security audits
- Healthcare quality assurance reviews
- Community Living Center (CLC) surveys and inspections
- State Veterans Home surveys and inspections
- Long-term care facility quality surveys
- Nursing home resident safety and care quality reviews
- Assisted living facility compliance surveys
- Veteran housing quality and safety inspections
- Residential care facility accreditation reviews

Despite these instructions, AI flagged many audit- and compliance-related contracts as “munchable,” labeling them as “soft services.”

In one case, the model even acknowledged the importance of compliance while flagging a contract for cancellation, stating: “Although essential to ensuring accurate medical records and billing, these services are an administrative support function (a ‘soft service’) rather than direct patient care.”

Key considerations:
- Direct patient care involves: physical examinations, medical procedures, medication administration
- Distinguish between medical/clinical and psychosocial support

Shobita Parthasarathy, professor of public policy and director of the Science, Technology, and Public Policy Program at University of Michigan, told ProPublica that this piece of the prompt was notable in that it instructs the model to “distinguish” between the two types of services without instructing the model what to save and what to kill.

The emphasis on “direct patient care” is reflected in how often the AI cited it in its recommendations, even when the model did not have any information about a contract. In one instance where it labeled every field “not found,” it still decided the contract was munchable. It gave this reason:

Without evidence that it involves essential medical procedures or direct clinical support, and assuming the contract is for administrative or related support services, it meets the criteria for being classified as munchable.

In reality, this contract was for the preventative maintenance of important safety devices known as ceiling lifts at VA medical centers, including three sites in Maryland. The contract itself stated:

Ceiling Lifts are used by employees to reposition patients during their care. They are critical safety devices for employees and patients, and must be maintained and inspected appropriately.

Specific services that should be classified as MUNCHABLE (these are "soft services" or consulting-type services):
- Healthcare technology management (HTM) services
- Data Commons Software as a Service (SaaS)
- Administrative management and consulting services
- Data management and analytics services
- Product catalog or listing management
- Planning and transition support services
- Portfolio management services
- Operational management review
- Technology guides and alerts services
- Case management administrative services
- Case abstracts, casefinding, follow-up services
- Enterprise-level portfolio management
- Support for specific initiatives (like PACT Act)
- Administrative updates to product information
- Research data management platforms or repositories
- Drug/pharmaceutical lifecycle management and pricing analysis
- Backup Contracting Officer's Representatives (CORs) or administrative oversight roles
- Modernization and renovation extensions not directly tied to patient care
- DEI (Diversity, Equity, Inclusion) initiatives
- Climate & Sustainability programs
- Consulting & Research Services
- Non-Performing/Non-Essential Contracts
- Recruitment Services

This portion of the prompt attempts to define “soft services.” It uses many highly specific examples but also throws in vague categories without definitions like “non-performing/non-essential contracts.”

Experts said that in order for a model to properly determine this, it would need to be given information about the essential activities and what’s required to support them.

Important clarifications based on past analysis errors:
2. Lifecycle management of drugs/pharmaceuticals IS MUNCHABLE (different from direct supply)
3. Backup administrative roles (like alternate CORs) ARE MUNCHABLE as they create duplicative work
4. Contract extensions for renovations/modernization ARE MUNCHABLE unless directly tied to patient care

This section of the prompt was the result of analysis by Lavingia and other DOGE staff, Lavingia explained. “This is probably from a session where I ran a prior version of the script that most likely a DOGE person was like, ‘It’s not being aggressive enough.’ I don’t know why it starts with a 2. I guess I disagreed with one of them, and so we only put 2, 3 and 4 here.”

Notably, our review found that the only clarifications related to past errors were related to scenarios where the model wasn’t flagging enough contracts for cancellation.

Direct patient care that is NOT MUNCHABLE includes:
- Conducting physical examinations
- Administering medications and treatments
- Performing medical procedures and interventions
- Monitoring and assessing patient responses
- Supply of actual medical products (pharmaceuticals, medical equipment)
- Maintenance of critical medical equipment
- Custom medical devices (wheelchairs, prosthetics)
- Essential therapeutic services with proven efficacy

For maintenance contracts, consider whether pricing appears reasonable. If maintenance costs seem excessive, flag them as potentially over-priced despite being necessary.

This section of the prompt provides the most detail about what constitutes “direct patient care.” While it does cover many aspects of care, it still leaves a lot of ambiguity and forces the model to make its own judgements about what constitutes “proven efficacy” and “critical” medical equipment.

In addition to the limited information given on what constitutes direct patient care, there is no information about how to determine if a price is “reasonable,” especially since the LLM only sees the first few pages of the document. The models lack knowledge about what’s normal for government contracts.

“I just do not understand how it would be possible. This is hard for a human to figure out,” Jaquith said about whether AI could accurately determine if a contract was reasonably priced. “I don’t see any way that an LLM could know this without a lot of really specialized training.”

Services that can be easily insourced (MUNCHABLE):
- Video production and multimedia services
- Customer support/call centers
- PowerPoint/presentation creation
- Recruiting and outreach services
- Public affairs and communications
- Administrative support
- Basic IT support (non-specialized)
- Content creation and writing
- Training services (non-specialized)
- Event planning and coordination

This section explicitly lists which tasks could be “easily insourced” by VA staff, and more than 500 different contracts were flagged as “munchable” for this reason.

“A larger issue with all of this is there seems to be an assumption here that contracts are almost inherently wasteful,” Coglianese said when shown this section of the prompt. “Other services, like the kinds that are here, are cheaper to contract for. In fact, these are exactly the sorts of things that we would not want to treat as ‘munchable.’” He went on to explain that insourcing some of these tasks could also “siphon human sources away from direct primary patient care.”

In an interview, Lavingia acknowledged some of these jobs might be better handled externally. “We don’t want to cut the ones that would make the VA less efficient or cause us to hire a bunch of people in-house,” Lavingia explained. “Which currently they can’t do because there’s a hiring freeze.”

The VA is standing behind its use of AI to examine contracts, calling it “a commonsense precedent.” And documents obtained by ProPublica suggest the VA is looking at additional ways AI can be deployed. A March email from a top VA official to DOGE stated:

Today, VA receives over 2 million disability claims per year, and the average time for a decision is 130 days. We believe that key technical improvements (including AI and other automation), combined with Veteran-first process/culture changes pushed from our Secretary’s office could dramatically improve this. A small existing pilot in this space has resulted in 3% of recent claims being processed in less than 30 days. Our mission is to figure out how to grow from 3% to 30% and then upwards such that only the most complex claims take more than a few days.

If you have any information about the misuse or abuse of AI within government agencies, reach out to us via our Signal or SecureDrop channels.

If you’d like to talk to someone specific, Brandon Roberts is an investigative journalist on the news applications team and has a wealth of experience using and dissecting artificial intelligence. He can be reached on Signal @brandonrobertz.01 or by email brandon.roberts@propublica.org.

Read the whole story
williampietri
140 days ago
reply
Share this story
Delete

isn’t it crazy that a woman being gender nonconforming literally just requires her to exist in her…


hot-on-my-watch:

tannisroute:

isn’t it crazy that a woman being gender nonconforming literally just requires her to exist in her own body without making any changes whatsoever. why does the fact that i don’t wear makeup and i don’t shave and i don’t wear a bra have to be some political act. why can’t i just fucking exist

it is Exactly this kind of thinking that inspired this post lol

Alright Judith Butler, bit early in the day to be proving so conclusively that gender is at least in part a social construct isn’t it? 😅😅😅

And the autistic in me wants to make sure you know what “conformity” is.

But yes, strong agree.

Recently I realised that while me and various other women I know would be quietly delighted to be lasered up and so never again grow hair on our armpits, genitals or legs, my husband, who has never had a beard or moustache and does not intend to, would not make the same choice for his face. He was actually very surprised at me. Then I saw a post on Reddit where some man called his girlfriend a whore for having had the same body hair lasered off. Different worlds!

And I say this as someone who has had all their natural body hair for years now- disability baby!

To clarify and apologise because I misspoke, @tannisroute makes an extremely good point about the nature of gender conformity for pubescent girls and women, certainly in the West. Men express physical gender conformity by leaving their body much as it is, whereas women can only do it by actively altering ours in never-ending processes that consume much more time, energy and expense. As OP said:


Man trims only hair on head: conformity

Woman trims only hair on head: non-conformity.


Man wears no makeup: conformity

Woman wears no makeup: non-conformity


For men, gender conformity is more often a lack of action, where for us it is action itself.


Chasing the Electric Pangolin Open Thread


A few months ago, I remember reading some press about a new economics preprint out of MIT. The Wall Street Journal covered the research a few days after it dropped online, with the favorable headline, “Will AI Help or Hurt Workers? One 26-Year-Old Found an Unexpected Answer.” The photo for the article shows the promising young author, Aidan Toner-Rodgers, standing next to two titans of economics research, Daron Acemoglu (2024 Nobel laureate in economics) and David Autor.

“It’s fantastic,” said Acemoglu.

“I was floored,” said Autor.

The Atlantic and Nature covered the research as well, with both publications seemingly stunned by the quality of the work. And indeed, the quality of the work was stunningly high! The article analyzes data from a randomized trial of over one thousand materials researchers at the R&D lab of a US-based firm who were given access to AI tools. Toner-Rodgers adeptly tracks the effect of access to these AI tools on:

  • The number of materials discovered by the researchers.

  • The number of patents filed on those new materials.

  • The number of new product prototypes developed based on those new materials.

  • The researchers’ allocation of time between experimentation, judgment, and ideation.

  • The sentiment towards AI of the researchers, before and after AI tool adoption.

Not only does each of these metrics show really clear effects, but Toner-Rodgers throws every tool in the book at exploring them, using a number of really sophisticated methodologies that must have taken tremendous effort and care:

  • He calculates the quality of the new materials through a really elaborate algorithm that measures the distance from the “target” properties for each material discovered.

  • He measures the structural similarity of the crystal structures of the new materials to current materials by calculating the difference in atomic positions. This is really hard to do, even for materials scientists, let alone for economists!

  • He determines the novelty of patents using bigram analysis.

  • He uses a large language model (Claude 3.5) for the automated classification of research tasks.
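
The preprint gives no implementation details for the bigram novelty measure, but patent-novelty metrics of this general kind typically amount to counting word pairs never seen in a prior corpus. A minimal sketch of what that might look like (the exact novelty definition, the function names, and the sample strings are all my own assumptions, not the paper’s):

```python
def bigrams(text):
    """Set of lowercase word bigrams in a document."""
    words = text.lower().split()
    return set(zip(words, words[1:]))

def novelty(patent_text, prior_corpus):
    """Share of a patent's bigrams that never appear in the prior corpus.

    1.0 means every bigram is new; 0.0 means all were seen before.
    (This definition is an assumption -- the preprint doesn't say.)
    """
    seen = set()
    for doc in prior_corpus:
        seen |= bigrams(doc)
    own = bigrams(patent_text)
    return len(own - seen) / len(own) if own else 0.0

prior = ["a coating for glass fibers", "a polymer coating for metal"]
print(novelty("a novel polymer coating for glass", prior))  # 0.4
```

Even in this toy form, the metric is sensitive to tokenization and corpus choice, which is part of why an undocumented version of it deserves scrutiny.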

At the time I saw the press coverage, I didn’t bother to click on the actual preprint and read the work. The results seemed unsurprising: when researchers were given access to AI tools, they became more productive. That sounds reasonable and expected.

Toner-Rodgers submitted his paper to The Quarterly Journal of Economics, the top econ journal in the world. His website said that he had received a “revise and resubmit” already, meaning that the article was probably well on its way to being published.

Unfortunately for everyone involved, the work is entirely fraudulent. MIT put out a press release this morning stating that it had conducted an internal, confidential review and has “no confidence in the veracity of the research contained in the paper.” The WSJ has covered this development as well. The econ department at MIT sent out an internal email so direly worded that, at first glance, students reading it assumed someone had died.

In retrospect, there had been omens and portents. I wish I had read the article at the time of publication, because I suspect my BS detector would have risen to an 11 out of 10 if I’d given it a close read. It really is the perfect subject for this blog: a fraudulent preprint called “Artificial Intelligence, Scientific Discovery, and Product Innovation,” with a focus on materials science research.

Hindsight is of course 20/20, but the first red flag that should have been raised is the source of the data itself. The article gives enough details to raise some intense curiosity. It’s a US-based firm that has (at least) 1,018 researchers devoted to materials discovery alone, an enormous amount. This narrows it down to a handful of firms. Initially the companies Apple, Intel, and 3M came to mind, but then I noticed this breakdown of the materials specialization of the researchers in the study:

This was bizarre to me: very few companies do massive amounts of materials research that is also split fairly evenly across the spectrum of materials, in domains as disparate as biomaterials and metal alloys. I did some “deep research” to confirm this hypothesis (thank you ChatGPT and Gemini), and I believe there are a few companies that could plausibly meet this description: 3M, Dupont, Dow, and Corning. None of these are perfect fits either, especially with the 32% share on metals and alloys.

I’ll really be embarrassing myself if it turns out that an actual R&D lab was supplying Toner-Rodgers with data and he was just fraudulently manipulating it, but I think this is quite unlikely, and it’s more plausible that the data was entirely fabricated to begin with. I have several reasons for believing this:

  • Why would a large company like this take such pains to run a randomized trial on its own employees, tracking a number of metrics of their performance, only to anonymously give this data to a single researcher from MIT—a first year PhD student, mind you—rather than publishing the findings themselves?

  • Even at those large R&D companies, only a small fraction of researchers are devoted to the task of “materials discovery,” and it seems implausible that a company would run an experiment on AI adoption on over a thousand employees in such a structured manner.

  • The description of the tasks these employees do, the divisions between fields, and all the other information provided seem almost too neat to be true. Real companies don’t have hundreds of R&D teams each working on similar tasks, all of a similar size, all tracking the same metrics. It reads like how an economics student at MIT imagines R&D labs are run if their only experience with such labs comes from reading the top 1% of economics papers on innovation in research.

The next red flag should have been how spotless the findings were. In every domain that was explored, there was a fairly unambiguous result. New materials? Up by 44% (p<0.000). New patents? Up by 39% (p<0.000). New prototypes? Up by 17% (p<0.001).

The quality of the new materials? Up, and statistically significant. The novelty of the new materials? Up, and statistically significant. Did researchers who were previously more talented improve more from AI tool use? Yes. Were these results reflected in researchers’ self-assessments of their time allocation? Unambiguously yes. The plot for that last bit is every economist’s dream, a perfect encapsulation of the principle of comparative advantage taking effect:

And look how contrived and neat this other plot looks, showing whether researchers’ self-assessment of their judgment ability correlates with their survey response on the role of different domains of knowledge in AI materials discovery. Three out of four categories show a neat increase and one out of four remains constant (which is the one that from first principles seems like it wouldn’t matter, experience using other AI-evaluation tools).

This plot also makes no sense, when you think about it. Why would researchers with better judgment be systematically more likely to give higher numbers on this survey question on average?

Q3: On a scale of 1–10, how useful are each of the following in evaluating AI-suggested candidate materials (scientific training, experience with similar materials, intuition or gut feeling, and experience with similar tools)?

And then, to cap it off, here’s how Toner-Rodgers describes a fortuitous round of layoffs at the firm that miraculously doesn’t interfere with the data collection for the primary analysis and yet contributes an insightful example supporting his findings:

“In the final month of my sample—excluded from the primary analysis—the firm restructured its research teams. The lab fired 3% of its researchers. At the same time, it more than offset these departures through increased hiring, expanding its workforce on net. While I do not observe the abilities of the new hires, those dismissed were significantly more likely to have weak judgment. Figure 13 shows the percent fired or reassigned by quartile of γ̂_j. Scientists in the top three quartiles faced less than a 2% chance of being let go, while those in the bottom quartile had nearly a 10% chance.”

I mean, come on, be for real…


Now, my background in materials science provides me a neat leg up, as I’d assume the vast majority of those reviewing/reading/following this paper are economists and people interested in the effects of AI use.

How do the parts of this paper that directly engage with materials science hold up? Well, they’re a little too clever. Take Toner-Rodgers’ analysis of “materials similarity,” where he claimed to have used crystal structure calculations to determine how similar the new materials were to previously discovered materials. The plot is stunningly unambiguous: the new materials discovered with AI are more novel.

However, it boggles the mind that a random economics student at MIT would be able to easily (and without providing any further details) perform the highly sophisticated technique from the paper he cites (De et al., 2016), especially in this elegantly formalized manner, without any domain expertise in computational materials research. This graph, and the data it represents, if true, would probably be worth a Nature paper on AI materials discovery on its own. In his paper, it’s relegated to the appendices.

This methodology also makes no sense for generalizing across different types of materials, so I have no clue how you could reduce the results from such broad classes of materials to a single figure of merit in this manner. The gaps between 0.0 and 0.2 and between 0.8 and 1.0 might seem reasonable to someone who read a few papers and noticed similar gaps in a couple of the graphs, but they would be bizarre when generalized across several classes of materials, and the data is likely completely fabricated for this reason. To simplify this critique: a novel metal alloy would have a very different level of similarity to its reference class of previously discovered alloys than a novel polymer would to its own reference class. It would take some really sophisticated methodology to normalize this single figure of merit across material types, and Toner-Rodgers does not mention any. All of this would also be insanely challenging to implement using data from the Materials Project, requiring some sophisticated “big data” workflows.

If you want a smoking gun, here’s a graph from a paper Toner-Rodgers cites, Krieger et al., “Missing Novelty in Drug Development,” which uses a similar methodology for drug discovery. It looks eerily similar to the distribution in this preprint. That distribution might make sense for drugs, but it makes very little intuitive sense for a broad range of materials with the figure of merit derived directly from the atomic positions in the crystal structure. This is the kind of mistake that someone with no domain expertise in materials science might make.
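
For intuition, even the simplest version of such a comparison, RMS displacement between matched atomic positions, already shows the normalization problem: the raw number is in angstroms and its typical magnitude depends on the material class, so squashing it into a universal 0-to-1 “similarity” requires a per-class calibration the paper never describes. A toy sketch (coordinates invented; the 1/(1+d) mapping is my own placeholder, not anything from the preprint):

```python
import math

def rms_displacement(pos_a, pos_b):
    """RMS distance in angstroms between two matched lists of atomic positions."""
    assert len(pos_a) == len(pos_b), "structures must have matched atoms"
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(pos_a, pos_b))
    return math.sqrt(sq / len(pos_a))

def naive_similarity(pos_a, pos_b):
    """Squash displacement into (0, 1]. NOT comparable across material classes."""
    return 1.0 / (1.0 + rms_displacement(pos_a, pos_b))

# Invented coordinates: a small distortion of a two-atom cell.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
new = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(rms_displacement(ref, new))  # ~0.707 angstroms
```

A dense intermetallic and a loose polymer network produce displacements on wildly different scales, so pooling scores like `naive_similarity` from both into one histogram, as the preprint’s figure implicitly does, is meaningless without the per-class normalization that is never mentioned.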

Toner-Rodgers’ treatment of “materials quality” would also probably drive a materials scientist insane if they were forced to think about it at length.

Here’s the equation he uses to calculate the “quality” of a new material:

This would likely be a case of extreme garbage in, garbage out. First of all, there are typically no “target features” that are easily reduced to single values. But even if there were, some of them would be distributed on a log scale, which would dramatically skew the values for certain classes of materials. And in general, the “quality” of a new material that an R&D lab develops is likely not related at all to improvements in top-line figures of merit like “band gap” or “refractive index,” the two examples Toner-Rodgers gives. Instead, quality would mean things like durability, affordability, and ease of manufacture: properties that are not easily reduced to a single value. And even if they were, good luck getting researchers to measure, systematize, and document those values for every new material!
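
To make the log-scale point concrete, here is a toy distance-to-target calculation over two properties, one of order 1 and one spanning orders of magnitude (all numbers and property choices are my own invented illustration, not values from the paper):

```python
import math

def distance(props, target):
    """Euclidean distance from target property values, in raw units."""
    return math.sqrt(sum((props[k] - target[k]) ** 2 for k in target))

# Hypothetical target and two candidate materials:
target = {"band_gap_eV": 2.0, "resistivity_ohm_m": 1e6}
a = {"band_gap_eV": 0.1, "resistivity_ohm_m": 1e6}   # badly misses the band gap
b = {"band_gap_eV": 2.0, "resistivity_ohm_m": 1e5}   # 10x off on resistivity

print(distance(a, target))  # ~1.9      -- scored as a near-perfect material
print(distance(b, target))  # 900000.0  -- scored as a terrible one

# Log-scaling the wide-ranging property first puts the misses on
# comparable footing:
def log_scaled(props):
    return {"band_gap_eV": props["band_gap_eV"],
            "log_resistivity": math.log10(props["resistivity_ohm_m"])}

print(distance(log_scaled(a), log_scaled(target)))  # ~1.9
print(distance(log_scaled(b), log_scaled(target)))  # 1.0
```

Without per-property scaling of this kind, a raw distance-to-target is dominated entirely by whichever property happens to have the largest units, which is exactly the skew described above.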

However, from this amalgam of gibberish, Toner-Rodgers manages to extract a significant finding anyway! All 1,018 scientists contribute to this endeavor, and statistically significant findings are reported in every single category:


Some people might look at this saga and think “ah, another bs preprint, thankfully we have peer review to deal with it.” However, I think that were it not for the fact that this preprint had gained so much attention, this article would have slipped through peer review, only to embarrass the editors of the top econ journal in the world after being published and reported on.

Moreover, these are the kinds of errors that the editorial process at an econ journal might not catch. I think the most clearly fraudulent components of the paper are those that dramatically simplify the complexity of the materials work that would have gone into it. Robert Palgrave, who has in the past been an outstanding skeptical critic of work on AI materials discovery, has a Twitter thread noting similar problems with the paper (I promise I read his thread after writing the bulk of this blog post). And when the piece originally came out, he had an orthogonal but also very valid set of reasons for being skeptical of the work (mostly the difficulty of defining the “novelty” of materials).

In general, the lesson I think we should learn is to be much more skeptical of these sorts of research findings. Learning new things about the world is hard, and generally randomized trials on such a complex topic should show much more ambiguous results. The fact that the data was so beautiful and fit such a perfect narrative should have raised alarm bells, rather than catapulting the results to international attention.

I also think that if comments were enabled on arXiv preprints, the fraud could have been exposed much more quickly. Probably some materials scientist who read the paper realized it was fraudulent but had no way to get that view quickly to the economists who were actually reading and discussing it. A well-written arXiv comment explaining why the data on materials similarity, for example, couldn’t be true would have gone a long way.

After writing a draft of this blog post, I saw this tweet which says that Corning, this January, filed an IP complaint with the WIPO against Toner-Rodgers for registering a domain name called “corningresearch.com”.

This validates my earlier guess as to which company’s data this might plausibly be. However, it looks like Toner-Rodgers may have been using this website to privately substantiate his fake data, without Corning’s knowledge. I’m not sure what this means, but it’s certainly interesting. It’s possible he was using the domain to send fake emails to himself, or to generate PDF files at plausible-sounding URLs, to show his advisor. Corning is a great company, and if it actually did collect this data and evaluate the materials properties in some coherent manner, that’s extremely impressive. But I still think it’s far more likely that the data was completely fabricated by Toner-Rodgers.


Lessons From the Newark Debacle


I did something either brave or foolish last week. I was booked on a flight from Amsterdam to Newark, and decided to ignore advice from friends urging me to rebook on a plane heading someplace else. And a strange thing happened: my flight arrived right on schedule.

Obviously thousands of flyers have been having very different experiences in recent weeks, and air traffic control at Newark remains a mess. So what can we learn from the debacle?

I’d like to blame Elon Musk and say that all those delayed travelers have been DOGEd. Sadly, the problems at Newark, and with air traffic control in general, have been building for many months. So you can’t blame this problem on the Muskenjugend — the tech bros barely old enough to shave that DOGE has parachuted into many government agencies — even though they are indeed wreaking havoc and will be responsible for many future debacles.

That said, the Newark mess is an object lesson in what’s wrong with DOGE and right-wing views of government in general.

The proximate causes of the current crisis, as I understand it, go like this: The Federal Aviation Administration as a whole is severely understaffed, with a dangerous shortage of air traffic controllers in particular, and relies on antiquated equipment — we’re talking Windows 95 and floppy disks. Recruiting controllers for the New York area has been especially hard because of the high cost of living (which is mainly about housing). In an effort to improve recruitment, the FAA moved traffic control to Philadelphia, where the cost of living is substantially lower.

But many controllers refused to make the move, and the technology side of the transition was botched — apparently the Philadelphia center’s jerry-rigged link to radar and communications keeps going down, and some of the controllers in Philadelphia have been so traumatized that they have exercised their right to take leaves of absence, worsening the staff crisis.

Ordinarily I’d say that we’ll eventually have the full story of what went wrong and find ways to fix it. But maybe not. Do you trust Trump administration officials to conduct a full and honest inquiry rather than look for ways to blame the Biden administration and/or the traffic controllers? Do you trust them to look for real solutions rather than justifications for privatization and sweetheart contracts for supporters?

I was struck by Sean Duffy, the transportation secretary, declaring that “patriotic controllers are going to stay on and continue to serve the country.” This from an administration that has taken self-dealing to levels unimagined in our nation’s history.

But back to DOGE and all that. The whole premise underlying Muskification is that much of the federal workforce is deadwood — legions of overpaid bureaucrats pushing paper around without doing anything useful. In reality, however, many federal workers are like air traffic controllers — doing jobs that are essential to keeping the economy and normal life in general proceeding smoothly. And while the air traffic controller shortage is probably (I hope!) exceptionally severe, the federal bureaucracy is in general stretched thin after decades of anti-government rhetoric that have left federal employment as a share of total employment far below historical levels:

And if you’re wondering why the government is having trouble recruiting enough traffic controllers, you should know that the Congressional Budget Office has found that highly educated federal workers are, on average, paid less than equivalent workers in the private sector. Workers with a doctorate or professional degree are paid 29 percent less than their private-sector counterparts:

Source: Congressional Budget Office

And this gap has widened in recent years, because Congress has capped federal salary increases.

This matches my personal observation. The federal workers I know tend to be in economics or finance-related jobs, and they earn less — sometimes far less — than they could make if they went to Wall Street.

Why, then, do highly educated Americans even take federal jobs? CBO stresses job security, which has indeed historically been higher for federal workers than their private-sector counterparts. I would also say, based on those I know, that meaning is a factor. At least some high-level federal workers accept lower pay than they could make elsewhere because they feel that they’re doing something that matters. No doubt that’s only a relatively small subset of the federal work force, but it’s surely an important subset, people who are doing especially crucial jobs.

But that was the way things used to be. How much job security can high-level federal workers feel when they never know when they’ll be DOGEd — abruptly fired without notice, locked out of their offices and even their email accounts? How much pride can they take in their work when their political masters never miss a chance to say that they’re worthless (unless there’s a crisis, in which case it becomes their patriotic duty to stay on the job)?

So my prediction is that the air traffic control crisis is the shape of things to come. In a matter of months Trump, Musk and company have severely degraded the morale and, eventually, the quality of the federal work force. And the result will be many more debacles.

MUSICAL CODA
