NLP

The two programs take the corpora as command-line input and return their output in the two result files.

python programma1.py clinton trump > risultato1
python programma2.py clinton trump > risultato2


Objective

Develop two Python programs that use the modules provided by the Natural Language Toolkit to read two English text files, annotate them linguistically, compare them on the basis of the required statistical indices, and extract the required information from them.

Implementation steps

Create two English corpora containing speeches by Hillary Clinton and by Donald Trump, of at least 5000 tokens each. The corpora must be built by selecting Clinton's speeches from this source and Trump's from this source, and saving them in two plain-text UTF-8 files. Develop two programs that take the two files as command-line input, analyse them linguistically up to Part-of-Speech tagging, and perform the required operations.
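
The pipeline described above could be sketched as follows. This is only an illustration: the function names (`corpus_paths`, `read_corpus`, `annotate`, `main`) are hypothetical and do not come from the original programs, and it assumes the NLTK tokenizer and tagger models are already installed.

```python
def corpus_paths(argv):
    """Return the two corpus file names given on the command line."""
    if len(argv) != 3:
        raise SystemExit("usage: python programma1.py <corpus1> <corpus2>")
    return argv[1], argv[2]


def read_corpus(path):
    """Read a plain-text UTF-8 corpus into a single string."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def annotate(text):
    """Split into sentences, tokenize and POS-tag with NLTK."""
    # Imported here so the helpers above work without NLTK installed.
    # Assumes the NLTK tokenizer and tagger models have been downloaded.
    import nltk

    sentences = nltk.sent_tokenize(text)
    tokenized = [nltk.word_tokenize(sentence) for sentence in sentences]
    tagged = [nltk.pos_tag(tokens) for tokens in tokenized]
    return tokenized, tagged


def main(argv):
    """Annotate both corpora and print a one-line summary for each."""
    for path in corpus_paths(argv):
        tokenized, tagged = annotate(read_corpus(path))
        print(path, "sentences:", len(tokenized))
```

A script built this way would end with `main(sys.argv)` (after `import sys`), so that the shell redirection `> risultato1` captures everything printed.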

Program 1

Compare the two texts on the basis of the following statistics:

  • the number of tokens;
  • the average sentence length in tokens;
  • the vocabulary size as the corpus grows, in incremental portions of 1000 tokens (1000 tokens, 2000 tokens, 3000 tokens, etc.);
  • the lexical richness, computed as the Type Token Ratio (TTR) as the corpus grows, in incremental portions of 1000 tokens (1000 tokens, 2000 tokens, 3000 tokens, etc.);
  • the ratio of nouns to verbs (an index that characterises variations in linguistic register);
  • the lexical density, computed as the ratio between the total number of occurrences in the text of Nouns, Verbs, Adverbs and Adjectives and the total number of words in the text (excluding punctuation marks tagged with POS "," and "."): (|Nouns|+|Verbs|+|Adverbs|+|Adjectives|)/(TOT-(|.|+|,|))
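
A minimal sketch of these statistics, working on already tokenized and tagged input. It assumes Penn Treebank tags (the tagset returned by `nltk.pos_tag` for English) and groups them by prefix: `NN*` for nouns, `VB*` for verbs, `RB*` for adverbs, `JJ*` for adjectives. Whether modals (`MD`) count as verbs is a choice the assignment does not settle; here they do not. All function names are hypothetical.

```python
from collections import Counter


def average_sentence_length(sentences):
    """Mean number of tokens per sentence (sentences are token lists)."""
    return sum(len(sentence) for sentence in sentences) / len(sentences)


def vocabulary_growth(tokens, step=1000):
    """Return (corpus size, vocabulary size, TTR) for each incremental slice."""
    rows = []
    for size in range(step, len(tokens) + 1, step):
        vocabulary = set(tokens[:size])
        rows.append((size, len(vocabulary), len(vocabulary) / size))
    return rows


def category_counts(tagged):
    """Count tokens per coarse category, using Penn Treebank tag prefixes."""
    counts = Counter()
    for _, tag in tagged:
        for prefix, name in (("NN", "noun"), ("VB", "verb"),
                             ("RB", "adverb"), ("JJ", "adjective")):
            if tag.startswith(prefix):
                counts[name] += 1
    return counts


def noun_verb_ratio(tagged):
    """Number of nouns divided by number of verbs."""
    counts = category_counts(tagged)
    return counts["noun"] / counts["verb"]


def lexical_density(tagged):
    """(nouns + verbs + adverbs + adjectives) / (TOT - (|.| + |,|))."""
    counts = category_counts(tagged)
    content_words = sum(counts.values())
    punctuation = sum(1 for _, tag in tagged if tag in (".", ","))
    return content_words / (len(tagged) - punctuation)
```

For example, for the tagged sentence `The/DT cat/NN sleeps/VBZ quietly/RB ./.` the lexical density is 3 / (5 - 1) = 0.75. `noun_verb_ratio` raises `ZeroDivisionError` on a text with no verbs, which should not happen on corpora of this size.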
Program 2

For each of the two corpora, extract the following information:

1. extract and sort by decreasing frequency, also reporting the corresponding frequency:

  • the 10 most frequent PoS (Parts of Speech);
  • the 20 most frequent tokens, excluding punctuation;
  • the 20 most frequent token bigrams that contain no punctuation, articles or conjunctions;
  • the 20 most frequent token trigrams that contain no punctuation, articles or conjunctions;
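
The frequency lists above could be computed along these lines. The sketch rests on two assumptions that are not stated in the assignment: a token counts as punctuation when it contains no letter or digit, and articles and conjunctions are identified by the Penn Treebank tags `DT` and `CC` (`DT` also covers other determiners, and subordinating conjunctions tagged `IN` are not filtered). Function names are hypothetical.

```python
from collections import Counter

# Tags treated as "articles and conjunctions" (an assumption, see above).
EXCLUDED_TAGS = {"DT", "CC"}


def is_punctuation(token):
    """Treat a token as punctuation when it has no letter or digit."""
    return not any(character.isalnum() for character in token)


def top_pos(tagged, n=10):
    """The n most frequent PoS tags with their frequencies."""
    return Counter(tag for _, tag in tagged).most_common(n)


def top_tokens(tagged, n=20):
    """The n most frequent tokens, punctuation excluded."""
    counts = Counter(token for token, _ in tagged if not is_punctuation(token))
    return counts.most_common(n)


def top_ngrams(tagged, size, n=20):
    """The n most frequent n-grams with no punctuation, DT or CC inside."""
    counts = Counter()
    for start in range(len(tagged) - size + 1):
        window = tagged[start:start + size]
        if any(is_punctuation(token) or tag in EXCLUDED_TAGS
               for token, tag in window):
            continue
        counts[tuple(token for token, _ in window)] += 1
    return counts.most_common(n)
```

`top_ngrams(tagged, 2)` gives the bigram list and `top_ngrams(tagged, 3)` the trigram list; `tagged` is the flat list of `(token, tag)` pairs for the whole corpus.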
2. extract and sort the 20 bigrams composed of an Adjective and a Noun (where each token must have a frequency greater than 2):

  • with the highest joint probability, also reporting the corresponding probability;
  • with the highest conditional probability, also reporting the corresponding probability;
  • with the highest association strength (computed as Local Mutual Information), also reporting the corresponding association strength;
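
One possible way to score the Adjective-Noun bigrams is sketched below. The assignment does not fix the exact estimators, so the following are assumptions: probabilities are relative frequencies over the corpus length N in tokens, the conditional probability is P(noun | adjective) = f(adj, noun) / f(adj), and LMI = f(adj, noun) · log2( f(adj, noun) · N / (f(adj) · f(noun)) ). Tokens are compared case-sensitively, and the function names are hypothetical.

```python
import math
from collections import Counter


def adjective_noun_scores(tagged, freq_threshold=2):
    """Score JJ*+NN* bigrams whose tokens both occur more than freq_threshold times."""
    tokens = [token for token, _ in tagged]
    n_tokens = len(tokens)
    token_freq = Counter(tokens)

    bigram_freq = Counter()
    for (first, first_tag), (second, second_tag) in zip(tagged, tagged[1:]):
        if not (first_tag.startswith("JJ") and second_tag.startswith("NN")):
            continue
        if token_freq[first] > freq_threshold and token_freq[second] > freq_threshold:
            bigram_freq[(first, second)] += 1

    scores = {}
    for (adjective, noun), freq in bigram_freq.items():
        # Pointwise mutual information, estimated from raw frequencies.
        mutual_information = math.log2(
            freq * n_tokens / (token_freq[adjective] * token_freq[noun])
        )
        scores[(adjective, noun)] = {
            "joint": freq / n_tokens,
            "conditional": freq / token_freq[adjective],
            "lmi": freq * mutual_information,
        }
    return scores


def top_bigrams(scores, measure, n=20):
    """The n bigrams with the highest value of the chosen measure."""
    ranked = sorted(scores.items(), key=lambda item: item[1][measure], reverse=True)
    return [(bigram, values[measure]) for bigram, values in ranked[:n]]
```

Calling `top_bigrams(scores, "joint")`, `top_bigrams(scores, "conditional")` and `top_bigrams(scores, "lmi")` yields the three requested rankings from a single pass over the corpus.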
3. the two sentences with the highest probability. The probability of the first sentence must be computed with a Markov model of order 0 and that of the second with a Markov model of order 1; both models must use the statistics extracted from the corpus that contains the sentences. The sentences must be at least 10 tokens long and every token must have a frequency greater than 2;
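
The two Markov models could be sketched as below, under stated assumptions: the order-0 model multiplies unigram relative frequencies, the order-1 model starts from the unigram probability of the first token and multiplies the conditional probabilities f(prev, cur) / f(prev), no smoothing is applied (every sentence comes from the corpus itself), and bigrams are counted over the flat token stream, so they may cross sentence boundaries. Function names are hypothetical.

```python
from collections import Counter


def markov0(sentence, unigram, n_tokens):
    """Order-0 probability: product of unigram relative frequencies."""
    probability = 1.0
    for token in sentence:
        probability *= unigram[token] / n_tokens
    return probability


def markov1(sentence, unigram, bigram, n_tokens):
    """Order-1 probability: P(w1) times the product of P(wi | wi-1)."""
    probability = unigram[sentence[0]] / n_tokens
    for previous, current in zip(sentence, sentence[1:]):
        probability *= bigram[(previous, current)] / unigram[previous]
    return probability


def best_sentences(sentences, min_length=10, freq_threshold=2):
    """Most probable eligible sentence under each model, or None if none is eligible."""
    tokens = [token for sentence in sentences for token in sentence]
    n_tokens = len(tokens)
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))

    candidates = [
        sentence for sentence in sentences
        if len(sentence) >= min_length
        and all(unigram[token] > freq_threshold for token in sentence)
    ]
    if not candidates:
        return None

    best0 = max(candidates, key=lambda s: markov0(s, unigram, n_tokens))
    best1 = max(candidates, key=lambda s: markov1(s, unigram, bigram, n_tokens))
    return {
        "order0": (best0, markov0(best0, unigram, n_tokens)),
        "order1": (best1, markov1(best1, unigram, bigram, n_tokens)),
    }
```

Products of many small probabilities shrink quickly; summing log-probabilities instead would give the same ranking and avoid underflow on long sentences.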

4. after identifying and classifying the Named Entities (NE) in the text, extract:

  • the 20 most frequent proper names of persons (types), sorted by frequency;
  • the 20 most frequent proper names of places (types), sorted by frequency.
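
A sketch of the counting step, assuming each sentence has already been chunked with `nltk.ne_chunk`, which returns a tree whose named-entity subtrees expose `label()` and `leaves()`. Mapping "persons" to the `PERSON` label and "places" to `GPE` and `LOCATION` is an assumption, and the function names are hypothetical.

```python
from collections import Counter

PERSON_LABELS = {"PERSON"}
PLACE_LABELS = {"GPE", "LOCATION"}  # assumed mapping for "places"


def count_entities(chunked_sentences, wanted_labels):
    """Count entity types (joined token strings) carrying one of the labels."""
    counts = Counter()
    for sentence in chunked_sentences:
        for node in sentence:
            # Plain (token, tag) tuples have no label(); entity subtrees do.
            if hasattr(node, "label") and node.label() in wanted_labels:
                name = " ".join(token for token, _ in node.leaves())
                counts[name] += 1
    return counts


def top_entities(chunked_sentences, wanted_labels, n=20):
    """The n most frequent entity types for the given labels."""
    return count_entities(chunked_sentences, wanted_labels).most_common(n)
```

With `chunked = [nltk.ne_chunk(sentence) for sentence in tagged_sentences]`, the two lists would be `top_entities(chunked, PERSON_LABELS)` and `top_entities(chunked, PLACE_LABELS)`.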

    Thank you. Thank you very much. Thank you. I want to thank Dr. Frank Mora, director of the Kimberly Latin American and Caribbean Center and a professor here at FIU, and before that served with distinction at the Department of Defense. I want to recognize former Congressman Joe Garcia. Thank you Joe for being here-a long time friend and an exemplary educator. The President of Miami-Dade College, Eduardo Padron and the President of FIU, Mark Rosenberg-I thank you all for being here. And for me it's a delight to be here at Florida International University. You can feel the energy here. It's a place where people of all backgrounds and walks of life work hard, do their part, and get ahead. That's the promise of America that has drawn generations of immigrants to our shores, and it's a reality right here at FIU. Today, as Frank said, I want to talk with you about a subject that has stirred passionate debate in this city and beyond for decades, but is now entering a crucial new phase. America's approach to Cuba is at a crossroads, and the upcoming presidential election will determine whether we chart a new path forward or turn back to the old ways of the past. We must decide between engagement and embargo, between embracing fresh thinking and returning to Cold War deadlock. And the choices we make will have lasting consequences not just for more than 11 million Cubans, but also for American leadership across our hemisphere and around the world. I know that for many in this room and throughout the Cuban American community, this debate is not an intellectual exercise-it is deeply personal. I teared up as Frank was talking about his mother-not able to mourn with her family, say goodbye to her brother. I'm so privileged to have a sister-in-law who is Cuban American, who came to this country, like so many others as a child and has charted her way with a spirit of determination and success. 
I think about all those who were sent as children to live with strangers during the Peter Pan airlift, for families who arrived here during the Mariel boatlift with only the clothes on their backs, for sons and daughters who could not bury their parents back home, for all who have suffered and waited and longed for change to come to the land, "where palm trees grow." And, yes, for a rising generation eager to build a new and better future. Many of you have your own stories and memories that shape your feelings about the way forward. Like Miriam Leiva, one of the founders of the Ladies in White, who is with us today-brave Cuban women who have defied the Castro regime and demanded dignity and reform. We are honored to have her here today and I'd like to ask her, please raise your hand. Thank you. I wish every Cuban back in Cuba could spend a day walking around Miami and see what you have built here, how you have turned this city into a dynamic global city. How you have succeeded as entrepreneurs and civic leaders. It would not take them long to start demanding similar opportunities and achieving similar success back in Cuba. I understand the skepticism in this community about any policy of engagement toward Cuba. As many of you know, I've been skeptical too. But you've been promised progress for fifty years. And we can't wait any longer for a failed policy to bear fruit. We have to seize this moment. We have to now support change on an island where it is desperately needed. I did not come to this position lightly. I well remember what happened to previous attempts at engagement. In the 1990s, Castro responded to quiet diplomacy by shooting down the unarmed Brothers to the Rescue plane out of the sky. And with their deaths in mind, I supported the Helms-Burton Act to tighten the embargo. 
Twenty years later, the regime's human rights abuses continue: imprisoning dissidents, cracking down on free expression and the Internet, beating and harassing the courageous Ladies in White, refusing a credible investigation into the death of Oswaldo Paya. Anyone who thinks we can trust this regime hasn't learned the lessons of history. But as secretary of state, it became clear to me that our policy of isolating Cuba was strengthening the Castros' grip on power rather than weakening it-and harming our broader efforts to restore American leadership across the hemisphere. The Castros were able to blame all of the island's woes on the U.S. embargo, distracting from the regime's failures and delaying their day of reckoning with the Cuban people. We were unintentionally helping the regime keep Cuba a closed and controlled society rather than working to open it up to positive outside influences the way we did so effectively with the old Soviet bloc and elsewhere. So in 2009, we tried something new. The Obama administration made it easier for Cuban Americans to visit and send money to family on the island. No one expected miracles, but it was a first step toward exposing the Cuban people to new ideas, values, and perspectives. I remember seeing a CNN report that summer about a Cuban father living and working in the United States who hadn't seen his baby boy back home for a year-and-a-half because of travel restrictions. Our reforms made it possible for that father and son finally to reunite. It was just one story, just one family, but it felt like the start of something important. In 2011, we further loosened restrictions on cash remittances sent back to Cuba and we opened the way for more Americans-clergy, students and teachers, community leaders-to visit and engage directly with the Cuban people. They brought with them new hope and support for struggling families, aspiring entrepreneurs, and brave civil society activists. Small businesses started opening. 
Cell phones proliferated. Slowly, Cubans were getting a taste of a different future. I then became convinced that building stronger ties between Cubans and Americans could be the best way to promote political and economic change on the island. So by the end of my term as Secretary, I recommended to the President that we end the failed embargo and double down on a strategy of engagement that would strip the Castro regime of its excuses and force it to grapple with the demands and aspirations of the Cuban people. Instead of keeping change out, as it has for decades, the regime would have to figure out how to adapt to a rapidly transforming society. What's more, it would open exciting new business opportunities for American companies, farmers, and entrepreneurs-especially for the Cuban-American community. That's my definition of a win-win. Now I know some critics of this approach point to other countries that remain authoritarian despite decades of diplomatic and economic engagement. And yes it's true that political change will not come quickly or easily to Cuba. But look around the world at many of the countries that have made the transition from autocracy to democracy - from Eastern Europe to East Asia to Latin America. Engagement is not a silver bullet, but again and again we see that it is more likely to hasten change, not hold it back. The future for Cuba is not foreordained. But there is good reason to believe that once it gets going, this dynamic will be especially powerful on an island just 90 miles from the largest economy in the world. Just 90 miles away from one and a half million Cuban Americans whose success provides a compelling advertisement for the benefits of democracy and an open society. So I have supported President Obama and Secretary Kerry as they've advanced this strategy. 
They've taken historic steps forward-re-establishing diplomatic relations, reopening our embassy in Havana, expanding opportunities further for travel and commerce, calling on Congress to finally drop the embargo. That last step about the embargo is crucial, because without dropping it, this progress could falter. We have arrived at a decisive moment. The Cuban people have waited long enough for progress to come. Even many Republicans on Capitol Hill are starting to recognize the urgency of moving forward. It's time for their leaders to either get on board or get out of the way. The Cuba embargo needs to go, once and for all. We should replace it with a smarter approach that empowers Cuban businesses, Cuban civil society, and the Cuban American community to spur progress and keep pressure on the regime. Today I am calling on Speaker Boehner and Senator McConnell to step up and answer the pleas of the Cuban people. By large majorities, they want a closer relationship with America. They want to buy our goods, read our books, surf our web, and learn from our people. They want to bring their country into the 21st century. That is the road toward democracy and dignity and we should walk it together. We can't go back to a failed policy that limits Cuban Americans' ability to travel and support family and friends. We can't block American businesses that could help free enterprise take root in Cuban soil-or stop American religious groups and academics and activists from establishing contacts and partnerships on the ground. If we go backward, no one will benefit more than the hardliners in Havana. In fact, there may be no stronger argument for engagement than the fact that Cuba's hardliners are so opposed to it. They don't want strong connections with the United States. They don't want Cuban Americans traveling to the island. They don't want American students and clergy and NGO activists interacting with the Cuban people. That is the last thing they want. 
So that's precisely why we need to do it. Unfortunately, most of the Republican candidates for President would play right into the hard-liners' hands. They would reverse the progress we have made and cut the Cuban people off from direct contact with the Cuban American community and the free-market capitalism and democracy that you embody. That would be a strategic error for the United States and a tragedy for the millions of Cubans who yearn for closer ties. They have it backwards: Engagement is not a gift to the Castros-it's a threat to the Castros. An American embassy in Havana isn't a concession-it's a beacon. Lifting the embargo doesn't set back the advance of freedom-it advances freedom where it is most desperately needed. Fundamentally, most Republican candidates still view Cuba-and Latin America more broadly-through an outdated Cold War lens. Instead of opportunities to be seized, they see only threats to be feared. They refuse to learn the lessons of the past or pay attention to what's worked and what hasn't. For them, ideology trumps evidence. And so they remain incapable of moving us forward. As President, I would increase American influence in Cuba, rather than reduce it. I would work with Congress to lift the embargo and I would also pursue additional steps. First, we should help more Americans go to Cuba. If Congress won't act to do this, I would use executive authority to make it easier for more Americans to visit the island to support private business and engage with the Cuban people. Second, I would use our new presence and connections to more effectively support human rights and civil society in Cuba. I believe that as our influence expands among the Cuban people, our diplomacy can help carve out political space on the island in a way we never could before. We will follow the lead of Pope Francis, who will carry a powerful message of empowerment when he visits Cuba in September. I would direct U.S. 
diplomats to make it a priority to build relationships with more Cubans, especially those starting businesses and pushing boundaries. Advocates for women's rights and workers' rights. Environmental activists. Artists. Bloggers. The more relationships we build, the better. We should be under no illusions that the regime will end its repressive ways any time soon, as its continued use of short-term detentions demonstrates. So we have to redouble our efforts to stand up for the rights of reformers and political prisoners, including maintaining sanctions on specific human-rights violators. We should maintain restrictions on the flow of arms to the regime-and work to restrict access to the tools of repression while expanding access to tools of dissent and free expression. We should make it clear, as I did as secretary of state, that the "freedom to connect" is a basic human right, and therefore do more to extend that freedom to more and more Cubans-particularly young people. Third, and this is directly related, we should focus on expanding communications and commercial links to and among the Cuban people. Just five percent of Cubans have access to the open Internet today. We want more American companies pursuing joint ventures to build networks that will open the free flow of information-and empower everyday Cubans to make their voices heard. We want Cubans to have access to more phones, more computers, more satellite televisions. We want more American airplanes and ferries and cargo ships arriving every day. I'm told that Airbnb is already getting started. Companies like Google and Twitter are exploring opportunities as well. It will be essential that American and international companies entering the Cuban market act responsibly, hold themselves to high standards, use their influence to push for reforms. I would convene and connect U.S. 
business leaders from many fields to advance this strategy, and I will look to the Cuban American community to continue leading the way. No one is better positioned to bring expertise, resources, and vision to this effort-and no one understands better how transformative this can be. We will also keep pressing for a just settlement on expropriated property. And we will let Raul explain to his people why he wants to prevent American investment in bicycle repair shops, in restaurants, in barbershops, and Internet cafes. Let him try to put up barriers to American technology and innovation that his people crave. Finally, we need to use our leadership across the Americas to mobilize more support for Cubans and their aspirations. Just as the United States needed a new approach to Cuba, the region does as well. Latin American countries and leaders have run out of excuses for not standing up for the fundamental freedoms of the Cuban people. No more brushing things under the rug. No more apologizing. It is time for them to step up. Not insignificantly, new regional cooperation on Cuba will also open other opportunities for the United States across Latin America. For years, our unpopular policy towards Cuba held back our influence and leadership. Frankly, it was an albatross around our necks. We were isolated in our opposition to opening up the island. Summit meetings were consumed by the same old debates. Regional spoilers like Venezuela took advantage of the disagreements to advance their own agendas and undermine the United States. Now we have the chance for a fresh start in the Americas. Strategically, this is a big deal. Too often, we look east, we look west, but we don't look south. And no region in the world is more important to our long-term prosperity and security than Latin America. And no region in the world is better positioned to emerge as a new force for global peace and progress. 
Many Republicans seem to think of Latin America still as a land of crime and coups rather than a place where free markets and free people are thriving. They've got it wrong. Latin America is now home to vibrant democracies, expanding middle classes, abundant energy supplies, and a combined GDP of more than $4 trillion. Our economies, communities, and even our families are deeply entwined. And I see our increasing interdependence as a comparative advantage to be embraced. The United States needs to build on what I call the "power of proximity." It's not just geography-it's common values, common culture, common heritage. It's shared interests that could power a new era of partnership and prosperity. Closer ties across Latin America will help our economy at home and strengthen our hand around the world, especially in the Asia-Pacific. There is enormous potential for cooperation on clean energy and combating climate change. And much work to be done together to take on the persistent challenges in our hemisphere, from crime to drugs to poverty, and to stand in defense of our shared values against regimes like that in Venezuela. So the United States needs to lead in the Latin America. And if we don't, make no mistake, others will. China is eager to extend its influence. Strong, principled American leadership is the only answer. That was my approach as Secretary of State and will be my priority as President. Now it is often said that every election is about the future. But this time, I feel it even more powerfully. Americans have worked so hard to climb out of the hole we found ourselves in with the worst financial crisis since the Great Depression in 2008. Families took second jobs and second shifts. They found a way to make it work. And now, thankfully, our economy is growing again. Slowly but surely we also repaired America's tarnished reputation. We strengthened old alliances and started new partnerships. 
We got back to the time-tested values that made our country a beacon of hope and opportunity and freedom for the entire world. We learned to lead in new ways for a complex and changing age. And America is safer and stronger as a result. We cannot afford to let out-of-touch, out-of-date partisan ideas and candidates rip away all the progress we've made. We can't go back to cowboy diplomacy and reckless war-mongering. We can't go back to a go-it-alone foreign policy that views American boots on the ground as a first choice rather than as a last resort. We have paid too high a price in lives, power, and prestige to make those same mistakes again. Instead we need a foreign policy for the future with creative, confident leadership that harnesses all of America's strength, smarts, and values. I believe the future holds far more opportunities than threats if we shape global events rather than reacting to them and being shaped by them. That is what I will do as President, starting right here in our own hemisphere. I'm running to build an America for tomorrow, not yesterday. For the struggling, the striving, and the successful. For the young entrepreneur in Little Havana who dreams of expanding to Old Havana. For the grandmother who never lost hope of seeing freedom come to the homeland she left so long ago. For the families who are separated. For all those who have built new lives in a new land. I'm running for everyone who's ever been knocked down, but refused to be knocked out. I am running for you and I want to work with you to be your partner to build the kind of future that will once again not only make Cuban Americans successful here in our country, but give Cubans in Cuba the same chance to live up to their own potential. Thank you all very, very much. "Well, let me thank you, Strobe, it's great to be back at Brookings, and there are a lot of long time friends and colleagues who perch here at Brookings. 
Obviously including Strobe and Martin who I'll speak to in a minute. Also Bob Einhorn and Tammy Wittes. This institution has hosted many important conversations over the years, and I appreciate Strobe's reference to the event last night and the continuing dialogue about urgent issues facing our nation and world. That's what brings me here today-back to Brookings-to talk about a question we are all grappling with right now: how to prevent Iran from acquiring a nuclear weapon-and more broadly, how to protect ourselves and our allies from the full range of threats that Iran poses. The stakes are high, and there are no simple or perfectly satisfying solutions. So these questions-and in particular, the merits of the nuclear deal recently reached with Iran-have divided people of good will and raised hard issues on both sides. Here's how I see it. Either we move forward on the path of diplomacy and seize this chance to block Iran's path to a nuclear weapon-or, we turn down a more dangerous path leading to a far less certain and riskier future. That's why I support this deal. I support it as part of a larger strategy toward Iran. By now, the outcome in Congress is no longer in much doubt. So we've got to start looking ahead to what comes next: enforcing the deal, deterring Iran and its proxies, and strengthening our allies. These will be my goals as president. And today, I want to talk about how I would achieve them. Let me start by saying I understand the skepticism so many feel about Iran. I too am deeply concerned about Iranian aggression and the need to confront it. It's a ruthless, brutal regime that has the blood of Americans, many others, including its own people, on its hands. Its political rallies resound with cries of "Death to America." Its leaders talk about wiping Israel off the face of the map, most recently just yesterday, and foment terror against it. There is absolutely no reason to trust Iran. 
Now, Vice President Cheney may hope that the American people will simply forget, but the truth is, by the time President Obama took office and I became secretary of state, Iran was racing toward a nuclear capability. They had mastered the nuclear fuel cycle-meaning that they had the material, scientists, and technical know-how to create material for nuclear weapons. They had produced and installed thousands of centrifuges, expanded their secret facilities, established a robust uranium enrichment program, and defied their international obligations under the Nuclear Non-Proliferation Treaty. And they hadn't suffered many consequences. I voted for sanctions again and again as a Senator from New York, but they weren't having much effect. Most of the world still did business with Iran. We needed to step up our game. So President Obama and I pursued a two-pronged strategy: pressure and engagement. We made it clear that the door to diplomacy was open-if Iran answered the concerns of the international community in a serious and credible way. We simultaneously launched a comprehensive campaign to significantly raise the cost of Iranian defiance. We systematically increased our military capabilities in the region, deepening our cooperation with partners and sending more firepower-an additional aircraft carrier, battleship, strike aircraft, and the most advanced radar and missile defense systems available. Meanwhile, I traveled the world-capital by capital, leader by leader-twisting arms to help build the global coalition that produced some of the most effective sanctions in history. With President Obama's leadership, we worked with Congress and the European Union to cut Iran off from the world's economic and financial system. And one by one, we persuaded energy-hungry consumers of Iranian oil like India and South Korea to cut back. Soon, Iran's tankers sat rusting in port. Its economy was collapsing. These new measures were effective because we made them global. 
American sanctions provided the foundation - but Iran didn't really feel the heat until we turned this into an international campaign so biting that Iran had no choice but to negotiate. They could no longer play off one country against another. They had no place to hide. So, they started looking for a way out. I first visited Oman to speak with the sultan of Oman in January of 2011. Went back later that year. The sultan helped set up a secret backchannel. I sent one of my closest aides as part of a small team to begin talks with the Iranians in secret. Negotiations began in earnest after the Iranian election in 2013-first the bilateral talks led by Deputy Secretary Bill Burns and Jake Sullivan that led to the interim agreement; then the multilateral talks led by Secretary John Kerry, Secretary Ernie Moniz, and Under Secretary Wendy Sherman. Now there's a comprehensive agreement on Iran's nuclear program. Is it perfect? Well, of course not. No agreement like this ever is. But is it a strong agreement? Yes it is. And we absolutely should not turn it down. The merits of the deal have been well argued, so I won't go through them in great detail here. The bottom line is that it accomplishes the major goals we set out to achieve. It blocks every pathway for Iran to get a bomb. And it gives us better tools for verification and inspection, and to compel rigorous compliance. Without a deal, Iran's breakout time-how long they need to produce enough material for a nuclear weapon-would shrink to a couple of months. With a deal, that breakout time stretches to a year, which means that if Iran cheats, we'll know it and we'll have time to respond decisively. Without a deal, we would have no credible inspections of Iran's nuclear facilities. With a deal, we'll have unprecedented access. We'll be able to monitor every aspect of their nuclear program. 
Now, some have expressed concern that certain nuclear restrictions expire after 15 years, and we need to be vigilant about that, which I'll talk more about in a moment. But other parts are permanent, including Iran's obligations under the Non-Proliferation Treaty and their commitment to enhanced inspections under the additional protocol. Others have expressed concern that it could take up to 24 days to gain access to some of Iran's facilities when we suspect cheating. I'd be the first to say that this part of the deal is not perfect-although the deal does allow for daily access to enrichment facilities and monitoring of the entire nuclear fuel cycle. It's important to focus on that because being able to monitor the supply chain is critical to what we will find out and how we will be able to respond. But our experts tell us that that even with delayed access to some places, this deal does the job. Microscopic nuclear particles remain for years and years. They are impossible to hide. That's why Secretary Moniz, a nuclear physicist, has confidence in this plan. And some have suggested that we just go back to the negotiating table and get a better, unspecified deal. I can certainly understand why that may sound appealing. But as someone who started these talks in the first place and built our global coalition piece by piece, I can assure you, it is not realistic. Plus, if we walk away now, our capacity to sustain and enforce sanctions will be severely diminished. We will be blamed, not the Iranians. So if we were to reject this agreement, Iran would be poised to get nearly everything it wants without giving up a thing. No restrictions on their nuclear program. No real warning if Tehran suddenly rushes toward a bomb. And the international sanctions regime would fall apart-so no more economic consequences for Iran, either. Those of us who have been out there on the diplomatic front lines know that diplomacy is not the pursuit of perfection-it's the balancing of risk. 
And on balance, the far riskier course right now would be to walk away. Great powers can't just junk agreements and expect the rest of the world to go along with us. We need to be reasonable and consistent, and we need to keep our word-especially when we're trying to lead a coalition. That's how we'll make this-and future-deals work. But it's not enough just to say yes to this deal. Of course it isn't. We have to say, "Yes-and." Yes, and we will enforce it with vigor and vigilance. Yes, and we will embed it in a broader strategy to confront Iran's bad behavior in the region. Yes, and we will begin from day one to set the conditions so Iran knows it will never be able to get a nuclear weapon-not during the term of the agreement, not after, not ever. We need to be clear and I think we have to make that very clear to Iran about what we expect from them. This is not the start of some larger diplomatic opening. And we shouldn't expect that this deal will lead to broader changes in their behavior. That shouldn't be a promise for proceeding. Instead, we need to be prepared for three scenarios. First, Iran tries to cheat-something it's been quite willing to do in the past. Second, Iran tries to wait us out. Perhaps it waits to move for 15 years, when some but not all restrictions expire. And third, Iran ramps up its dangerous behavior in the region, including its support for terrorist groups like Hamas and Hezbollah. I believe that the success of this deal has a lot to do with how the next president grapples with these challenges. So let me tell you what I would do. My starting point will be one of distrust. You remember President Reagan's line about the Soviets-trust, but verify? My approach will be distrust, and verify. We should anticipate that Iran will test the next president. They'll want to see how far they can bend the rules. That won't work if I'm in the White House. I'll hold the line against Iranian non-compliance. That means penalties even for small violations. 
Keeping our allies on board-but being willing to snap back sanctions into place unilaterally if we have to. Working with Congress to close any gaps in the sanctions. Right now, members of Congress are offering proposals to that effect, and I think the current administration should work with them to see whether there are additional steps that could be taken. Finally, it means ensuring that the IAEA has the resources it needs-from finances to personnel to equipment-to hold Iran's feet to the fire. But the most important thing we can do to keep Iran from cheating or trying to wait us out is to shape Iranian expectations right from the start. The Iranians and the world need to understand that we will act decisively if we need to. So here's my message to Iran's leaders: The United States will never allow you to acquire a nuclear weapon. As president, I will take whatever actions are necessary to protect the United States and our allies. I will not hesitate to take military action if Iran attempts to obtain a nuclear weapon. And I will set up my successor to be able to credibly make the same pledge. We will make clear to Iran that our national commitment to prevention will not waver depending on who's in office. It's permanent. And should it become necessary in the future, having exhausted peaceful alternatives, to turn to military force, we will have preserved-and in some cases enhanced-our capacity to act. And because we've proven our commitment to diplomacy first, the world will more likely join us. Then there's the broader issue of countering Iran's bad behavior across the region. Taking nuclear weapons out of the equation is crucial, because an Iran with nuclear weapons is so much more dangerous than an Iran without them. But even without nuclear weapons, we still see Iran's fingerprints on nearly every conflict across the Middle East. They support bad actors from Syria to Lebanon to Yemen. They vow to destroy Israel. 
And that's worth saying again - they vow to destroy Israel. We cannot ever take that lightly, particularly when Iran ships advanced missiles to Hezbollah and the Ayatollah outlines an actual strategy for eliminating Israel-or talks about how Israel won't exist in 25 years, just like he did today. And in addition to all the malicious activity they already underwrite, we've got to anticipate that Iran could use some of the economic relief they get from this deal to pay for even more. So as president, I will raise the costs for their actions, and confront them across the board. My strategy will be based on five strong pillars. First, I will deepen America's unshakeable commitment to Israel's security, including our long-standing tradition of guaranteeing Israel's Qualitative Military Edge. I'll increase support for Israeli rocket and missile defenses, and for intelligence sharing. I'll sell Israel the most sophisticated fighter aircraft ever developed, the F-35. We'll work together to develop and implement better tunnel detection technology to prevent arms smuggling and kidnapping, as well as the strongest possible missile defense system for northern Israel, which has been subjected to Hezbollah's attacks for years. Second, I will reaffirm that the Persian Gulf is a region of vital interest to the United States. We don't want any of Iran's neighbors to develop or acquire a nuclear weapons program either-so we want them to feel and be secure. I will sustain a robust military presence in the region, especially our air and naval forces. We'll keep the Strait of Hormuz open. We'll increase security cooperation with our Gulf allies-including intelligence sharing, military support and missile defense-to ensure they can defend against Iranian aggression, even if that takes the form of cyber-attacks or other non-traditional threats. Iran should understand that the United States-and I as president-will not stand by as our Gulf allies and partners are threatened. We will act. 
Third, I will build a coalition to counter Iran's proxies, particularly Hezbollah. That means enforcing and strengthening the rules prohibiting the transfer of weapons to Hezbollah, looking at new ways to choke off their funding, and pressing our partners to treat Hezbollah as the terrorist organization it is. It's time to eliminate the false distinction that some still make between the supposed political and military wings. If you're part of Hezbollah, you're part of a terrorist organization, plain and simple. Beyond Hezbollah, I'll crack down on the shipment of weapons to Hamas, and push Turkey and Qatar to end their financial support. I'll press our partners in the region to prevent aircraft and ships owned by companies linked to Iran's Revolutionary Guard from entering their territories, and urge our partners to block Iranian planes from entering their airspace on their way to Yemen and Syria. Across the board, I will vigorously enforce-and strengthen if necessary-the American sanctions on Iran and its Revolutionary Guard for its sponsorship of terrorism, its ballistic missile program, and other destabilizing activities. I'll enforce-and strengthen if necessary-our restrictions on sending arms to Iran, and from Iran to bad actors like Syria. And I'll impose these sanctions on everyone involved in these activities-whether they're in Iran or overseas. This will be a special imperative as some of the UN sanctions lapse. So the U.S. and our partners have to step up. Fourth, I'll stand-as I always have-against Iran's abuses at home, from its detention of political prisoners to its crackdown on freedom of expression, including online. Its inhumane policies hold back talented and spirited people. Our quarrel is not and never has been with the Iranian people-they'd have a bright future, a hopeful future, if they weren't held back by their leaders. 
As I've said before, I think we were too restrained in our support of the protests in June 2009 and in our condemnation of the government crackdown that followed-that won't happen again. We will enforce and-if need be-broaden our human rights sanctions. And I will not rest until every single American detained or missing in Iran is home. Fifth, just as the nuclear agreement needs to be embedded in a broader Iran policy, our broader Iran policy needs to be embedded in a comprehensive regional strategy that promotes stability and counters extremism. Iran, like ISIS, benefits from chaos and strife. It exploits other countries' weaknesses. And the best defense against Iran are countries and governments that are strong-that can provide security and economic opportunity to their people and they must have the tools to push back on radicalization and extremism. Helping countries get there will take time and strategic discipline. But it's crucial that the United States leads this effort. I will push for renewed diplomacy to solve the destructive regional conflicts that Iran fuels. We have to bring sufficient pressure on Assad to force a political solution in Syria, including a meaningful increase in our efforts to train and equip the moderate Syrian opposition-something I called for early in the conflict. And the United States must lead in assisting those who have been uprooted by conflict-especially the millions of Syrian refugees now beseeching the world to help them. As Pope Francis has reminded us, this is an international problem that demands an international response-and the United States must help lead that response. That's who we are, and that's what we do. So our strategy needs to cover all these bases. Iran's nuclear ambitions and its support of terrorism. Its hatred of Israel and its cruelty toward its citizens. Its military resources and its economic strengths and weaknesses. We need to be creative, committed, and vigilant. 
And on every front, we need to keep working closely with our friends and partners. On that note, let me just spend a moment speaking about the serious concerns that Israel's leaders have about this deal. Israel has every reason to be alarmed by a regime that both denies its existence and seeks its destruction. I would not support this agreement for one second if I thought it put Israel in greater danger. I believe in my core that Israel and America must stand side by side. And I will always stand by Israel's right to defend itself as I always have. I believe this deal, and a joint strategy for enforcing it, makes Israel safer. I say that with humility. I'm not Israeli. I don't know what it's like to live under constant threat from your neighbors, in a country where the margin for error is so thin. I know that my saying, "This deal makes you safer," won't alleviate the very real fears of the Israeli people. But I have stood for Israel's security for a very long time. It was one of my bedrock principles as secretary of state. It's why I supported stronger defense systems, like the Iron Dome anti-rocket defense system, which proved so effective in protecting Israeli lives during the conflicts of 2012 and last summer. It's why I've worked closely with Israel to advance the two-state vision of a Jewish and democratic Israel with secure and recognized borders. And it's why I believe we should expedite negotiations of a long-term military assistance agreement with Israel. Let's not wait until 2017, when the current deal expires-let's get it done this year. I would invite the Israeli prime minister to the White House during my first month in office, to talk about all of these issues-and to set us on a course of close, frequent consultation, right from the start. Because we both rely on each other for support, as partners, allies and friends. This isn't just about policy for me. It is personal. 
As president, I'm committed to shoring up and strengthening the relationship between our countries. We have had honest disagreements about this deal. Now is the time to come together. Now is the time to remember what unites us, and build upon it. And so, I know well that the same forces that threaten Israel, threaten the United States. And to the people of Israel, let me say-you'll never have to question whether we're with you. The United States will always be with you. There have also been honest disagreements about the nuclear deal here at home. Smart, serious people can see issues like these differently. Like my friend, Chuck Schumer, who's going to be an excellent leader in the Senate-I respect the skepticism that he and others feel. And I respect differences of opinion, and people who advocate vigorously for their beliefs. But I have a harder time respecting those who approach an issue as serious as this with unserious talk-especially anyone running to be president of the United States. Several Republican candidates boast they'll tear up this agreement in 2017, more than a year after it's been implemented. That's not leadership-that's recklessness. It would set us right down the very dangerous path we've worked so hard to avoid. I'm looking forward to a robust debate about foreign policy in this campaign. Where we have disagreements, we should lay them out-like if American ground forces in Iraq should engage in direct combat, as Scott Walker wants; or if we should keep Cuba closed, as Marco Rubio and Jeb Bush want. Let's debate these issues. But let's debate them on the basis of facts, not fear. Let's resist denigrating the patriotism or loyalty of those who disagree with us. And let's avoid at all costs undermining America's credibility abroad. That only makes us weaker, and I'm going to call it out whenever I see it. I spent four years representing America abroad as America's secretary of state. It was one of the greatest privileges of my life. 
And knowing that my fellow Americans were counting on me, and rooting for me - not as democrats, not as republicans, but as Americans - meant a great deal. We are all one team-the American team. And that doesn't change, no matter how much we might disagree. And I can tell you from personal experience-we are stronger overseas when we are united at home. So we simply have to find a way to work together better than we have been doing. There's a lot that Democrats and Republicans can and should agree on. The United States should lead in the Middle East-we can agree on that. We should stand by our friends against Iranian aggression-we can agree on that, too. I believe that the plan I've laid out today is one that all Americans could endorse, and I hope they will. The next president will face threats from many quarters-from those we see today, like terrorism from ISIS, aggressiveness from Putin, pandemics like Ebola, to all those we can't predict yet. We need a leader who has a strong vision for the future, and the skill and determination to get us there. We can't stop the world from changing. But we can help to shape those changes. And we do that by leading-with strength, smarts, and an unyielding commitment to our values. You know I saw that when I was first lady, senator and secretary of state, that when America leads with principle and purpose, other people and governments are eager to join us. No country comes close to matching our advantages-the strength of our economy, the skill of our workforce, our tradition of innovation, our unmatched network of alliances and partnerships. So we are poised to remain the world's most admired and powerful nation for a long time-if we make the smart choices, and practice smart leadership. That's what I will try to do as your president. And I believe as strongly as ever that our best days are ahead of us-and that America's greatest contributions to the world are yet to come. Thank you."
    Donald Trump spoke in Pennsylvania about what a Trump Administration would mean for America and for Pennsylvania. Mr. Trump will renegotiate our trade deals, end illegal immigration, stop the massive inflow of refugees, reduce surging crime, cut taxes and regulations, end Common Core, and repeal and replace disastrous Obamacare. He will bring jobs back, incomes will go up, taxes will go down, and companies will stay on American soil. Mr. Trump will rebuild our depleted military and build the 350 ship Navy that we need. We will invest in American craftsmen and American steel to rebuild this fleet. A Trump Administration will fix our severely depleted infrastructure. Obama-Clinton doubled the national debt in 8 years, spending trillions in the Middle East while they left our own country to crumble. Mr. Trump is proposing substantial new investment at home to fix America's transportation, drinking water, and other vital infrastructure. Mr. Trump will renegotiate NAFTA, stop the TPP, and stand up to foreign product dumping and currency manipulation. He will ensure that we start making things in America again. Our products will be sent around the world, not our jobs. Hillary Clinton is corrupt and unfit to be President of the United States, a fact that has been substantiated by her email scandal and continuing WikiLeaks. Mr. Trump is proposing a package of ethics reforms to make our government honest once more. America loses $2 trillion dollars in economic activity a year to regulations. He is going to put the corrupt regulation industry out of business; for every 1 new regulation, 2 old regulations must be eliminated. Mr. Trump will revitalize our economy, lowering our business tax from 35 to 15 percent. He will rebuild our inner cities and bring back safety and prosperity. He will stop the illegal immigration crisis, unlike Hillary Clinton who wants open borders. 
We will defend our borders, build a wall, and establish mandatory minimum federal prison sentences for anyone who illegally re-enters the country after having been deported. Mr. Trump will introduce the biggest tax cut since Ronald Reagan, eliminate job killing regulations, defend religious liberty, provide school choice to every low-income child, end common core, support the men and women of law enforcement, save the 2nd Amendment, and appoint Supreme Court Justices who uphold the Constitution. Mr. Trump is going to fight to bring us all together as Americans under one flag and work to make our nation wealthy, great, strong, and safe. Donald Trump outlined his plan to bring manufacturing back to North Carolina. Mr. Trump will renegotiate our trade deals, end illegal immigration, stop the massive inflow of refugees, reduce surging crime, cut taxes and regulations, unleash American energy, rebuild our military, take care of our Vets, and repeal and replace disastrous Obamacare. He will bring back jobs; incomes will go up and taxes will go down. He will get us to 4% economic growth and create 25 million jobs over 10 years. He will keep jobs from leaving our country by charging companies a 35% tax when they want to ship their products back to the United States. Hillary Clinton is corrupt and unfit to be President of the United States, a fact that has been substantiated by her email scandal and continuing WikiLeaks. Mr. Trump is making it his mission to make our government honest once more so he has proposed a package of ethics reforms: First, he will institute a five year ban on all executive branch officials lobbying the government for five years after they leave a government service. Congress will pass this ban so that it cannot be lifted by executive order. Second, Congress will institute its own five year ban on lobbying by former members of Congress and their staffs. 
Third, the definition of lobbyist will be expanded to encompass former government officials labeling themselves otherwise in order to close loopholes. Fourth, there will be a lifetime ban against senior executive branch officials lobbying on behalf of foreign governments. Fifth, Congress will pass a campaign finance reform that prevents registered foreign lobbyists from raising money in American elections. Mr. Trump will end our economic stagnation, renegotiate NAFTA, and bring jobs back to American soil. The American Desk will be introduced. Its mission will be to protect the economic interests of the American worker and the national interests of the United States. Mr. Trump will lower the business tax from 35% to 15%. He will rebuild our inner cities. He will stop illegal immigration and cut off funding to Sanctuary Cities that refuse to cooperate with federal immigration authorities. Mr. Trump will introduce the biggest tax cut since Ronald Reagan, eliminate job killing regulations, defend religious liberty, provide school choice to every low-income child, end common core, support the men and women of law enforcement, save the 2nd Amendment, and appoint Supreme Court Justices who uphold the Constitution. Mr. Trump is going to fight for every person in this country and unite us all under a government whose concern is serving the people, not special interests. "Yesterday, I outlined my Contract with the American Voter - a detailed list of solutions to bring prosperity to our economy, safety to our communities, and honesty to our government. One of the issues I addressed at length is the issue of government corruption. I put forward a plan to stop it; my opponent has no plan to end government corruption because she is the embodiment of government corruption. ... My goal is to keep foreign money out of American politics. Hillary Clinton's goal is to put the Oval Office up for sale to whatever country offers the highest price. ... 
If we win on November 8th, we are going to fix our rigged system and we are going to Drain The Swamp Of Corruption In Washington, D.C. That change includes a new foreign policy that puts America First. ... To cover-up her crimes as Secretary of State, Hillary Clinton bleached and deleted 33,000 emails, lied to Congress under oath, made 13 phones disappear - some with a hammer - and then told the FBI she couldn't remember 39 times. The best evidence that the system is rigged is that Hillary Clinton is even allowed to run for President in the first place. The Clintons grew their gross income by $60 million while Hillary was Secretary of State. She's not a diplomat - she's a grifter, always looking for an easy way to sell government favors and access in exchange for cash. The Clintons end up with the money, and America ends up with the humiliation. When I am President, America will be respected in the world once again. ... If you elect me, along with a Republican House and Senate, we will also immediately repeal the Obama-Clinton defense sequester and rebuild our badly depleted military. It's part of my 100-day plan. ... My proposal is based on three crucial words: Peace Through Strength. This defense build-up will be supported by ships in Mayport, and by engineers and advanced manufacturing on the Space Coast. New aircraft will fly from MacDill Air Force Base and Naval Air Station Pensacola. New Navy and Coast Guard ships will patrol the Florida coast to prevent drugs and terrorists from entering our shores. With a Republican House and Senate, we will also immediately repeal and replace the disaster known as Obamacare. ... A Republican House and Senate can swiftly enact the other items in my contract immediately, including massive tax reduction and tax simplification for the middle class. My plan to transform our tax, regulatory, energy and trade policies is the most pro-growth plan in American history. It includes lowering our business tax from 35 to 15 percent. 
My reform plan will lift millions out of poverty, raise wages dramatically, and create at least 25 million new jobs in 10 years - and we can enact the whole plan in our first 100 days. ... If companies want to fire their workers and leave for other countries, then we will charge them a 35% tax when they want to ship their products back into the United States. ... If I'm elected President I am going to keep Radical Islamic Terrorists out of our country. We will also stop the crisis of illegal immigration. A Trump Administration will secure and defend our borders. ... If I'm elected President I am going to keep Radical Islamic Terrorists out of our country. We will also stop the crisis of illegal immigration. A Trump Administration will secure and defend our borders. ... We will make America wealthy again. We will make America strong again. We will make America safe again. And we will make America great again." "Together, we are going to deliver real change that puts America First. That begins with immediately repealing and replacing Obamacare. My first day in office, I'm going to ask Congress to put a bill on my desk getting rid of this disastrous law and replacing it with reforms that expand choice, freedom and affordability. ... Insurers are leaving. Companies are fleeing. Doctors are quitting. It's an absolute disaster. Hillary Clinton wants to double-down on Obamacare and make it even worse. She wants to put the government totally in charge of your healthcare. Repealing Obamacare, and stopping Hillary's healthcare takeover, is one of the single most important reasons we must win on November 8th. ... Hillary bleached and deleted 33,000 emails, lied to Congress under oath, made 13 phones disappear - some with a hammer - and then told the FBI she couldn't remember 39 times. 
We have also just learned that one of the closest people to Hillary Clinton, with longstanding ties to her and her husband, gave more than $675,000 to the campaign of the spouse of a top FBI official who helped oversee the investigation into Mrs. Clinton's illegal email server. ... I've proposed a Contract With the American Voter that will give the government back to the people. My contract begins with a plan to end the rampant government corruption - and to put the special interests out of business. I want everyone in Washington to hear and to heed the words I am about to say. If we win on November 8th, We Are Going To Washington, D.C. And We Are Going To DRAIN THE SWAMP. Under my contract with the American Voter, we are proposing a series of ethics reforms on Day 1 to end government corruption. They include: --A constitutional amendment to impose term limits on all members of Congress --A Lifetime ban on government officials lobbying for a foreign government --A total ban on foreign lobbyists raising money for American elections Under my contract, I am also going to take a series of actions on Day One to protect American workers. We are living through the greatest jobs theft in the history of the world. Our nation has lost one-third of its manufacturing since NAFTA - a deal signed by Bill Clinton and supported strongly by Hillary Clinton. We've lost 70,000 factories since China entered the World Trade Organization - another Bill and Hillary Clinton-backed deal. My Contract includes the following: --We will renegotiate NAFTA, or withdraw from the deal to get a much better one for our workers. --We will withdraw from the Trans-Pacific Partnership, the deal Hillary Clinton called the "Gold Standard." Hillary's Wall Street donors want it, and she'd approve it if she ever got the chance - Tim Kaine has even left the door open to passing TPP by another name. 
--We will lift the restrictions on the production of American energy, including shale, oil, natural gas and clean coal. Hillary wants to shut down American energy, and put the miners out of work. We are going to put the miners and the steel workers back to work. Additionally, on the first day, I will take the following actions to restore the rule of law. These include: --I will cancel every illegal Obama executive order --Cancel all federal funding to Sanctuary Cities --Suspend immigration from regions compromised by Radical Islamic terrorism, including the suspension of the Syrian Refugee Program Next, I will work with Congress to introduce a series of legislative reforms and will fight for their passage in the first 100 days. This legislation includes: Middle Class Tax Relief And Simplification Act. A middle-class family with 2 children will get a 35% tax cut. End The Offshoring Act. Establishes tariffs to stop companies from laying off their workers and relocating to other countries. American Infrastructure Act. $1 trillion in public-private infrastructure investment, this includes help for projects like expanding the Orlando-Sanford International Airport. ... Restoring National Security Act. Eliminates the Obama-Clinton defense sequester, rebuilds our military, and gives Veterans the right to seek private medical care. Under my plan, not only will we modernize our Navy's cruisers, but we will invest in the technologies of the future being developed in Central Florida. My plan also includes major investments in Florida for space exploration. ... To maximize the amount of investment and funding that is available. This means launching and operating major space assets that employ thousands, spur innovation, and fuel economic growth. I will free NASA from the restriction of serving primarily as a logistics agency for low-earth orbit activities. Instead, I will refocus its mission on space exploration. 
Under a Trump administration, Florida and America, will lead the way into the stars. ... We are going to have the biggest tax cut since Ronald Reagan; eliminate every unnecessary job-killing regulation; provide school choice and put an end to Common Core; support the men and women of law enforcement; save the 2nd amendment; and appoint Justices to the Supreme Court who will uphold and defend the Constitution of the United States. Republicans need to come together; this is our last chance. This is bigger than me, or any of us: it's about our country. This is about ending Obamacare. This is about the Supreme Court. This is about rebuilding our military, taking care of our vets, strengthening our borders, and keeping our companies and jobs from leaving. This is about restoring the rule of law, saving our Constitution, and keeping Radical Islamic Terrorists out of our country." "I want to talk about how to grow the African-American middle class, and to provide a new deal for Black America. That deal is grounded in three promises: safe communities, great education, and high-paying jobs. My vision rests on a principle that has defined this campaign: America First. Every African-American citizen in this country is entitled to a government that puts their jobs, wages and security first. ... Our opponent represents the rigged system and failed thinking of yesterday. ... Hillary has been there for 30 years and hasn't fixed anything - she's just made it worse. American politics is caught in a time loop - we keep electing the same people, who keep making the same mistakes, and who keep offering the same excuses. ... African-American citizens have sacrificed so much for this nation. They have fought and died in every war since the Revolution, and from the pews and the picket lines they have lifted up the conscience of our country in the long march for Civil Rights. Yet, too many African-Americans have been left behind. ... The conditions in our inner cities today are unacceptable. 
The Democrats have run our inner cities for fifty, sixty, seventy years or more. They've run the school boards, the city councils, the mayor's offices, and the congressional seats. Their policies have failed, and they've failed miserably. They've trapped children in failing government schools, and opposed school choice at every turn. The Clintons gave us NAFTA and China's entry into the World Trade Organization, two deals that de-industrialized America, uprooted our industry, and stripped bare towns like Detroit and Baltimore and the inner cities of North Carolina. ... Democratic policies have also given rise to crippling crime and violence. Then there is the issue of taxation and regulation. Massive taxes, massive regulation of small business, and radical restrictions on American energy, have driven jobs and opportunities out of our inner cities. Hillary wants to raise taxes on successful small businesses as high as 45 percent - which will only drive more jobs out of your community, and into other countries. ... No group has been more economically-harmed by decades of illegal immigration than low-income African-American workers. Hillary's pledge to enact "open borders," - made in secret to a foreign bank - would destroy the African-American middle class. At the center of my revitalization plan is the issue of trade. ... We won't let your jobs be stolen from you anymore. When we stop the offshoring to low-wage countries, we raise wages at home - meaning rent and bills become instantly more affordable. At the same time, my plan to lower the business tax from 35 percent to 15 percent will bring thousands of new companies onto our shores. It also includes a massive middle class tax cut, tax-free childcare savings accounts, and childcare tax deductions and credits. I will also propose tax holidays for inner-city investment, and new tax incentives to get foreign companies to relocate in blighted American neighborhoods. ... 
We will also encourage small-business creation by allowing social welfare workers to convert poverty assistance into repayable but forgive-able micro-loans. ... I will invest in training and funding both local and federal law enforcement operations to remove the gang members, drug dealers, and criminal cartels from our neighborhoods. The reduction of crime is not merely a goal - but a necessity. We will get it done. The war on police urged on by my rival is reckless, and dangerous, and puts African-American lives at risk. We must work with our police, not against them. On immigration, my policy is simple. I will restore the civil rights of African-Americans, Hispanic-Americans, and all Americans, by ending illegal immigration. I will reform visa rules to give American workers preference for jobs, and I will suspend reckless refugee admissions from terror-prone regions that cost taxpayers hundreds of billions of dollars. ... School choice is at the center of my plan. My proposal redirects education spending to allow every disadvantaged child in America to attend the public, private, charter, magnet, religious or home school of their choice. ... The cycle of poverty can be broken, and great new things can happen for our people. But to achieve this future, we must reject the failed elites in Washington who've been wrong about virtually everything for decades. ... Now is the time to embrace a New Direction. "Real change begins with immediately repealing and replacing Obamacare. It has just been announced that Americans are going to experience another massive double-digit hike in Obamacare premiums, including a 116% premium-hike in the great state of Arizona. ... Obamacare is a catastrophe for Ohio workers, and is making it impossible for many parents to pay their bills, support their families, or get quality medical care for their children. ... Real change also means getting rid of the corruption in Washington. 
Hillary bleached and deleted 33,000 emails, lied to Congress under oath, made 13 phones disappear - some with a hammer - and then told the FBI she couldn't remember 39 times. The Clinton crew gave more than $675,000 to the wife of the Deputy FBI Director overseeing the investigation into Hillary's illegal server - a server we now know Obama knew about. ... I've proposed a Contract With the American Voter that will end the corruption and give the government back the people. ... That includes a constitutional amendment to impose term limits on all members of Congress. At the core of my contract is my economic plan. That plan can be summarized in three very beautiful words: jobs, jobs, jobs. ... Under my contract, if a company wants to fire their workers, move to Mexico or other countries, and ship their products back into the U.S., we will put a 35% tariff on those products. We will also immediately begin renegotiating NAFTA - and if we don't get the deal we want, we will terminate NAFTA and get a much better deal for our workers and our companies. As part of our plan to bring back American jobs, we will lower taxes on our businesses from 35 percent down to 15 percent. We will also cut taxes for middle class families by 35 percent, and massively simplify taxes for all Americans. ... If I am elected President, I am going to keep Radical Islamic Terrorists out of our country. Hillary also said she wants totally "open borders." No one who supports open borders can ever serve as President of the United States. A Trump Administration will also secure and defend the borders of the United States. And yes, we will build a wall. ... I have a message for the cartels, the drug dealers and the gang members: when I win, your long reign of crime and terror will come crashing to an end. We will also repeal the Obama-Clinton defense cuts and rebuild our badly depleted military. Our Air Force is the smallest and oldest it has ever been. 
We will build brand new, modern, state-of-the-art planes to fly out of Wright-Patterson Air Force Base. We have a lot of Veterans here today -- I want to thank them for their service, and let them know as President, I will have their back. ... We are going to have the biggest tax cut since Ronald Reagan; eliminate every unnecessary job-killing regulation; provide school choice and put an end to Common Core; rebuild our military and take care of our Vets; support the men and women of law enforcement; save the 2nd amendment; and appoint Justices to the Supreme Court who will uphold and defend the Constitution of the United States. ... Just imagine what our country could accomplish if we started working together as One People, under One God, saluting One American Flag. Once again, we will have a government of, by and for the people." "My Contract With the American Voter outlines a plan to repeal and replace Obamacare, and I'm asking for your vote so we can save healthcare for every family in Arizona. Real change also means getting rid of the corruption in Washington. As you have heard, it was just announced yesterday that the FBI is reopening their investigation into the criminal and illegal conduct of Hillary Clinton. ... To cover-up her crimes, she bleached and deleted 33,000 emails after receiving a Congressional Subpoena, made 13 phones disappear - some with a hammer, lied to Congress under oath, lied to the FBI many times, and then two boxes of email evidence went mysteriously missing. The WikiLeaks revelations have exposed criminal corruption at the highest levels of our government. Hillary put the office of Secretary of State up for sale and, if she ever got the chance, she'd put the Oval Office up for sale too ... A vote for Hillary is a vote to surrender our government to public corruption, graft and cronyism that threatens the survival of our Constitutional system itself. 
What makes us exceptional is that we are a nation of laws, and that we are all equal under those laws - Hillary's corruption shreds the principle on which our nation was founded. ... Restoring honesty to our government, and the rule of law to our society, will be my highest priority as President. ... My Contract With The American Voter begins with a plan to end government corruption. ... At the core of my contract is my plan to bring back our jobs. .... A Trump Administration will stop TPP, renegotiate NAFTA, and we are going to stand up to China on Currency Manipulation. We are going to lower taxes on American business from 35 percent to 15 percent. We are going to cut taxes for Middle Class families by hundreds of billions of dollars. My infrastructure plan will provide help for projects like the proposed Interstate-11, which would connect Phoenix with Las Vegas and other areas. We will also unleash the full power of American energy including shale, oil, natural gas and clean coal. ... When I become President, this crime wave ends. We are going to cancel all federal funding for Sanctuary Cities. We are going to impose tough prison sentences for illegal immigrants who return after a previous deportation. We will end illegal immigration, deport every last criminal alien, and save American lives. We will also repeal the Obama-Clinton defense sequester and rebuild our badly depleted military. ... We are going to have the biggest tax cut since Ronald Reagan; eliminate every unnecessary job-killing regulation; cancel every illegal Obama executive order; stop the massive inflow of refugees and keep Radical Islamic Terrorist out of our country; rebuild our military and take care of our Vets; reduce surging crime and support the men and women of law enforcement; provide school choice and put an end to Common Core; save the 2nd amendment; and appoint Justices to the Supreme Court who will uphold and defend the Constitution of the United States." 
"Real change also means getting rid of the corruption in Washington. As you have heard, it was just announced yesterday that the FBI is reopening their investigation into the criminal and illegal conduct of Hillary Clinton. ... Hillary set-up an illegal server for the obvious purpose of shielding her criminal conduct from public disclosure. She set-up this illegal server knowing full well that her actions put our national security at risk, and put the safety and security of your children at risk. To further cover-up her crimes, she bleached and deleted 33,000 emails after a Congressional Subpoena, made 13 phones disappear - some with a hammer, lied to Congress under oath, lied to the FBI, and then two boxes of email evidence went mysteriously missing. She even pretended not to know that the letter "C" meant confidential information that was classified. ... The WikiLeaks revelations have revealed a degree of corruption at the highest levels of our government like nothing we've ever seen before ... Hillary put the office of Secretary of State up for sale and, if she ever got the chance, she'd put the Oval Office up for sale too. ... And now it's reported that the Department of Justice is fighting the FBI - that's because the Department of Justice is trying to protect her. 97% of Department of Justice employees' presidential contributions went to Hillary Clinton. There are those, and I happen to be one of them, who think Hillary offered Lorretta Lynch a reappointment as Attorney General. ... A vote for Hillary is vote a to surrender our government to public corruption, graft and cronyism that threatens the very foundations of our Constitutional system. What makes us exceptional is that we are a nation of laws, and that we are all equal under those laws - Hillary's corruption shreds that foundational principle. Public Corruption is a grave and profound threat to a Democracy. ... Hillary believes money and power - not truth and justice - should rule the day. ... 
Restoring honesty to our government, and the rule of law to our society, will be my highest priority as President. We must save the America. That is why My Contract With The American Voter begins with a plan to end government corruption and take our country back from the special interests." "My Contract With the American Voter outlines a plan to repeal and replace Obamacare, and I'm asking for your vote so we can save healthcare for every family in Nevada. Real change also means getting rid of the corruption in Washington. As you have heard, it was just announced on Friday that the FBI is reopening their investigation into the criminal and illegal conduct of Hillary Clinton. ... Hillary set-up an illegal server for the obvious purpose of shielding her criminal conduct from public disclosure - and exposure. She set-up this illegal server knowing full well that her actions put our national security at risk, and put the safety and security of your children at risk. To cover-up her crimes, she bleached and deleted 33,000 emails after receiving a Congressional Subpoena, made 13 phones disappear - some with a hammer, lied to Congress under oath, lied to the FBI many times, and then two boxes of email evidence went mysteriously missing. The WikiLeaks revelations have exposed criminal corruption at the highest levels of our government. Hillary put the office of Secretary of State up for sale and, if she ever got the chance, she'd put the Oval Office up for sale too. ... My Contract With The American Voter begins with a plan to end government corruption. ... At the core of my contract is my plan to bring back our jobs. ... A Trump Administration will stop the Trans-Pacific Partnership, renegotiate NAFTA, and we are going to stand up to China on Currency Manipulation. We are going to lower taxes on American business from 35 percent to 15 percent. We are going to cut taxes for Middle Class families by hundreds of billions of dollars. ... 
We'll support local police and federal law enforcement in an effort to aggressively reduce surging crime. So far this year in Las Vegas, homicides have increased 27% over last year. We will also keep our nation safe from terrorism. Hillary wants a 550% increase in Syrian Refugees. When I'm elected President, we will suspend the Syrian Refugee Program - and we will keep Radical Islamic Terrorists out of our country. A Trump Administration will also secure and defend the borders of the United States. And yes, we will build a wall. ... We will end illegal immigration, deport all criminal illegal immigrants, and save American lives. We will also repeal the Obama-Clinton defense sequester and rebuild our badly depleted military. ... We are going to have the biggest tax cut since Ronald Reagan; eliminate every unnecessary job-killing regulation; cancel every illegal Obama executive order; rebuild our military and take care of our Vets; reduce surging crime and support the incredible men and women of law enforcement; provide school choice and put an end to Common Core; save the 2nd amendment; and appoint Justices to the Supreme Court who will uphold and defend the Constitution of the United States. ... We are fighting to unlock the potential of every American community, and every American family, who hope and pray and yearn for a better future. Washington, D.C. wants you to think small. But I am asking you to Dream Big." "When I win on November 8th, I am going to bring back your jobs. The long nightmare of jobs leaving Michigan will be coming to an end. We will make Michigan the economic envy of the world once again. ... But to bring back your jobs, we must also immediately repeal and replace Obamacare. It's just been announced that Michigan residents are going to experience crushing double-digit premium hikes. ... Real change also means restoring honesty to our government. 
As you know, the FBI has reopened its investigation into Hillary Clinton and has discovered another 650,000 emails. Hillary lied under oath when she said she turned over all of her work-related emails - just one more lie out of so many. ... Hillary is the one who set-up an illegal private email server in a closet to shield her criminal activity. Hillary is the one who engaged in a corrupt Pay-For-Play scheme at the State Department - and now there are 5 FBI probes into the Clinton Foundation and their pay-for-play activities. Hillary is the one who sent and received classified information on an insecure server, putting the safety of the American people under threat. Hillary is the one who lied to Congress under oath. Hillary is the one who lied to the FBI. Hillary is the one who made 13 phones disappear. Hillary is the one who destroyed 33,000 emails. ... My Contract With The American Voter begins with a plan to restore honesty and accountability to our government. ... At the core of my contract is my plan to bring back your jobs. Michigan has lost more than 1 in 4 manufacturing jobs since Bill Clinton signed NAFTA, a deal strongly supported by Hillary Clinton. Before NAFTA went into effect, there were 280,000 auto workers in Michigan. Today, that number is only 160,000. Our country has lost 70,000 factories since China entered the World Trade Organization - another Bill and Hillary-backed deal. ... I've outlined a plan for urban renewal, it's called A New Deal For Black America. That deal includes a plan to use the money we will save by securing our border, and curbing refugee admissions, to invest in communities like Flint and Detroit. It includes a pledge of school choice for African-American children. My plan also includes a promise to cancel billions in climate change spending for the United Nations - a number Hillary wants to increase - and instead use that money to provide for American infrastructure, including clean water in cities like Flint. 
My plan also includes a pledge to reduce violent crime - every child in this nation has a right to grow up in safety and peace. And my plan includes a pledge to restore manufacturing in the United States. ... A Trump Administration will stop the Trans-Pacific Partnership. We will renegotiate NAFTA and, if we don't get the deal we want, we will terminate NAFTA and get a much better deal. We are going to lower taxes on American business from 35 percent to 15 percent. We are going to massively cut taxes for the Middle Class. We will unleash American energy - including shale, oil, natural gas and clean coal. The Obama-Clinton war on coal is going to cost this state 50,000 jobs. ... When we win, we will suspend the Syrian Refugee Program - and we will keep Radical Islamic Terrorists out of our country. A Trump Administration will also secure and defend the borders of the United States. And yes, we will build a wall. ... We are going to have the biggest tax cut since Ronald Reagan; eliminate every unnecessary job-killing regulation; cancel every illegal Obama executive order; rebuild our military and take care of our Vets; support the men and women of law enforcement; save the 2nd amendment; and appoint Justices to the Supreme Court who will uphold and defend the Constitution of the United States. ... We are fighting to unlock the potential of every American community, and every American family, who hope and pray and yearn for a better future."
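The speeches above are the Trump corpus that the programs read from the command line. Two of the statistics requested for Programma 1, vocabulary size and Type Token Ratio per 1000-token portion, can be sketched as follows. This is a minimal illustration, not the project's own code: the name `vocabulary_growth`, its `step` parameter, and the choice to ignore a trailing portion shorter than `step` are assumptions. It is written for Python 3 and takes a plain list of tokens, so it does not depend on NLTK.

```python
def vocabulary_growth(tokens, step=1000):
    """Return one (portion_size, vocabulary_size, ttr) tuple per incremental portion.

    Portions are the first `step`, 2*`step`, 3*`step`, ... tokens of the text.
    A trailing remainder shorter than `step` is ignored (illustrative choice).
    """
    rows = []
    seen = set()  # types observed so far
    for size in range(step, len(tokens) + 1, step):
        # add only the newest slice; `seen` already holds the earlier types
        seen.update(tokens[size - step:size])
        # TTR = number of types / number of tokens in the portion
        rows.append((size, len(seen), len(seen) / size))
    return rows


if __name__ == "__main__":
    # toy input with a tiny step, only to show the shape of the result
    sample = "we will build a wall and we will win".split()
    for size, vocab, ttr in vocabulary_growth(sample, step=3):
        print(size, vocab, round(ttr, 3))
```

With a real corpus, `tokens` would be the tokenizer's output and `step` would stay at 1000. The loop stops at the corpus length, so no TTR is reported for a portion the text cannot fill.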
    # -*- coding: utf-8 -*-
    # This program analyzes two corpora that contain:
    # * two of Hillary Clinton's speeches, "Remarks in Miami on the Cuba embargo" and
    #   "Remarks at the Brookings Institution on the Iran deal", both dated Jan 31, 2016;
    # * ten of Donald Trump's speeches, from October 21, 2016 to October 31, 2016.
    import sys
    import codecs
    import nltk
    import math
    import collections
    from nltk import word_tokenize, sent_tokenize
    from nltk.probability import *


    # returns the tokenized text
    def TokenizzaTesto(frasi):
        tokensTOT = []
        for frase in frasi:
            # split the sentence into tokens (takes a string, returns a list of strings)
            tokens = nltk.word_tokenize(frase)
            # append the tokens of this sentence to those of the previous sentences
            tokensTOT = tokensTOT + tokens
        # this variable holds the whole tokenized file
        return tokensTOT


    # computes the number of tokens in a text given as a list of sentences
    def CalcolaLunghezza(frasi):
        lunghezzaTOT = 0
        for frase in frasi:
            # split the sentence into tokens
            tokens = nltk.word_tokenize(frase)
            # accumulate the total length
            lunghezzaTOT = lunghezzaTOT + len(tokens)
        return lunghezzaTOT


    # computes the average sentence length of a text, in tokens
    def LunghezzaMediaFrasi(frasi):
        lunghezzaFrasi = 0.0
        numFrasi = 0.0
        for frase in frasi:
            # split the sentence into tokens
            tokens = nltk.word_tokenize(frase)
            # add the length of each sentence to the running total, giving the length of the text
            lunghezzaFrasi = lunghezzaFrasi + len(tokens)
            # the counter records how many sentences the for loop has seen
            numFrasi = numFrasi + 1
        # return the arithmetic mean of the sentence lengths
        return lunghezzaFrasi / numFrasi


    # incremental portions of the corpora
    listaPorzioni = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]


    # computes the vocabulary size for incremental portions of 1000 tokens
    def GrandezzaVocabolario(testoTokenizzato):
        listaLunghezze = []
        for porzione in listaPorzioni:
            # stop at portions larger than the corpus: they would only repeat the full vocabulary
            if porzione > len(testoTokenizzato):
                break
            # extract the vocabulary of the corresponding portion of the corpus
            Vocab = set(testoTokenizzato[:porzione])
            # compute the size of the vocabulary
            lunghezzaVocab = len(Vocab)
            # add each value to a list
            listaLunghezze.append(lunghezzaVocab)
        # return the list of vocabulary sizes, one per portion of the corpus
        return listaLunghezze


    # computes the lexical richness for incremental portions of 1000 tokens
    def CalcolaTTR(testoTokenizzato):
        listaTTR = []
        for porzione in listaPorzioni:
            # stop at portions larger than the corpus: dividing by a token count
            # the corpus does not reach would understate the TTR
            if porzione > len(testoTokenizzato):
                break
            # extract the vocabulary of the corresponding portion of the corpus
            Vocab = set(testoTokenizzato[:porzione])
            # compute the size of the vocabulary
            lunghezzaVocab = len(Vocab)
            # Type-Token Ratio: vocabulary size (number of types) divided by
            # the size of the portion (number of tokens)
            TTR = lunghezzaVocab * 1.0 / porzione
            # add each value to a list
            listaTTR.append(TTR)
        # return the list of TTR values, one per portion of the corpus
        return listaTTR


    # computes the ratio between nouns and verbs observed in the corpus
    def RapportoSostantiviVerbi(POStag):
        # Penn Treebank PoS tags, from http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
        tagSostantivi = ["NN", "NNS", "NNP", "NNPS"]
        tagVerbi = ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]
        # for each (token, PoS) pair, store the noun tags and the verb tags observed in the corpus in two lists
        listaTagSostantivi = []
        listaTagVerbi = []
        for (tok, pos) in POStag:
            if pos in tagSostantivi:
                listaTagSostantivi.append(pos)
            if pos in tagVerbi:
                listaTagVerbi.append(pos)
        # count how many times each observed tag occurs in the corpus
        occorrenzeSostantivi = collections.Counter(listaTagSostantivi)
        occorrenzeVerbi = collections.Counter(listaTagVerbi)
        # sum all the counts
        sostantivi = sum(occorrenzeSostantivi.values())
        verbi = sum(occorrenzeVerbi.values())
        # compute and return the ratio
        return sostantivi * 1.0 / verbi


    # computes the lexical density
    def DensitaLessicale(testoTokenizzato):
        POStag = nltk.pos_tag(testoTokenizzato)
        # the argument is already a list of tokens, so its length is the token count
        lunghezzaTesto = len(testoTokenizzato)
        # Penn Treebank PoS tags, from http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
        tagSostantivi = ["NN", "NNS", "NNP", "NNPS"]
        tagVerbi = ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]
        tagAvverbiEAggettivi = ["JJ", "JJR", "JJS", "RB", "RBR", "RBS", "WRB"]
        tagPunteggiatura = [".", ","]
        # store in the lists all the tags observed in the corpus
        listaTagSostantivi = []
        listaTagVerbi = []
        listaTagAvverbiEAggettivi = []
        listaTagPunteggiatura = []
        for (tok, pos) in POStag:
            if pos in tagSostantivi:
                listaTagSostantivi.append(pos)
            if pos in tagVerbi:
                listaTagVerbi.append(pos)
            if pos in tagAvverbiEAggettivi:
                listaTagAvverbiEAggettivi.append(pos)
            if pos in tagPunteggiatura:
                listaTagPunteggiatura.append(pos)
        # count how many times each observed tag occurs in the corpus
        occorrenzeSostantivi = collections.Counter(listaTagSostantivi)
        occorrenzeVerbi = collections.Counter(listaTagVerbi)
        occorrenzeAvverbiEAggettivi = collections.Counter(listaTagAvverbiEAggettivi)
        occorrenzePunteggiatura = collections.Counter(listaTagPunteggiatura)
        # sum all the counts
        sostantivi = sum(occorrenzeSostantivi.values())
        verbi = sum(occorrenzeVerbi.values())
        avverbiEAggettivi = sum(occorrenzeAvverbiEAggettivi.values())
        punteggiatura = sum(occorrenzePunteggiatura.values())
        # content words divided by all tokens except those tagged "." or ","
        return (sostantivi + verbi + avverbiEAggettivi) * 1.0 / (lunghezzaTesto - punteggiatura)


    # reads the two corpora, tokenizes them and prints the comparison of their statistics
    def main(file1, file2):
        with codecs.open(file1, "r", "utf-8") as fileInput1:
            var1 = fileInput1.read()
        with codecs.open(file2, "r", "utf-8") as fileInput2:
            var2 = fileInput2.read()
        # load the statistical model used by the sentence tokenizer
        sent_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
        # split the texts of the two files into sentences
        frasi1 = sent_tokenizer.tokenize(var1)
        frasi2 = sent_tokenizer.tokenize(var2)
        # tokenize file1
        testoTokenizzato1 = TokenizzaTesto(frasi1)
        # tokenize file2
        testoTokenizzato2 = TokenizzaTesto(frasi2)
        # compute the length of the two texts
        lunghezzaFile1 = CalcolaLunghezza(frasi1)
        lunghezzaFile2 = CalcolaLunghezza(frasi2)
        print("\n\nPROGRAM #1\n")
        # compare the texts by number of tokens and print the results
        print("\nThe file", file1, "is", lunghezzaFile1, "tokens long")
        print("The file", file2, "is", lunghezzaFile2, "tokens long")
        if lunghezzaFile1 > lunghezzaFile2:
            print(file1, "is longer than", file2)
        elif lunghezzaFile1 < lunghezzaFile2:
            print(file2, "is longer than", file1)
        else:
            print("the two files have the same length")
        # compute the average sentence length in the two texts
        lunghezzaMedia1 = LunghezzaMediaFrasi(frasi1)
        lunghezzaMedia2 = LunghezzaMediaFrasi(frasi2)
        # compare the texts by average sentence length and print the results
        print("\nThe sentences in file", file1, "have an average length of", lunghezzaMedia1, "tokens")
        print("The sentences in file", file2, "have an average length of", lunghezzaMedia2, "tokens")
        # compute the vocabulary sizes of the two files
        lunghezzaVocabolario1 = len(set(testoTokenizzato1))
        lunghezzaVocabolario2 = len(set(testoTokenizzato2))
        # compare the texts by vocabulary size
        print("\nThe vocabulary of file", file1, "has", lunghezzaVocabolario1, "types")
        print("The vocabulary of file", file2, "has", lunghezzaVocabolario2, "types")
        if lunghezzaVocabolario1 > lunghezzaVocabolario2:
            print(file1, "has a richer vocabulary than", file2)
        elif lunghezzaVocabolario1 < lunghezzaVocabolario2:
            print(file2, "has a richer vocabulary than", file1)
        else:
            print("the vocabularies of the two files have the same size")
        # vocabulary size of the two files for incremental portions of the corpora
        vocabolario1 = GrandezzaVocabolario(testoTokenizzato1)
        vocabolario2 = GrandezzaVocabolario(testoTokenizzato2)
        print("\nVocabulary size of file", file1, "as the corpus grows in increments of 1000 tokens:")
        for voc1 in vocabolario1:
            print(voc1)
        print("\nVocabulary size of file", file2, "as the corpus grows in increments of 1000 tokens:")
        for voc2 in vocabolario2:
            print(voc2)
        print("\nThe results show that the vocabulary tends to grow more and more slowly.")
        # lexical richness of the two files for incremental portions of the corpora
        TTR1 = CalcolaTTR(testoTokenizzato1)
        TTR2 = CalcolaTTR(testoTokenizzato2)
        print("\nLexical richness of file", file1, "as the corpus grows in increments of 1000 tokens:")
        for t1 in TTR1:
            print(t1)
        print("\nLexical richness of file", file2, "as the corpus grows in increments of 1000 tokens:")
        for t2 in TTR2:
            print(t2)
        print("The higher the Type-Token Ratio, the richer the vocabulary")
        # store the two Part-of-Speech tagged texts in two variables
        POStag1 = nltk.pos_tag(testoTokenizzato1)
        POStag2 = nltk.pos_tag(testoTokenizzato2)
        # compute the noun/verb ratio in the two texts
        rapportoSV1 = RapportoSostantiviVerbi(POStag1)
        rapportoSV2 = RapportoSostantiviVerbi(POStag2)
        print("\nThe noun/verb ratio in file", file1, "is:", rapportoSV1)
        print("The noun/verb ratio in file", file2, "is:", rapportoSV2)
        # compute the lexical density of the two texts
        densita1 = DensitaLessicale(testoTokenizzato1)
        densita2 = DensitaLessicale(testoTokenizzato2)
        print("\nThe lexical density of file", file1, "is:", densita1)
        print("The lexical density of file", file2, "is:", densita2)


    main(sys.argv[1], sys.argv[2])
    # -*- coding: utf-8 -*-
    # This program analyzes two corpora that contain:
    # * two of Hillary Clinton's speeches, "Remarks in Miami on the Cuba embargo" and
    #   "Remarks at the Brookings Institution on the Iran deal", both dated Jan 31, 2016;
    # * ten of Donald Trump's speeches, from October 21, 2016 to October 31, 2016.
    import re
    import sys
    import codecs
    import nltk
    import math
    import collections
    from nltk import word_tokenize, sent_tokenize
    from nltk.probability import *


    # returns the tokenized text
    def TokenizzaTesto(frasi):
        tokensTOT = []
        for frase in frasi:
            # split the sentence into tokens (takes a string, returns a list of strings)
            tokens = nltk.word_tokenize(frase)
            # append the tokens of this sentence to those of the previous sentences
            tokensTOT = tokensTOT + tokens
        # this variable holds the whole tokenized file
        return tokensTOT


    # extracts the most frequent PoS tags in the corpus
    def POSFrequenti(testoTokenizzato):
        POStag = nltk.pos_tag(testoTokenizzato)
        # build a list with all the PoS tags observed in the corpus
        listaTag = []
        for (tok, pos) in POStag:
            listaTag.append(pos)
        # count how many times each observed tag occurs in the corpus
        frequenze = collections.Counter(listaTag)
        # return the 10 most frequent
        POSPiuFrequenti = frequenze.most_common(10)
        return POSPiuFrequenti


    # extracts the most frequent tokens in the corpus
    def TokenFrequenti(testoTokenizzato):
        SegniPunteggiatura = [".", ",", ":", ";", "!", "?", "...", "-"]  # tokens to exclude
        # extract the text without punctuation
        testoSenzaPunteggiatura = []
        for token in testoTokenizzato:
            if token not in SegniPunteggiatura:
                testoSenzaPunteggiatura.append(token)
        # compute the frequency of each token of the resulting text
        freqToken = collections.Counter(testoSenzaPunteggiatura)
        # return the first 20 in decreasing order of frequency
        tokenPiuFrequenti = freqToken.most_common(20)
        return tokenPiuFrequenti


    # extracts the most frequent bigrams in the corpus
    def BigrammiFrequenti(testoTokenizzato):
        POStag = nltk.pos_tag(testoTokenizzato)
        bigrammi = nltk.bigrams(POStag)
        # tags for articles, conjunctions and punctuation marks used in PoS tagging (Penn Treebank)
        # from http://www.clips.ua.ac.be/pages/mbsp-tags
        tagArticoliCongiunzioniEPunteggiatura = ["CC", "DT", "IN", "SYM", ".", ",", ":", "(", ")", "'", "''"]  # PoS to exclude
        # build a list of all the bigrams whose PoS tags are NOT among those to exclude
        listaBigrammi = []
        for ((tok1, pos1), (tok2, pos2)) in bigrammi:
            if pos1 not in tagArticoliCongiunzioniEPunteggiatura:
                if pos2 not in tagArticoliCongiunzioniEPunteggiatura:
                    bigramma = (tok1, tok2)
                    listaBigrammi.append(bigramma)
        # compute the frequency of each bigram
        freqBigrammi = collections.Counter(listaBigrammi)
        # return the first 20 in decreasing order of frequency
        bigrammiPiuFrequenti = freqBigrammi.most_common(20)
        return bigrammiPiuFrequenti


    # extracts the most frequent trigrams in the corpus
    def TrigrammiFrequenti(testoTokenizzato):
        POStag = nltk.pos_tag(testoTokenizzato)
        trigrammi = nltk.trigrams(POStag)
        # tags for articles, conjunctions and punctuation marks used in PoS tagging (Penn Treebank)
        # from http://www.clips.ua.ac.be/pages/mbsp-tags
        tagArticoliCongiunzioniEPunteggiatura = ["CC", "DT", "IN", "SYM", ".", ",", ":", "(", ")", "'", "''"]  # PoS to exclude
        # build a list of all the trigrams whose PoS tags are NOT among those to exclude
        listaTrigrammi = []
        for ((tok1, pos1), (tok2, pos2), (tok3, pos3)) in trigrammi:
            if pos1 not in tagArticoliCongiunzioniEPunteggiatura:
                if pos2 not in tagArticoliCongiunzioniEPunteggiatura:
                    if pos3 not in tagArticoliCongiunzioniEPunteggiatura:
                        trigramma = (tok1, tok2, tok3)
                        listaTrigrammi.append(trigramma)
        # compute the frequency of each trigram
        freqTrigrammi = collections.Counter(listaTrigrammi)
        # return the first 20 in decreasing order of frequency
        trigrammiPiuFrequenti = freqTrigrammi.most_common(20)
        return trigrammiPiuFrequenti


    # extracts the adjective-noun bigrams in which each token has frequency greater than 2
    def BigrammiAggettivoSostantivo(testoTokenizzato):
        POStag = nltk.pos_tag(testoTokenizzato)
        # for each token, count how many times it occurs
        listaFreqToken = collections.Counter(testoTokenizzato)
        # build a list that contains only the tokens with frequency greater than two
        listaToken = []
        for tok in listaFreqToken:
            # look up the count of this token (comparing all the counts to 2 at once does not filter anything)
            if listaFreqToken[tok] > 2:
                listaToken.append(tok)
        # extract the bigrams
        bigrammi = nltk.bigrams(POStag)
        listaBigrammi = []
        # tags for adjectives and nouns used in PoS tagging (Penn Treebank)
        # from http://www.clips.ua.ac.be/pages/mbsp-tags
        tagAggettivi = ["JJ", "JJR", "JJS"]
        tagSostantivi = ["NN", "NNS", "NNP", "NNPS"]
        for ((tok1, pos1), (tok2, pos2)) in bigrammi:
            # if the PoS tags are an adjective followed by a noun
            if (pos1 in tagAggettivi) and (pos2 in tagSostantivi):
                # if both tokens of the bigram are in the list of tokens with frequency greater than 2
                if (tok1 in listaToken) and (tok2 in listaToken):
                    bigramma = (tok1, tok2)
                    # keep the bigrams that satisfy the conditions, dropping the PoS tags
                    # that are not needed for the later computations
                    listaBigrammi.append(bigramma)
        return listaBigrammi


    # computes the frequency of the first element of the bigram (the adjectives) in the whole corpus
    def CalcolaFreqAggettivi(testoTokenizzato):
        POStag = nltk.pos_tag(testoTokenizzato)
        tagAggettivi = ["JJ", "JJR", "JJS"]
        # build a list with all the adjectives observed in the corpus
        listaAggettivi = []
        for (token, POS) in POStag:
            if POS in tagAggettivi:
                listaAggettivi.append(token)
        # compute the frequency of each observed adjective
        freqAggettivo = collections.Counter(listaAggettivi)
        # turn the Counter object into a list
        freqAggettivi = list(freqAggettivo.items())  # frequencies of the adjectives in the corpus
        return freqAggettivi


    # computes the frequency of the second element of the bigram (the nouns) in the whole corpus
    def CalcolaFreqSostantivi(testoTokenizzato):
        POStag = nltk.pos_tag(testoTokenizzato)
        tagSostantivi = ["NN", "NNS", "NNP", "NNPS"]
        # build a list with all the nouns observed in the corpus
        listaSostantivi = []
        for (token, POS) in POStag:
            if POS in tagSostantivi:
                listaSostantivi.append(token)
        # compute the frequency of each observed noun
        freqSostantivo = collections.Counter(listaSostantivi)
        # turn the Counter object into a list
        freqSostantivi = list(freqSostantivo.items())  # frequencies of the nouns in the corpus
        return freqSostantivi


    # computes the conditional probability of the adjective-noun bigrams
    def CalcoloProbCondizionata(testoTokenizzato):
        # extract the bigrams that satisfy the conditions
        listaBigrammi = BigrammiAggettivoSostantivo(testoTokenizzato)
        # compute the frequency of each adjective-noun bigram
        freqBigramma = collections.Counter(listaBigrammi)
        # turn the Counter object into a list
        freqBigrammi = list(freqBigramma.items())  # frequencies of the adjective-noun bigrams in the corpus
        # compute the frequency of the adjectives
        freqAggettivi = CalcolaFreqAggettivi(testoTokenizzato)
        # build a list with the bigrams and their conditional probabilities
        lista = []
        for (agg, freqA) in freqAggettivi:
            for ((elem1, elem2), freqB) in freqBigrammi:
                if agg == elem1:  # pair each adjective with the bigrams it appears in
                    # compute the conditional probability
                    probCondizionata = freqB * 1.0 / freqA
                    elemento = (elem1, elem2), probCondizionata
                    lista.append(elemento)
        return lista


    # extracts the 20 adjective-noun bigrams with the highest conditional probability
    def BigrammiProbCondizionata(testoTokenizzato):
        # get the list of conditional probabilities for each bigram from the previous function
        listaProbCondiz = CalcoloProbCondizionata(testoTokenizzato)
        # sort the list by decreasing probability
        listaOrdinata = sorted(listaProbCondiz, key=lambda a: -a[1], reverse=False)
        # return the first 20 bigrams
        return listaOrdinata[:20]


    # computes the joint probability of the adjective-noun bigrams
    def CalcoloProbCongiunta(testoTokenizzato):
        # the joint probability is computed from the conditional probability
        # get the list of conditional probabilities for each bigram
        listaProbCondiz = CalcoloProbCondizionata(testoTokenizzato)
        # compute the frequency of the adjectives
        freqAggettivi = CalcolaFreqAggettivi(testoTokenizzato)
        lista = []
        for (agg, freqA) in freqAggettivi:
            for ((elem1, elem2), probCondiz) in listaProbCondiz:
                if agg == elem1:  # pair each adjective with the bigrams it appears in
                    # compute the joint probability
                    probCongiunta = probCondiz * (freqA * 1.0 / len(testoTokenizzato))
                    elemento = (elem1, elem2), probCongiunta
                    lista.append(elemento)
        return lista


    # extracts the 20 adjective-noun bigrams with the highest joint probability
    def BigrammiProbCongiunta(testoTokenizzato):
        # get the list of joint probabilities for each bigram from the previous function
        listaProbCong = CalcoloProbCongiunta(testoTokenizzato)
        # sort the list by decreasing probability
        listaOrdinata = sorted(listaProbCong, key=lambda a: -a[1], reverse=False)
        # return the first 20 bigrams
        return listaOrdinata[:20]


    # applies the formula of probability (frequentist definition)
    def CalcoloProbabilita(frequenza, testoTokenizzato):
        probabilita = frequenza * 1.0 / len(testoTokenizzato)
        return probabilita


    # computes the Local Mutual Information value for each bigram
    def CalcoloLMI(testoTokenizzato):
        # compute the frequency of the adjectives
        freqAggettivi = CalcolaFreqAggettivi(testoTokenizzato)
        # compute the frequency of the nouns
        freqSostantivi = CalcolaFreqSostantivi(testoTokenizzato)
        # extract the bigrams that satisfy the conditions
        listaBigrammi = BigrammiAggettivoSostantivo(testoTokenizzato)
        # compute the frequency of each adjective-noun bigram
        freqBigramma = collections.Counter(listaBigrammi)
        # turn the Counter object into a list
        freqBigrammi = list(freqBigramma.items())  # frequencies of the adjective-noun bigrams in the corpus
        lista = []
        for ((elem1, elem2), freqB) in freqBigrammi:
            for (agg, freqA) in freqAggettivi:
                for (sost, freqS) in freqSostantivi:
                    # for the adjective-noun bigrams, compute both the probability of the bigram and that of the
singoli elementi che lo compognono if (elem1==agg) and (elem2==sost): probabilitaAgg=CalcoloProbabilita(freqA, testoTokenizzato) probabilitaSost=CalcoloProbabilita(freqS, testoTokenizzato) probabilitaBigr=CalcoloProbabilita(freqB, testoTokenizzato) #calcolo la MI memorizzando i valori intermedi del calcolo in una variabile var=probabilitaBigr*1.0/(probabilitaAgg*probabilitaSost)*1.0 MI=math.log(var, 2) #dalla MI ricavo la LMI LMI=freqB*MI elemento= (elem1, elem2), LMI lista.append(elemento) return lista #questa funzione estrae i 20 bigrammi agg-sost con forza associativa massima def BigrammiLMI(testoTokenizzato): #dalla funzione precedente estraggo una lista con tutti i valori di LMI per ciascun bigramma listaBigrammiLMI=CalcoloLMI(testoTokenizzato) #ordino la lista per LMI decrescenti listaOrdinata= sorted(listaBigrammiLMI, key = lambda a: -a[1], reverse=False) #restituisco i primi 20 bigrammi return listaOrdinata[:20] #questa funzione estrae la frase con probabilita' massima, calcolata con un modello markoviano di ordine 0 def MarkovOrdine0(testoTokenizzato, frasi): lunghezzaCorpus=len(testoTokenizzato) #tokenizzo frase per frase listaFrasiTokenizzate=[] for frase in frasi: fraseTokenizzata=nltk.word_tokenize(frase) listaFrasiTokenizzate.append(fraseTokenizzata) #calcolo la freqenza di ciascun token osservato nella frase freqToken=collections.Counter(fraseTokenizzata) prob=1.0 probMassima=0.0 #calcolo la probabilita' di ciascun token for fraseTokenizzata in listaFrasiTokenizzate: for tok in fraseTokenizzata: probToken=CalcoloProbabilita(freqToken[tok], fraseTokenizzata) #nel modello di ordine 0 la probabilita' della frase equivale al prodotto delle probabilita' dei singoli token prob=prob*probToken #estraggo la probabilita' massima if prob > probMassima: probMassima=prob return fraseTokenizzata, "-------probabilita':", probMassima #questa funzione estrae la frase con probabilita' massima, calcolata con un modello markoviano di ordine 1 def 
MarkovOrdine1(testoTokenizzato, frasi): lunghezzaCorpus=len(testoTokenizzato) freqToken=collections.Counter(testoTokenizzato) #lista di frequenze di ogni token nel testo bigrammi=nltk.bigrams(testoTokenizzato) #estraggo i bigrammi freqBigrammi=collections.Counter(bigrammi) #lista di frequenze di ogni bigramma nel testo probMassima=0.0 fraseProbMax="" for frase in frasi: i=0.0 #per ogni frase la tokenizzo ed estraggo i bigrammi tokens=nltk.word_tokenize(frase) frasiBigrammi=nltk.bigrams(tokens) #considero solo le frasi lunghe almeno 10 token e i token con frequenza maggiore di 2 if len(tokens)>=10: for tok in tokens: if freqToken[tok]<2: i=i+1 if i==len(tokens): #quando ho letto tutta la frase #inizio calcolando la probabilita' semplice del primo token della frase probIntermedia=freqToken[tokens[0]]*1.0/lunghezzaCorpus*1.0 for bigramma in frasiBigrammi: #nel modello di ordine 1 bisogna calcolare, per ogni bigramma, la prob condizionata(del secondo token del bigramma dato il primo) probBigr=freqBigrammi[bigramma]*1.0/freqToken[bigramma[0]]*1.0 #faccio il prodotto di ogni valore ottenuto probIntermedia=probIntermedia*probBigr #confronto la probabilita' della frase (ottenuta dal prodotto) con la prob massima if probIntermedia>probMassima: probMassima=probIntermedia fraseProbMax=frase #restituisco la frase con la relativa probabilita' massima return fraseProbMax, "-------probabilita':", probMassima #questa funzione restituisce i 20 nomi di persona piu' frequenti nel corpus def NomiPropriPersona(testoTokenizzato): tutteLeNE=[] listaPersone=[] listaLuoghi=[] tokensPOS=nltk.pos_tag(testoTokenizzato) #lista di bigrammi (token, POS) analisi=nltk.ne_chunk(tokensPOS) #rappresentazione ad albero IOBformat=nltk.chunk.tree2conllstr(analisi) #trasformo in formato IOB for nodo in analisi: #ciclo l'albero scorrendo i nodi if hasattr(nodo, 'label'): #controlla se e' un nodo intermedio if nodo.label() =="PERSON": elementoP= nodo.leaves() #converte l'elemento in una tupla (utile per 
elementi composti da piu token che altrimenti verrebbero restituiti in sottoliste) listaPersone.append(tuple(elementoP)) #calcolo le frequenze freqPersone=collections.Counter(listaPersone) #restituisco i primi 20 persone=freqPersone.most_common(20) return persone #questa funzione restituisce i 20 nomi di luogo piu' frequenti nel corpus def NomiPropriLuogo(testoTokenizzato): tutteLeNE=[] listaPersone=[] listaLuoghi=[] tokensPOS=nltk.pos_tag(testoTokenizzato) #lista di bigrammi (token, POS) analisi=nltk.ne_chunk(tokensPOS) #rappresentazione ad albero IOBformat=nltk.chunk.tree2conllstr(analisi) #trasformo in formato IOB for nodo in analisi: #ciclo l'albero scorrendo i nodi if hasattr(nodo, 'label'): #controlla se e' un nodo intermedio if nodo.label() =="GPE": elementoL= nodo.leaves() #converte l'elemento in una tupla (utile per elementi composti da piu token che altrimenti verrebbero restituiti in sottoliste) listaLuoghi.append(tuple(elementoL)) #calcolo le frequenze freqLuoghi=collections.Counter(listaLuoghi) #restituisco i primi 20 luoghi=freqLuoghi.most_common(20) return luoghi def main(file1, file2): fileInput1 = codecs.open(file1, "r", "utf-8") fileInput2 = codecs.open(file2, "r", "utf-8") var1 = fileInput1.read() var2 = fileInput2.read() #carico il modello statistico utilizzato dalla funzione tokenize sent_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle') #divido i testi dei due file in frasi frasi1 = sent_tokenizer.tokenize(var1) frasi2 = sent_tokenizer.tokenize(var2) #tokenizzo il file1 testoTokenizzato1=TokenizzaTesto(frasi1) #tokenizzo il file2 testoTokenizzato2=TokenizzaTesto(frasi2) #memorizzo su due variabili i due testi annotati sulla base delle Part-Of-Speech POStag1=nltk.pos_tag(testoTokenizzato1) POStag2=nltk.pos_tag(testoTokenizzato2) print "\n\nPROGRAMMA #2\n" #le 10 POS piu' frequenti POSFrequenti1=POSFrequenti(testoTokenizzato1) POSFrequenti2=POSFrequenti(testoTokenizzato2) print "\nLe 10 POS piu' frequenti del file", file1, "(ordinate 
in senso descrescente), sono:\n" for POSFrequente1 in POSFrequenti1: print POSFrequente1 print "\n\nLe 10 POS piu' frequenti del file", file2, "(ordinate in senso descrescente), sono:\n" for POSFrequente2 in POSFrequenti2: print POSFrequente2 #i 20 token piu' frequenti tokenFrequenti1=TokenFrequenti(testoTokenizzato1) tokenFrequenti2=TokenFrequenti(testoTokenizzato2) print "\n\nI 20 token piu' frequenti nel file", file1, "sono:\n" for tokenFrequente1 in tokenFrequenti1: print tokenFrequente1 print "\n\nI 20 token piu' frequenti nel file", file2, "sono:\n" for tokenFrequente2 in tokenFrequenti2: print tokenFrequente2 #i 20 bigrammi piu' frequenti bigrammi1=BigrammiFrequenti(testoTokenizzato1) bigrammi2=BigrammiFrequenti(testoTokenizzato2) print "\n\nI 20 bigrammi piu' frequenti nel file", file1, "sono:\n" for bigramma1 in bigrammi1: print bigramma1 print "\n\nI 20 bigrammi piu' frequenti nel file", file2, "sono:\n" for bigramma2 in bigrammi2: print bigramma2 #i 20 trigrammi piu' frequenti trigrammi1=TrigrammiFrequenti(testoTokenizzato1) trigrammi2=TrigrammiFrequenti(testoTokenizzato2) print "\n\nI 20 trigrammi piu' frequenti nel file", file1, "sono:\n" for trigramma1 in trigrammi1: print trigramma1 print "\n\nI 20 trigrammi piu' frequenti nel file", file2, "sono:\n" for trigramma2 in trigrammi2: print trigramma2 #i 20 bigrammi aggettivo-sostantivo con probabilita' condizionata massima bigrammiProbCondizionata1=BigrammiProbCondizionata(testoTokenizzato1) bigrammiProbCondizionata2=BigrammiProbCondizionata(testoTokenizzato2) print "\n\nI 20 bigrammi aggettivo-sostantivo con probabilita' condizionata massima nel file", file1, "sono:\n" for bigrammaProbCondizionata1 in bigrammiProbCondizionata1: print bigrammaProbCondizionata1 print "\n\nI 20 bigrammi aggettivo-sostantivo con probabilita' condizionata massima nel file", file2, "sono:\n" for bigrammaProbCondizionata2 in bigrammiProbCondizionata2: print bigrammaProbCondizionata2 #i 20 bigrammi aggettivo-sostantivo con 
probabilita' congiunta massima bigrammiProbCongiunta1=BigrammiProbCongiunta(testoTokenizzato1) bigrammiProbCongiunta2=BigrammiProbCongiunta(testoTokenizzato2) print "\n\nI 20 bigrammi aggettivo-sostantivo con probabilita' congiunta massima nel file", file1, "sono:\n" for bigrammaProbCongiunta1 in bigrammiProbCongiunta1: print bigrammaProbCongiunta1 print "\n\nI 20 bigrammi aggettivo-sostantivo con probabilita' congiunta massima nel file", file2, "sono:\n" for bigrammaProbCongiunta2 in bigrammiProbCongiunta2: print bigrammaProbCongiunta2 #i 20 bigrammi aggettivo-sostantivo con forza associativa massima bigrammiLMI1=BigrammiLMI(testoTokenizzato1) bigrammiLMI2=BigrammiLMI(testoTokenizzato2) print "\n\nI 20 bigrammi aggettivo-sostantivo con forza associativa massima nel file", file1, "sono:\n" for bigrammaLMI1 in bigrammiLMI1: print bigrammaLMI1 print "\n\nI 20 bigrammi aggettivo-sostantivo con forza associativa massima nel file", file2, "sono:\n" for bigrammaLMI2 in bigrammiLMI2: print bigrammaLMI2 #Entita' Nominate persone1=NomiPropriPersona(testoTokenizzato1) persone2=NomiPropriPersona(testoTokenizzato2) luoghi1=NomiPropriLuogo(testoTokenizzato1) luoghi2=NomiPropriLuogo(testoTokenizzato2) print "\n\nNel file", file1, "i 20 nomi propri di persona piu' frequenti sono:\n" for persona1 in persone1: print persona1 print "\nNel file", file1, "i 20 nomi propri di luogo piu' frequenti sono:\n" for luogo1 in luoghi1: print luogo1 print "\n\nNel file", file2, "i 20 nomi propri di persona piu' frequenti sono:\n" for persona2 in persone2: print persona2 print "\nNel file", file2, "i 20 nomi propri di luogo piu' frequenti sono:\n" for luogo2 in luoghi2: print luogo2 #le due frasi con probabilita' piu' alta calcolate con modello di ordine 0 probFrase01=MarkovOrdine0(testoTokenizzato1, frasi1) print "\n\nLa frase con probabilita' piu' alta, calcolata con catena di Markov di ordine 0, nel file", file1, " e':\n", probFrase01 probFrase02=MarkovOrdine0(testoTokenizzato2, frasi2) print 
"\n\nLa frase con probabilita' piu' alta, calcolata con catena di Markov di ordine 0, nel file", file2, " e':\n", probFrase02 #le due frasi con probabilita' piu' alta calcolate con modello di ordine 1 probFrase11=MarkovOrdine1(testoTokenizzato1, frasi1) print "\n\nLa frase con probabilita' piu' alta, calcolata con catena di Markov di ordine 1, nel file", file1, " e':\n", probFrase11 probFrase12=MarkovOrdine1(testoTokenizzato2, frasi2) print "\n\nLa frase con probabilita' piu' alta, calcolata con catena di Markov di ordine 1, nel file", file2, " e':\n", probFrase12 main(sys.argv[1], sys.argv[2])
    >
    PROGRAMMA #1
    
    
    Il file clinton e' lungo 8156.0 token
    Il file trump e' lungo 6479.0 token
    clinton e' piu' lungo di trump
    
    Le frasi del file clinton hanno lunghezza media pari a 18.8360277136 token
    Le frasi del file trump hanno lunghezza media pari a 20.4384858044 token
    
    Il vocabolario del file clinton ha 1909 tipi
    Il vocabolario del file trump ha 1244 tipi
    clinton ha un vocabolario piu' ricco di trump
    
    Grandezza del vocabolario del file clinton all'aumento del corpus per porzioni incrementali di 1000 token:
    465
    742
    981
    1190
    1420
    1575
    1757
    1890
    1909
    1909
    
    Grandezza del vocabolario del file trump all'aumento del corpus per porzioni incrementali di 1000 token:
    371
    631
    880
    1055
    1157
    1218
    1244
    1244
    1244
    1244
    
    Notiamo dal risultato che il vocabolario tende a crescere sempre piu' lentamente.
    
    Ricchezza lessicale del file clinton all'aumento del corpus per porzioni incrementali di 1000 token:
    0.465
    0.371
    0.327
    0.2975
    0.284
    0.2625
    0.251
    0.23625
    0.212111111111
    0.1909
    
    Ricchezza lessicale del file trump all'aumento del corpus per porzioni incrementali di 1000 token:
    0.371
    0.3155
    0.293333333333
    0.26375
    0.2314
    0.203
    0.177714285714
    0.1555
    0.138222222222
    0.1244
    Piu' e' alto il valore della Type-Token-Ratio, maggiore e' la ricchezza del vocabolario
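
The vocabulary-size and TTR columns above can be reproduced with one pass over the token list. The last rows of each list seem to divide by the nominal slice size even after the corpus is exhausted: for clinton, 1909/9000 ≈ 0.2121 and 1909/10000 = 0.1909, although the file has 8156 tokens. The source of programma1.py is not shown in this section, so the following is only a minimal Python 3 sketch, with the hypothetical helper name `vocabulary_growth`, that reports one row per 1000 tokens and a final row at the real corpus length.

```python
def vocabulary_growth(tokens, step=1000):
    """Return a list of (n_tokens, n_types, ttr) for growing prefixes.

    A row is emitted every `step` tokens and once more at the end of the
    corpus, so the TTR is always divided by the number of tokens actually read.
    """
    rows = []
    seen = set()  # types observed so far
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0 or i == len(tokens):
            rows.append((i, len(seen), len(seen) / i))
    return rows


if __name__ == "__main__":
    # Toy corpus, only to show the shape of the result.
    toy = "we will rebuild our country and we will win".split()
    for n_tokens, n_types, ttr in vocabulary_growth(toy, step=3):
        print(n_tokens, n_types, round(ttr, 3))
```

With this layout the final row for an 8156-token corpus would be computed over 8156 tokens rather than over 9000 or 10000.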
    
    Il rapporto sostantivi/verbi nel file clinton e':  1.47268408551
    Il rapporto sostantivi/verbi nel file trump e':  1.81444332999
    
    La densita' lessicale del file clinton e':  0.56704929102
    La densita' lessicale del file trump e':  0.574747474747
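
The two indices above follow the definitions given in the assignment: nouns over verbs, and (|Nouns|+|Verbs|+|Adverbs|+|Adjectives|) / (TOT - (|.|+|,|)). As a minimal Python 3 sketch (not the code of programma1.py, which is not shown here), they can be computed from a list of `(token, POS)` pairs. The function names are hypothetical, and grouping Penn Treebank tags by the prefixes `NN`, `VB`, `RB` and `JJ` is an assumption about which tags count in each class.

```python
def noun_verb_ratio(tagged):
    """Ratio between noun tags (NN*) and verb tags (VB*)."""
    nouns = sum(1 for _tok, pos in tagged if pos.startswith("NN"))
    verbs = sum(1 for _tok, pos in tagged if pos.startswith("VB"))
    return nouns / verbs if verbs else float("nan")


def lexical_density(tagged):
    """Content words over all tokens, excluding those tagged '.' or ','."""
    content = sum(
        1 for _tok, pos in tagged if pos.startswith(("NN", "VB", "RB", "JJ"))
    )
    words = sum(1 for _tok, pos in tagged if pos not in (".", ","))
    return content / words if words else 0.0


if __name__ == "__main__":
    # Hand-tagged toy sentence; a real run would use the tagger's output.
    sample = [("We", "PRP"), ("need", "VBP"), ("real", "JJ"), ("change", "NN"),
              (",", ","), ("now", "RB"), (".", ".")]
    print(noun_verb_ratio(sample), lexical_density(sample))
```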
    
    >
    PROGRAMMA #2
    
    
    Le 10 POS piu' frequenti del file clinton (ordinate in senso descrescente), sono:
    
    ('NN', 899)
    ('IN', 816)
    ('DT', 661)
    ('JJ', 550)
    ('PRP', 479)
    ('NNS', 462)
    ('VB', 460)
    ('NNP', 449)
    ('.', 433)
    ('RB', 395)
    
    
    Le 10 POS piu' frequenti del file trump (ordinate in senso descrescente), sono:
    
    ('NN', 855)
    ('IN', 584)
    ('NNP', 558)
    ('DT', 496)
    ('JJ', 381)
    ('VB', 370)
    ('NNS', 364)
    ('.', 297)
    (',', 262)
    ('CC', 251)
    
    
    I 20 token piu' frequenti nel file clinton sono:
    
    (u'the', 314)
    (u'to', 313)
    (u'and', 246)
    (u'of', 150)
    (u'a', 133)
    (u'I', 120)
    (u'that', 104)
    (u'in', 93)
    (u"'s", 93)
    (u'we', 92)
    (u'for', 87)
    (u'it', 69)
    (u'our', 67)
    (u'is', 61)
    (u'will', 61)
    (u'Iran', 57)
    (u'on', 54)
    (u'have', 53)
    (u'with', 52)
    (u'as', 52)
    
    
    I 20 token piu' frequenti nel file trump sono:
    
    (u'the', 261)
    (u'and', 223)
    (u'to', 196)
    (u'of', 153)
    (u'will', 118)
    (u'a', 96)
    (u'our', 92)
    (u'for', 63)
    (u'is', 62)
    (u'in', 60)
    (u'We', 51)
    (u'we', 50)
    (u'Hillary', 49)
    (u'American', 44)
    (u'that', 41)
    (u'I', 40)
    (u'on', 37)
    (u'are', 35)
    (u'government', 33)
    (u'plan', 32)
    
    
    I 20 bigrammi piu' frequenti nel file clinton sono:
    
    ((u'to', u'be'), 19)
    ((u'United', u'States'), 18)
    ((u'Iran', u"'s"), 17)
    ((u'I', u'will'), 15)
    ((u'need', u'to'), 13)
    ((u'Cuban', u'people'), 12)
    ((u'have', u'to'), 11)
    ((u'will', u'be'), 11)
    ((u'I', u'would'), 11)
    ((u'it', u"'s"), 10)
    ((u'It', u"'s"), 10)
    ((u'I', u"'ll"), 10)
    ((u'I', u"'m"), 9)
    ((u'is', u'not'), 9)
    ((u'want', u'to'), 8)
    ((u'we', u'will'), 8)
    ((u'Latin', u'America'), 8)
    ((u'ca', u"n't"), 8)
    ((u'I', u'believe'), 8)
    ((u'we', u'need'), 8)
    
    
    I 20 bigrammi piu' frequenti nel file trump sono:
    
    ((u'going', u'to'), 31)
    ((u'We', u'will'), 25)
    ((u'are', u'going'), 21)
    ((u'we', u'will'), 18)
    ((u'plan', u'to'), 17)
    ((u'We', u'are'), 17)
    ((u'will', u'also'), 16)
    ((u'Hillary', u'Clinton'), 16)
    ((u'United', u'States'), 15)
    ((u'Mr.', u'Trump'), 14)
    ((u'our', u'government'), 12)
    ((u'rebuild', u'our'), 12)
    ((u'our', u'country'), 12)
    ((u'lied', u'to'), 11)
    ((u'I', u'will'), 11)
    ((u'we', u'are'), 10)
    ((u'American', u'Voter'), 10)
    ((u'illegal', u'immigration'), 10)
    ((u'Trump', u'Administration'), 10)
    ((u'will', u'be'), 10)
    
    
    I 20 trigrammi piu' frequenti nel file clinton sono:
    
    ((u'we', u'need', u'to'), 7)
    ((u'We', u'ca', u"n't"), 5)
    ((u'I', u'want', u'to'), 5)
    ((u'need', u'to', u'be'), 5)
    ((u'We', u'have', u'to'), 4)
    ((u'to', u'step', u'up'), 4)
    ((u'do', u"n't", u'want'), 4)
    ((u'go', u'back', u'to'), 4)
    ((u'be', u'able', u'to'), 4)
    ((u"'s", u'why', u'I'), 4)
    ((u'Cuban', u'American', u'community'), 4)
    ((u'Iran', u"'s", u'nuclear'), 3)
    ((u'we', u'have', u'to'), 3)
    ((u"n't", u'go', u'back'), 3)
    ((u'We', u'need', u'to'), 3)
    ((u'to', u'make', u'it'), 3)
    ((u'ca', u"n't", u'go'), 3)
    ((u'They', u'do', u"n't"), 3)
    ((u'Iran', u'tries', u'to'), 2)
    ((u'we', u"'ve", u'got'), 2)
    
    
    I 20 trigrammi piu' frequenti nel file trump sono:
    
    ((u'are', u'going', u'to'), 21)
    ((u'We', u'are', u'going'), 14)
    ((u'We', u'will', u'also'), 10)
    ((u'Trump', u'Administration', u'will'), 9)
    ((u'Mr.', u'Trump', u'will'), 9)
    ((u'biggest', u'tax', u'cut'), 7)
    ((u'made', u'13', u'phones'), 7)
    ((u'lied', u'to', u'Congress'), 7)
    ((u'13', u'phones', u'disappear'), 7)
    ((u'to', u'15', u'percent'), 7)
    ((u'deleted', u'33,000', u'emails'), 6)
    ((u'Radical', u'Islamic', u'Terrorists'), 6)
    ((u'keep', u'Radical', u'Islamic'), 6)
    ((u'to', u'bring', u'back'), 6)
    ((u'unnecessary', u'job-killing', u'regulation'), 5)
    ((u'we', u'are', u'going'), 5)
    ((u'rebuild', u'our', u'military'), 5)
    ((u'Supreme', u'Court', u'who'), 5)
    ((u'who', u'will', u'uphold'), 5)
    ((u'reduce', u'surging', u'crime'), 5)
    
    
    I 20 bigrammi aggettivo-sostantivo con probabilita' condizionata massima nel file clinton sono:
    
    ((u'malicious', u'activity'), 1.0)
    ((u'uranium', u'enrichment'), 1.0)
    ((u'Israel-or', u'talks'), 1.0)
    ((u'fifty', u'years'), 1.0)
    ((u'bright', u'future'), 1.0)
    ((u'interim', u'agreement'), 1.0)
    ((u'multilateral', u'talks'), 1.0)
    ((u'abundant', u'energy'), 1.0)
    ((u'religious', u'groups'), 1.0)
    ((u'previous', u'attempts'), 1.0)
    ((u'21st', u'century'), 1.0)
    ((u'northern', u'Israel'), 1.0)
    ((u'allies-including', u'intelligence'), 1.0)
    ((u'courageous', u'Ladies'), 1.0)
    ((u'only', u'answer'), 1.0)
    ((u'compelling', u'advertisement'), 1.0)
    ((u'advanced', u'radar'), 1.0)
    ((u'constant', u'threat'), 1.0)
    ((u'enhanced-our', u'capacity'), 1.0)
    ((u'executive', u'authority'), 1.0)
    
    
    I 20 bigrammi aggettivo-sostantivo con probabilita' condizionata massima nel file trump sono:
    
    ((u'executive', u'branch'), 1.0)
    ((u'Public', u'Corruption'), 1.0)
    ((u'Congressional', u'Subpoena'), 1.0)
    ((u'previous', u'deportation'), 1.0)
    ((u'Democratic', u'policies'), 1.0)
    ((u'longstanding', u'ties'), 1.0)
    ((u'failed', u'elites'), 1.0)
    ((u'Clinton-backed', u'deal'), 1.0)
    ((u'civil', u'rights'), 1.0)
    ((u'easy', u'way'), 1.0)
    ((u'congressional', u'seats'), 1.0)
    ((u'real', u'change'), 1.0)
    ((u'confidential', u'information'), 1.0)
    ((u'spur', u'innovation'), 1.0)
    ((u'advanced', u'manufacturing'), 1.0)
    ((u'Pay-For-Play', u'scheme'), 1.0)
    ((u'Trump', u'Administration'), 1.0)
    ((u'middle-class', u'family'), 1.0)
    ((u'unnecessary', u'job-killing'), 1.0)
    ((u'common', u'core'), 1.0)
    
    
    I 20 bigrammi aggettivo-sostantivo con probabilita' congiunta massima nel file clinton sono:
    
    ((u'nuclear', u'weapons'), 0.0006130456105934282)
    ((u'Cuban', u'American'), 0.0004904364884747424)
    ((u'human', u'rights'), 0.0003678273663560569)
    ((u'foreign', u'policy'), 0.0003678273663560569)
    ((u'civil', u'society'), 0.0003678273663560569)
    ((u'next', u'president'), 0.0003678273663560569)
    ((u'long', u'time'), 0.00036782736635605686)
    ((u'American', u'leadership'), 0.00036782736635605686)
    ((u'nuclear', u'program'), 0.00036782736635605686)
    ((u'global', u'coalition'), 0.0002452182442373713)
    ((u'broader', u'Iran'), 0.0002452182442373713)
    ((u'failed', u'policy'), 0.0002452182442373713)
    ((u'bad', u'behavior'), 0.0002452182442373713)
    ((u'bad', u'actors'), 0.0002452182442373713)
    ((u'free', u'expression'), 0.0002452182442373713)
    ((u'many', u'others'), 0.0002452182442373713)
    ((u'additional', u'steps'), 0.0002452182442373713)
    ((u'political', u'prisoners'), 0.0002452182442373713)
    ((u'dangerous', u'path'), 0.0002452182442373713)
    ((u'Iranian', u'aggression'), 0.0002452182442373713)
    
    
    I 20 bigrammi aggettivo-sostantivo con probabilita' congiunta massima nel file trump sono:
    
    ((u'illegal', u'immigration'), 0.001543448062972681)
    ((u'American', u'Voter'), 0.0015434480629726809)
    ((u'biggest', u'tax'), 0.0010804136440808766)
    ((u'illegal', u'server'), 0.0007717240314863406)
    ((u'unnecessary', u'job-killing'), 0.0007717240314863404)
    ((u'inner', u'cities'), 0.0007717240314863404)
    ((u'open', u'borders'), 0.0006173792251890724)
    ((u'illegal', u'Obama'), 0.0006173792251890724)
    ((u'middle', u'class'), 0.0006173792251890724)
    ((u'Real', u'change'), 0.0006173792251890724)
    ((u'other', u'countries'), 0.0006173792251890723)
    ((u'illegal', u'conduct'), 0.00046303441889180435)
    ((u'Congressional', u'Subpoena'), 0.0004630344188918043)
    ((u'elected', u'President'), 0.0004630344188918043)
    ((u'Republican', u'House'), 0.0004630344188918043)
    ((u'email', u'evidence'), 0.0004630344188918043)
    ((u'Syrian', u'Refugee'), 0.0004630344188918043)
    ((u'American', u'business'), 0.0004630344188918043)
    ((u'highest', u'levels'), 0.0004630344188918043)
    ((u'special', u'interests'), 0.0004630344188918043)
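
The conditional probability used above is P(noun | adjective) = f(adjective, noun) / f(adjective), and the joint probability is P(adjective) * P(noun | adjective), which simplifies to f(adjective, noun) / N. The printed values agree with this: `illegal immigration` occurs 10 times in the 6479-token trump file, and 10/6479 ≈ 0.0015434. The sketch below is a minimal Python 3 version under stated assumptions: the function name `adj_noun_probabilities` and the `min_freq` parameter are hypothetical, and the frequency threshold is applied per token with `>=`, so `min_freq=3` corresponds to "frequency greater than 2".

```python
from collections import Counter

ADJ_TAGS = {"JJ", "JJR", "JJS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def adj_noun_probabilities(tagged, min_freq=3):
    """Map each adjective-noun bigram to (conditional, joint) probability.

    `tagged` is a list of (token, POS) pairs for the whole corpus.
    """
    n = len(tagged)
    token_freq = Counter(tok for tok, _pos in tagged)
    adj_freq = Counter(tok for tok, pos in tagged if pos in ADJ_TAGS)
    bigram_freq = Counter(
        (t1, t2)
        for (t1, p1), (t2, p2) in zip(tagged, tagged[1:])
        if p1 in ADJ_TAGS and p2 in NOUN_TAGS
        and token_freq[t1] >= min_freq and token_freq[t2] >= min_freq
    )
    return {
        bigram: (f / adj_freq[bigram[0]], f / n)
        for bigram, f in bigram_freq.items()
    }
```

Sorting the resulting items by either component in decreasing order and taking the first 20 gives rankings comparable to the two lists above.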
    
    
    I 20 bigrammi aggettivo-sostantivo con forza associativa massima nel file clinton sono:
    
    ((u'nuclear', u'weapons'), 39.32181521535816)
    ((u'Cuban', u'American'), 37.515744863605555)
    ((u'civil', u'society'), 31.226050677886327)
    ((u'human', u'rights'), 29.980938180049794)
    ((u'foreign', u'policy'), 28.6026433241379)
    ((u'next', u'president'), 28.22605067788632)
    ((u'long', u'time'), 25.04936961072562)
    ((u'terrorist', u'organization'), 22.817367118590884)
    ((u'nuclear', u'program'), 22.04936961072562)
    ((u'honest', u'disagreements'), 21.987292120033192)
    ((u'bad', u'actors'), 21.987292120033192)
    ((u'American', u'leadership'), 21.55887341387698)
    ((u'additional', u'steps'), 20.817367118590884)
    ((u'bad', u'behavior'), 19.987292120033192)
    ((u'political', u'prisoners'), 19.987292120033192)
    ((u'free', u'expression'), 19.647442117148568)
    ((u'dangerous', u'path'), 19.34343593025847)
    ((u'Iranian', u'aggression'), 19.34343593025847)
    ((u'global', u'coalition'), 18.81736711859088)
    ((u'failed', u'policy'), 17.898503881316287)
    
    
    I 20 bigrammi aggettivo-sostantivo con forza associativa massima nel file trump sono:
    
    ((u'illegal', u'immigration'), 75.82604101072937)
    ((u'American', u'Voter'), 75.74092601217419)
    ((u'biggest', u'tax'), 57.88466613782299)
    ((u'unnecessary', u'job-killing'), 51.69813673790198)
    ((u'inner', u'cities'), 49.27100260205077)
    ((u'middle', u'class'), 41.358509390321586)
    ((u'other', u'countries'), 38.129089702091164)
    ((u'Real', u'change'), 37.966521764101785)
    ((u'illegal', u'server'), 35.56559408885858)
    ((u'open', u'borders'), 35.52078291577239)
    ((u'natural', u'gas'), 33.229778825239805)
    ((u'Congressional', u'Subpoena'), 33.229778825239805)
    ((u'Republican', u'House'), 33.229778825239805)
    ((u'Syrian', u'Refugee'), 31.984666327403275)
    ((u'email', u'evidence'), 31.984666327403275)
    ((u'special', u'interests'), 31.01888204274119)
    ((u'clean', u'coal'), 30.739553829566745)
    ((u'illegal', u'Obama'), 30.556750517757216)
    ((u'highest', u'levels'), 30.229778825239805)
    ((u'massive', u'inflow'), 29.562601561230466)
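
The association strength listed above is the Local Mutual Information, LMI = f(a, n) * log2( P(a, n) / (P(a) * P(n)) ), with every probability estimated as frequency over corpus size. A minimal Python 3 sketch of the formula follows; the function name `local_mutual_information` is hypothetical. As a check against the output, a bigram whose two words and whose combination all occur 3 times in a 6479-token corpus gets 3 * log2(6479/3) ≈ 33.23, which is the value printed for `Congressional Subpoena`, `natural gas` and `Republican House` in the trump file (assuming those are indeed their frequencies).

```python
import math


def local_mutual_information(f_bigram, f_adj, f_noun, n_tokens):
    """LMI of a bigram from raw frequencies and corpus size."""
    p_bigram = f_bigram / n_tokens
    p_adj = f_adj / n_tokens
    p_noun = f_noun / n_tokens
    # Pointwise mutual information in bits.
    mi = math.log2(p_bigram / (p_adj * p_noun))
    # Weighting by the bigram frequency favours frequent pairs over rare ones.
    return f_bigram * mi


if __name__ == "__main__":
    print(local_mutual_information(3, 3, 3, 6479))
```

Unlike plain MI, the frequency weight keeps hapax pairs from dominating the ranking.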
    
    
    Nel file clinton i 20 nomi propri di persona piu' frequenti sono:
    
    (((u'Latin', 'NNP'), (u'America', 'NNP')), 6)
    (((u'Cuba', 'NNP'),), 4)
    (((u'Obama', 'NNP'),), 4)
    (((u'Strobe', 'NNP'),), 3)
    (((u'Pope', 'NNP'), (u'Francis', 'NNP')), 2)
    (((u'Frank', 'NNP'),), 2)
    (((u'Third', 'NNP'),), 2)
    (((u'Ernie', 'NNP'), (u'Moniz', 'NNP')), 1)
    (((u'Bill', 'NNP'), (u'Burns', 'NNP')), 1)
    (((u'Putin', 'NNP'),), 1)
    (((u'Chuck', 'NNP'), (u'Schumer', 'NNP')), 1)
    (((u'Venezuela', 'NNP'),), 1)
    (((u'Frankly', 'NNP'),), 1)
    (((u'Ebola', 'NNP'),), 1)
    (((u'Tehran', 'NNP'),), 1)
    (((u'John', 'NNP'), (u'Kerry', 'NNP')), 1)
    (((u'Martin', 'NNP'),), 1)
    (((u'Capitol', 'NNP'), (u'Hill', 'NNP')), 1)
    (((u'Google', 'NNP'),), 1)
    (((u'Joe', 'NNP'),), 1)
    
    Nel file clinton i 20 nomi propri di luogo piu' frequenti sono:
    
    (((u'Iran', 'NNP'),), 55)
    (((u'American', 'JJ'),), 21)
    (((u'Israel', 'NNP'),), 21)
    (((u'United', 'NNP'), (u'States', 'NNPS')), 18)
    (((u'Cuban', 'NNP'),), 16)
    (((u'Cuba', 'NNP'),), 15)
    (((u'America', 'NNP'),), 14)
    (((u'Iranian', 'JJ'),), 10)
    (((u'Cubans', 'NNPS'),), 6)
    (((u'U.S.', 'NNP'),), 4)
    (((u'Israeli', 'NNP'),), 3)
    (((u'Havana', 'NNP'),), 3)
    (((u'Cuban', 'JJ'),), 3)
    (((u'Iranians', 'NNPS'),), 3)
    (((u'American', 'NNP'),), 3)
    (((u'Americans', 'NNPS'),), 3)
    (((u'Castro', 'NNP'),), 2)
    (((u'Yemen', 'NNP'),), 2)
    (((u'White', 'NNP'),), 2)
    (((u'Syrian', 'JJ'),), 2)
    
    
    Nel file trump i 20 nomi propri di persona piu' frequenti sono:
    
    (((u'Hillary', 'NNP'),), 28)
    (((u'Hillary', 'NNP'), (u'Clinton', 'NNP')), 16)
    (((u'Mr.', 'NNP'), (u'Trump', 'NNP')), 14)
    (((u'Obamacare', 'NNP'),), 9)
    (((u'Ronald', 'NNP'), (u'Reagan', 'NNP')), 7)
    (((u'Radical', 'NNP'), (u'Islamic', 'NNP')), 6)
    (((u'Bill', 'NNP'), (u'Clinton', 'NNP')), 2)
    (((u'Bill', 'NNP'),), 2)
    (((u'Flint', 'NNP'),), 2)
    (((u'Clinton', 'NNP'),), 2)
    (((u'Hillary', 'JJ'),), 2)
    (((u'Black', 'NNP'), (u'America', 'NNP')), 2)
    (((u'Michigan', 'NNP'),), 2)
    (((u'Currency', 'NNP'), (u'Manipulation', 'NNP')), 2)
    (((u'Real', 'NNP'),), 2)
    (((u'Naval', 'NNP'), (u'Air', 'NNP')), 1)
    (((u'Obama', 'NNP'),), 1)
    (((u'Donald', 'NNP'), (u'Trump', 'NNP')), 1)
    (((u'Coast', 'NNP'), (u'Guard', 'NNP')), 1)
    (((u'School', 'NNP'),), 1)
    
    Nel file trump i 20 nomi propri di luogo piu' frequenti sono:
    
    (((u'American', 'JJ'),), 21)
    (((u'United', 'NNP'), (u'States', 'NNPS')), 15)
    (((u'America', 'NNP'),), 14)
    (((u'American', 'NNP'),), 10)
    (((u'Washington', 'NNP'),), 9)
    (((u'China', 'NNP'),), 5)
    (((u'Middle', 'NNP'),), 4)
    (((u'Florida', 'NNP'),), 3)
    (((u'D.C.', 'NNP'),), 3)
    (((u'Pennsylvania', 'NNP'),), 2)
    (((u'Michigan', 'NNP'),), 2)
    (((u'North', 'NNP'), (u'Carolina', 'NNP')), 2)
    (((u'Arizona', 'NNP'),), 2)
    (((u'U.S.', 'NNP'),), 1)
    (((u'State', 'NNP'),), 1)
    (((u'Nevada', 'NNP'),), 1)
    (((u'Detroit', 'NNP'),), 1)
    (((u'Veterans', 'NNPS'),), 1)
    (((u'Mexico', 'NNP'),), 1)
    (((u'New', 'NNP'), (u'Navy', 'NNP')), 1)
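
In the named-entity lists above each entity is a tuple of `(token, POS)` leaves, so the same surface form can appear more than once with different tags (for example `American` as `JJ` and as `NNP`). One possible refinement, sketched here in Python 3 with the hypothetical helper `entity_surface_counts`, is to join the tokens of each entity into a string before counting. It operates on plain tuples shaped like the output above and does not call NLTK.

```python
from collections import Counter


def entity_surface_counts(entities, top=20):
    """Count entities by surface form, ignoring the POS of their leaves.

    `entities` is a list of tuples of (token, POS) pairs, one tuple per
    entity occurrence.
    """
    surfaces = [" ".join(tok for tok, _pos in leaves) for leaves in entities]
    return Counter(surfaces).most_common(top)


if __name__ == "__main__":
    found = [
        (("American", "JJ"),),
        (("United", "NNP"), ("States", "NNPS")),
        (("American", "NNP"),),
    ]
    print(entity_surface_counts(found))
```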
    
    
    La frase con probabilita' piu' alta, calcolata con catena di Markov di ordine 0, nel file clinton  e':
    ([u'Thank', u'you', u'.', u"''"], "-------probabilita':", 0.3333333333333333)
    
    
    La frase con probabilita' piu' alta, calcolata con catena di Markov di ordine 0, nel file trump  e':
    ([u'...', u'We', u'are', u'fighting', u'to', u'unlock', u'the', u'potential', u'of', u'every', u'American', u'community', u',', u'and', u'every', u'American', u'family', u',', u'who', u'hope', u'and', u'pray', u'and', u'yearn', u'for', u'a', u'better', u'future', u'.', u"''"], "-------probabilita':", 0.0)
    
    
    La frase con probabilita' piu' alta, calcolata con catena di Markov di ordine 1, nel file clinton  e':
    ('', "-------probabilita':", 0.0)
    
    
    La frase con probabilita' piu' alta, calcolata con catena di Markov di ordine 1, nel file trump  e':
    ('', "-------probabilita':", 0.0)
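
The four results above look degenerate: an empty sentence with probability 0.0 for order 1, and for order 0 a value that does not correspond to a product of corpus-level token probabilities. For comparison, the following is a minimal Python 3 sketch of how the two models are usually scored. Under order 0 the probability of a sentence is the product of f(w)/N over its tokens; under order 1 it is P(w1) times the product of f(w_{i-1}, w_i)/f(w_{i-1}). The function name `best_sentences` is hypothetical, and applying the length and frequency thresholds (taken from the comment in `MarkovOrdine1`) to both models is an assumption.

```python
from collections import Counter


def best_sentences(sentences, min_len=10, min_freq=3):
    """Return ((sentence, p_order0), (sentence, p_order1)).

    `sentences` is a list of token lists. Only sentences with at least
    `min_len` tokens, all of corpus frequency >= `min_freq`, are scored.
    """
    corpus = [tok for sent in sentences for tok in sent]
    n = len(corpus)
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    best0, best1 = (None, 0.0), (None, 0.0)
    for sent in sentences:
        if len(sent) < min_len or any(unigrams[t] < min_freq for t in sent):
            continue
        # Order 0: independent tokens.
        p0 = 1.0
        for tok in sent:
            p0 *= unigrams[tok] / n
        # Order 1: first token, then each token given the previous one.
        p1 = unigrams[sent[0]] / n
        for prev, cur in zip(sent, sent[1:]):
            p1 *= bigrams[(prev, cur)] / unigrams[prev]
        if p0 > best0[1]:
            best0 = (sent, p0)
        if p1 > best1[1]:
            best1 = (sent, p1)
    return best0, best1
```

Each sentence starts from a fresh product, and the sentence returned is the one that achieved the maximum rather than the last one visited.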