
Sunday, July 10, 2022

Rogers suffers widespread outage

Rogers experienced a widespread network outage beginning Friday morning, primarily impacting Ontario and Quebec, as well as linked services nationwide. The Rogers outage led to cascading effects at banks, credit card processors, and online businesses.

On Saturday, Rogers reported service restoration to the majority of its users. 

Tony Staffieri, the president and CEO of Rogers, stated "We now believe we’ve narrowed the cause to a network system failure following a maintenance update in our core network, which caused some of our routers to malfunction early Friday morning. We disconnected the specific equipment and redirected traffic, which allowed our network and services to come back online over time as we managed traffic volumes returning to normal levels. We know how much our customers rely on our networks and I sincerely apologize. We’re particularly troubled that some customers could not reach emergency services and we are addressing the issue as an urgent priority. We will proactively credit all customers automatically for yesterday’s outage. This credit will be automatically applied to your account and no action is required from you."




Rogers launches Canada’s first commercial 5G standalone network

Rogers Communications has launched the first commercial 5G standalone (SA) network in Canada, turning on the next-generation service after completing the rollout of Canada’s first national standalone 5G core and the country’s first 5G standalone device certification. Rogers said its 5G SA Core network has been built from the ground up based on the latest cloud native technologies, enabling more advanced wireless capabilities like ultra-low...


Rogers + Shaw merger to reshape Canadian market

Rogers Communications agreed to acquire Shaw Communications in a $26 billion deal that could reshape the Canadian communications market. Under the transaction, Rogers will acquire all of Shaw’s Class A and Class B shares for $40.50 per share, reflecting a ~70% premium to Shaw’s Class B share price. The merger will create Canada’s most robust wholly-owned national network and accelerate the deployment of 5G. Once the transaction is complete, the companies...




Monday, January 17, 2022

Digicel Tonga confirms two separate undersea cable breaks

Digicel reported that its preliminary technical fault investigation has established that there are two separate undersea cable breaks. The first is between the TCL cable landing station at Sopu, Tongatapu, and the FINTEL cable landing station in Suva, Fiji; this international cable break is approximately 37 km offshore from Tonga. The second break is on the domestic cable, near the area of the recent volcanic activity. The company has engaged the cable repair ship CS Reliance to undertake a full fault assessment as well as determine the safety of a possible cable repair.

Digicel also confirmed that its domestic network in Tongatapu is active. 

Digicel Regional CEO, Shally Jannif, said: “We know how vital it is at times like this that we keep people connected. We are focused on doing everything we can to ensure that we are able to establish international connectivity with Tonga.”

https://www.digicelgroup.com/to/en/news/2022/jan/18th/network-update-volcanic-eruption.html

Monday, January 10, 2022

Norway reports cut on Svalbard cable

Space Norway AS, which owns and operates a subsea cable system connecting mainland Norway with the archipelago of Svalbard, reported a fiber cut on Friday 7 January 2022. The cut is believed to have occurred between 130 and 230 km from Longyearbyen, in an area where the cable descends steeply into the deep sea. A second cable remains in operation, but the system will have no further redundancy until a repair is completed. A cable-laying ship will be mobilized.

Svalbard is located about midway (74° to 81° north latitude) between the northern coast of Norway and the North Pole. The islands have a population of about 2,900 and the largest settlement is the town of Longyearbyen.

https://spacenorway.no

Wednesday, December 22, 2021

AWS hit by loss of power within a data center in US-EAST-1 Region

At 4:35 AM PST on December 22, Amazon Web Services reported increased EC2 launch failures and networking connectivity issues for some instances in its US-EAST-1 Region. Shortly after, AWS confirmed a loss of power within a single data center within a single Availability Zone (USE1-AZ4) in its US-EAST-1 Region.

The outage impacted availability and connectivity to EC2 instances that are part of the affected data center within the affected Availability Zone. AWS also reported elevated RunInstance API error rates for launches within the affected Availability Zone. 

By 5:39 AM PST, AWS restored power to all instances and network devices within the affected data center. Network connectivity within the affected Availability Zone returned to normal levels.

As of 9:28 AM PST, AWS was still working to resolve connectivity issues between some remaining EC2 instances and EBS volumes in the affected data center. AWS also noted increased error rates for some customers using Directory Services AD Connector or Managed AD with Amazon SSO in the US-EAST-1 Region.

https://status.aws.amazon.com/




Sunday, December 12, 2021

AWS attributes outage to surge from automated scaling of internal network

At 7:30 AM PST on December 7th, 2021, an automated activity in the AWS Northern Virginia (US-EAST-1) Region, used to scale capacity of services hosted in the main AWS network, triggered an unexpected behavior from a large number of clients inside the internal network. The unexpected behavior resulted in a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network, delaying communication between these networks. These delays increased latency and errors for services communicating between the networks, which triggered even more connection attempts and retries, leading to persistent congestion and performance issues on the devices connecting the two networks. The traffic surge impacted the control planes used for creating and managing AWS resources. In particular, API Gateway servers were unable to communicate with the internal network during the early part of the event; as a result of these errors, many API Gateway servers eventually got into a state where they needed to be replaced in order to serve requests successfully.
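The feedback loop AWS describes, where failed requests generate retries that add to the very congestion that caused them to fail, can be illustrated with a toy simulation. This is a simplified sketch, not AWS's actual architecture or numbers; the loads, capacity, and retry behavior are all invented for illustration.

```python
# Toy model of retry amplification: when offered load exceeds device
# capacity, failed requests are retried, adding to the next tick's load.
# All parameters are hypothetical, for illustration only.

def simulate(base_load, capacity, ticks, retry_factor=1.0):
    """Return the offered load per tick; failures feed back as retries."""
    pending_retries = 0
    history = []
    for _ in range(ticks):
        offered = base_load + pending_retries
        history.append(offered)
        failures = max(0, offered - capacity)
        pending_retries = int(failures * retry_factor)
    return history

# Below capacity, load stays flat and the system is stable.
stable = simulate(base_load=80, capacity=100, ticks=5)

# Once a surge exceeds capacity, retries keep congestion persistent
# and growing -- the "even more connection attempts" effect.
congested = simulate(base_load=120, capacity=100, ticks=5)
```

The point of the sketch is that congestion above capacity is self-sustaining: even if the original surge subsides, the retry backlog keeps the devices saturated until operators shed load or add capacity.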

In a blog posting, Amazon Web Services (AWS) apologized for the incident and said it has already taken several actions to prevent a recurrence of this event, including disabling of the automated scaling process until a remediation method is deployed. 


https://aws.amazon.com/message/12721/

Tuesday, December 7, 2021

AWS hit by outage in US-EAST-1 region


Amazon Web Services (AWS) experienced an outage that started on Dec 7, 2021 at approximately 15:40 UTC, impacting various regions worldwide.

The AWS status page reported API and console issues in the US-EAST-1 Region, which is used to host the AWS global console landing page. The trouble included elevated error rates for EC2 APIs in the US-EAST-1 region. 

At 12:34 EST, the company said the root cause was "impairment of several network devices," confirmed that recovery was underway, but could not provide an ETA for full recovery.



https://status.aws.amazon.com

Monday, October 4, 2021

Facebook hit by 6 hour outage


Facebook, Instagram, and WhatsApp were hit by a global outage on Monday beginning at approximately 15:50 UTC and continuing for almost 6 hours. In July, Facebook reported 3.51 billion people are using at least one of its apps every month. 

Facebook CTO Mike Schroepfer attributed the outage to networking issues and offered the company's apologies. 

In a blog post, Facebook Engineering Group's Santosh Janardhan said the root cause of the outage was a faulty configuration change on the backbone routers that coordinate network traffic between data centers. The disruption cascaded across Facebook's applications, including internal tools the company uses to manage its infrastructure.


Cloudflare said the issue appeared more serious than a DNS misconfiguration: Facebook appeared to have stopped announcing the BGP routes to its DNS prefixes, making its infrastructure IPs suddenly unreachable, as "if someone had pulled the cables from their data centers all at once and disconnected them from the Internet."
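The mechanism Cloudflare describes can be sketched with a toy reachability check: if no announced BGP prefix covers a nameserver's IP address, queries cannot even be routed toward it, so DNS resolution fails everywhere at once. The prefixes and the nameserver address below are documentation-range examples (RFC 5737), not Facebook's real ones.

```python
# Toy illustration of why withdrawn BGP routes break DNS resolution:
# an IP is only reachable if some announced prefix covers it.
import ipaddress

def reachable(ip, announced_prefixes):
    """True if any announced prefix contains the given IP address."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(p) for p in announced_prefixes)

announced = ["198.51.100.0/24", "203.0.113.0/24"]
ns = "198.51.100.53"  # hypothetical authoritative nameserver

assert reachable(ns, announced)       # route present: queries can reach it
announced.remove("198.51.100.0/24")   # routes withdrawn, as in the outage
assert not reachable(ns, announced)   # nameserver vanishes from the Internet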

https://engineering.fb.com/2021/10/04/networking-traffic/outage/

https://blog.cloudflare.com/october-2021-facebook-outage/

Tuesday, June 8, 2021

Fastly's global CDN suffers configuration error

Fastly, which operates a global content delivery network, experienced a service configuration issue that triggered disruptions across its POPs worldwide. Service disruptions continued for about three hours, impacting traffic from top websites such as CNN, NYTimes, Amazon, Reddit, and others.

According to the company's website, the incident affected: Asia/Pacific (Auckland (AKL), Brisbane (BNE), Dubai (FJR), Hong Kong (HKG), Melbourne (MEL), Osaka (ITM), Perth (PER), Singapore (SIN), Sydney (SYD), Tokyo (HND), Tokyo (TYO), Wellington (WLG), Singapore (QPG), Tokyo (NRT)), South America (Buenos Aires (EZE), Bogota (BOG), Curitiba (CWB), Rio de Janeiro (GIG), Santiago (SCL), São Paulo (CGH), São Paulo (GRU), Lima (LIM)), North America (Ashburn (BWI), Ashburn (DCA), Ashburn (IAD), Ashburn (WDC), Atlanta (FTY), Atlanta (PDK), Boston (BOS), Chicago (CHI), Chicago (MDW), Chicago (ORD), Chicago (PWK), Columbus (CMH), Columbus (LCK), Dallas (DAL), Dallas (DFW), Denver (DEN), Houston (IAH), Jacksonville (JAX), Kansas City (MCI), Los Angeles (BUR), Los Angeles (LAX), Los Angeles (LGB), Miami (MIA), Minneapolis (MSP), Minneapolis (STP), Montreal (YUL), New York (LGA), Newark (EWR), Palo Alto (PAO), Phoenix (PHX), Portland (PDX), San Jose (SJC), Seattle (SEA), St. Louis (STL), Toronto (YYZ), Vancouver (YVR)), South Africa (Cape Town (CPT), Johannesburg (JNB)), India (Chennai (MAA), Mumbai (BOM), New Delhi (DEL)), and Europe (Amsterdam (AMS), Copenhagen (CPH), Dublin (DUB), Frankfurt (FRA), Frankfurt (HHN), Helsinki (HEL), London (LCY), London (LHR), London (LON), Madrid (MAD), Manchester (MAN), Marseille (MRS), Milan (MXP), Oslo (OSL), Paris (CDG), Stockholm (BMA), Vienna (VIE), Munich (MUC)).

https://status.fastly.com/incidents/vpk0ssybt3bj

Tuesday, January 26, 2021

Verizon suffers fiber cut in Brooklyn

Following widespread outages on the East Coast, Verizon confirmed a fiber cut in Brooklyn.





Monday, December 14, 2020

Google suffers widespread outage

Google was hit by a widespread outage impacting Gmail, YouTube, and other services. The problem began on 14-Dec-2020 at 3:55 AM (Pacific). The company said that service was restored for the majority of users by 4:52 AM.

https://www.google.com/appsstatus#hl=en&v=status




Wednesday, November 25, 2020

AWS suffers outage impacting Kinesis Data Streams API

Amazon Web Services reported a widespread outage in North America impacting its Kinesis Data Streams API.

Amazon Kinesis is a managed service that lets websites ingest, buffer, and process streaming data in real-time.

In a status update, AWS said the issue was "also affecting other services, including ACM, Amplify Console, API Gateway, AppMesh, AppStream2, AppSync, Athena, Batch, CodeArtifact, CodeGuru Profiler, CodeGuru Reviewer, CloudFormation, CloudMap, CloudTrail, Cognito, Connect, DynamoDB, EventBridge, Glue, IoT Services, Lambda, LEX, Managed Blockchain, Marketplace, MediaLive, MediaConvert, Personalize, RDS Performance Insights, Resource Groups, SageMaker, Support Console, Well Architected, and Workspaces."




https://status.aws.amazon.com/

Monday, September 28, 2020

Microsoft 365 hit by outage

Microsoft 365 experienced a widespread outage on Monday evening across the United States.

The disruption left some users unable to access any services that leverage Azure Active Directory (AAD) including Outlook, Microsoft Teams and Teams Live Events as well as Office.com. 

While the issue was being resolved, Microsoft said it was rerouting some traffic to alternate infrastructure.

About 4 hours after reports of the outage surfaced, Microsoft tweeted that the majority of services for most users had been recovered.


https://twitter.com/MSFT365Status


 

Sunday, August 30, 2020

Cloudflare: CenturyLink/Level(3) outage led to 3.5% drop in global traffic

Beginning at 10:03 UTC, Cloudflare's traffic monitoring systems detected a significant disruption impacting leading network providers worldwide.

In a blog post, Cloudflare said a problem on the CenturyLink/Level(3) backbone led to "a 3.5% drop in global traffic during the outage, nearly all of which was due to a nearly complete outage of CenturyLink’s ISP service across the United States."

For its part, CenturyLink said its IP NOC detected a bad Flowspec rule that propagated throughout the network.
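A BGP Flowspec rule pairs traffic-match criteria with an action such as rate-limit or discard, and propagates through the network like a route. The danger CenturyLink's incident illustrates is that an overly broad rule, once propagated, applies everywhere at once. The sketch below is a deliberately simplified model: real Flowspec (RFC 8955) encodes match criteria in BGP NLRI, and the rule format here is invented for illustration.

```python
# Simplified model of Flowspec-style filtering: each rule is a
# (destination prefix, action) pair; the first matching rule wins.
import ipaddress

def apply_rules(packet_dst, rules):
    """Return the action of the first matching rule, else 'forward'."""
    dst = ipaddress.ip_address(packet_dst)
    for prefix, action in rules:
        if dst in ipaddress.ip_network(prefix):
            return action
    return "forward"

rules = [("203.0.113.0/24", "discard")]      # intended, narrow rule
assert apply_rules("198.51.100.7", rules) == "forward"

rules.insert(0, ("0.0.0.0/0", "discard"))    # bad rule: matches everything
assert apply_rules("198.51.100.7", rules) == "discard"
```

Once such a rule reaches every router that honors Flowspec announcements, the blast radius is the whole backbone, which is consistent with the hours-long, network-wide impact reported here.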

The issue took approximately four hours to fully resolve.

https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/

Tuesday, May 19, 2020

Opengear estimates the cost of enterprise network outages

In the US, nearly two-fifths (38%) of senior IT decision-makers and network managers reported losing more than $1 million in the past 12 months, according to a recent study commissioned by Opengear, a Digi International company. More than half of survey respondents globally say they have had four or more network outages lasting more than 30 minutes in the past year, with outages costing half of the surveyed organizations worldwide between $300,000 and $6 million in downtime.

‘Measuring the True Cost of Network Outages,’ Opengear’s in-depth research study of 500 global senior IT decision makers, including 125 respondents from the US, also discovered that US businesses put significantly greater emphasis on network resilience than any other country surveyed. In fact, network resilience has become the top priority for 73% of US IT departments, as well as 70% of US companies at the board level. Globally, responses were at 49% and 47% respectively.

Some highlights of the study:

  • Although more than three quarters (78%) of organizations globally have set aside a specific budget to ensure their network resilience, almost half (49%) had outages increase by 10% or more over the last five years. 
  • Outages were even more prevalent in the US, with nearly one-third (32%) reporting an increase of 25% or more. 
  • More than four out of 10 (42%) US businesses reported that network outages took more than one working day on average to find and resolve after they were reported, with an average of nearly 10 hours across the country.


With many organizations running geographically spread networks, travel time to get engineers on site has become the most common challenge in resolving network issues quickly, according to more than two in five (41%) globally and over half (52%) in the US. But the US differs from other regions with the second most common challenge, inadequate network monitoring (41%); whereas globally, companies reported a lack of in-house engineering capabilities (40%).

Steve Cummins, Vice President of Marketing at Opengear, said “The true cost of a network outage is much more than just lost revenue. Our survey found that reduced customer satisfaction was the biggest impact of an outage according to 41% of respondents, ahead of data loss (34%) and financial loss (31%). Organizations need to think in advance about how they can avoid, and then recover from, an outage quickly before the consequences become severe. Given the time invested to resolve network outages and the costs incurred, finding a solution that addresses these is an urgent priority. This is where an out-of-band management solution can be highly beneficial. Companies around the world recognize that the ability to operate independently from the production network, and detect and remediate network issues automatically, can dramatically improve security (48%), save time (45%) and most importantly, reduce costs (41%).”

http://www.opengear.com

Thursday, January 16, 2020

Outages on West African Cable System (WACS) & SAT3 cables

Internet service in central and southern Africa is impacted by simultaneous outages on the West African Cable System (WACS) & the SAT3 cable.

The break in the WACS system is believed to have occurred near Libreville, Gabon, while the break in the SAT3 cable reportedly happened near Luanda, Angola.



West Africa Cable System upgraded to 32X100G

The West Africa Cable System (WACS) has been successfully upgraded to 32 x 100G wavelengths configured on the longest optically amplified single-fiber span, stretching 11,500 km from South Africa to Portugal. WACS has two network operation centers and 15 landing points in 14 countries spanning West Africa and Europe.

Huawei Marine, the contractor for the upgrade, said it employed Flex Grid and optical pass-through technologies, and that WACS now represents the world's longest 100G system.

ALU Completes SAT-3/WASC Undersea Cable Upgrade

Alcatel-Lucent completed the fourth upgrade of the SAT-3/WASC undersea cable system, which now offers 40 Gbps connections, full in-system protection, and one of the lowest-latency routes between Europe, the west coast of Africa, and Southern Africa.

The SAT-3/WASC cable system was upgraded from 420 Gbps to 920 Gbps in the northern segments, north of Ghana, and from 340 Gbps to 800 Gbps in the southern segments. Overall, this fourth upgrade enables a sevenfold increase in SAT-3/WASC’s original design capacity through the use of Alcatel-Lucent’s advanced coherent technology.

This latest upgrade went live during the first half of 2014.

The SAFE cable provides ongoing connections via the shortest, and therefore lowest-latency, route between Southern Africa and Asia, with connectivity via South Africa, Mauritius, Reunion, India and Malaysia.

Monday, July 15, 2019

Galileo cites ground infrastructure for ongoing outage

Galileo, the EU's satellite navigation system, cited a technical incident related to its ground infrastructure as the primary cause of an outage that has persisted since Friday, 11 July 2019.

A service recovery time/date has not yet been forecast.

Galileo is convening an Anomaly Review Board to analyse the exact root cause and to implement recovery actions.

https://www.gsc-europa.eu/news/update-on-the-availability-of-some-galileo-initial-services



Sunday, June 9, 2019

ThousandEyes: Cogent outage part of a larger BGP route leak

Last week's outage at a Cogent data center in London, which disrupted some Whatsapp traffic, was part of a larger BGP route leak, according to an updated analysis from ThousandEyes.

The disruption had nothing to do with the Whatsapp service itself, writes Archana Kesavan, but was the result of a major BGP route leak by Swiss colocation provider, Safe Host. Her analysis indicates that "Safe Host leaked thousands of prefixes which had a cascading effect on the availability of those services when the routes were accepted and propagated by service providers, such as China Telecom, and then further accepted by other ISPs such as Cogent."
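One reason leaked routes divert traffic so effectively is longest-prefix matching: routers forward to the most specific matching prefix, regardless of who originated it, so a leaked more-specific route wins over the legitimate covering route. The sketch below is a toy illustration of that selection rule; the prefixes are private-range addresses chosen for the example, not the actual leaked routes.

```python
# Toy longest-prefix-match route selection, showing why a leaked
# more-specific prefix diverts traffic away from the legitimate route.
import ipaddress

def best_route(dst, table):
    """Pick the next hop of the matching route with the longest prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(p), next_hop) for p, next_hop in table
               if addr in ipaddress.ip_network(p)]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

table = [("10.0.0.0/8", "legitimate-origin")]
assert best_route("10.1.2.3", table) == "legitimate-origin"

# A leaked /24 is accepted and propagated by upstream providers;
# being more specific, it now wins the forwarding decision.
table.append(("10.1.2.0/24", "leaked-path"))
assert best_route("10.1.2.3", table) == "leaked-path"
```

This matches the cascading pattern ThousandEyes describes: once large transit providers accepted and re-advertised the leaked prefixes, traffic for the affected services followed the leaked paths instead of the intended ones.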

https://blog.thousandeyes.com/whatsapp-disruption-just-one-symptom-of-broader-route-leak/


Thursday, June 6, 2019

WhatsApp outage linked to Cogent's London data center

WhatsApp users from multiple locations around the world experienced a major service disruption on Thursday, 06-June-2019, from 10:50am to 11:30am BST and from 1:10pm to 2:13pm BST.

ThousandEyes detected the outages and traced the issue to 100% packet loss within Cogent’s London data center.


https://www.thousandeyes.com/

Monday, June 3, 2019

Google Cloud outage attributed to configuration change

The widescale outage experienced by Google Cloud Platform on 02-June-2019 was caused by a configuration change that was intended for a small number of servers in a single region but was mistakenly applied to a larger number of servers across several neighboring regions, according to a blog posting by Benjamin Treynor Sloss, VP, 24x7, Google.

Google says the disruption caused a 10% drop in global views of YouTube during the incident, while Google Cloud Storage measured a 30% reduction in traffic.

https://cloud.google.com/blog/topics/inside-google-cloud/an-update-on-sundays-service-disruption

In a follow-up report tracking the Google Cloud outage, ThousandEyes said its monitoring data indicate that connectivity and packet loss issues impacted Google network locations in the eastern US, including Ashburn, Atlanta and Chicago. High packet loss conditions radiated out to Google’s network edge.  However, most GCP regions were unaffected by the outage, including regions in the US as well as in Europe and elsewhere.

More: https://blog.thousandeyes.com/google-cloud-platform-outage-analysis/



Sunday, June 2, 2019

Google Cloud suffers widespread outage

Google reported high levels of network congestion in the eastern USA, impacting a wide range of Google services and third-party websites and apps.

The first report on the Google status page was posted at 12:53pm (Pacific), impacting Google Cloud Networking. Google Compute Engine was also impacted.

Users across the United States reported slow performance and intermittent outages with Google Cloud, G Suite, Gmail, YouTube and other key services.

At 16:00 (Pacific), Google reported that the network situation was back to normal for the majority of users.