We're taking a slightly different angle than what Facebook published, in that we're primarily using tool calling and observability data to run investigations.
What we've released really shines at surfacing up relevant observability data automatically, and we're soon planning to add the change-tracking elements mentioned in the Facebook post.
If anyone is curious, I did a webinar with PagerDuty on this recently.
Interesting. We're finding that it's more verbose than GPT-4o (and often ignores instructions to give output in a specific format), but does better when faced with ambiguity and potentially confusing situations.
Here is an example using http://github.com/robusta-dev/holmesgpt/ where we ask it to troubleshoot a payment processing workload in a Kubernetes cluster, with the GPT-4o output underneath:
holmes ask --model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0" "what is wrong with the payment processor?"
AI: To investigate what's wrong with the payment processor, we'll need to gather more information about it. First, let's try to find the relevant Kubernetes resources related to the payment processor.
Running `kubectl get -A --show-labels pod | grep payment`
Thank you for your question about the payment processor. In my initial search, I've found two pods related to payment processing. Let's investigate the one that seems to be having issues:
1. In the default namespace, there's a pod named "payment-processing-worker-5bb5bd75cd-x4g99" which is in a CrashLoopBackOff state. This indicates that the pod is repeatedly crashing and restarting.
2. In the sock-shop namespace, there's a pod named "payment-ff86cd6f8-rs47b" which appears to be running normally.
Let's focus on the problematic pod in the default namespace. To get more information about what's wrong, we'll need to look at its logs and description.
Running `kubectl describe pod payment-processing-worker-5bb5bd75cd-x4g99 -n default`
AI: Based on the information gathered, it appears that there is an issue with the payment processor in the Kubernetes cluster. Let me explain what's wrong:
1 The payment processing worker pod is in a CrashLoopBackOff state:
• Pod name: payment-processing-worker-5bb5bd75cd-x4g99
• Namespace: default
• Status: CrashLoopBackOff
2 The pod is continuously restarting and failing:
• Restart Count: 242
• The container is in a "Waiting" state with the reason "CrashLoopBackOff"
3 The root cause of the issue appears to be related to an environment variable:
• The container is running a bash script that checks for the presence of the DEPLOY_ENV environment variable.
• If DEPLOY_ENV is not set, the container exits immediately after printing an error message.
• The command being run is:
if [[ -z "${DEPLOY_ENV}" ]]; then echo Environment variable DEPLOY_ENV is undefined ; else while true; do echo hello; sleep 10;done; fi
4 The environment section in the pod description shows:
Environment: <none>
This indicates that no environment variables are set for the container.
5 The container is exiting with a status of "Completed" and an exit code of 0, which suggests that it's running the script, finding that DEPLOY_ENV is undefined, printing the error message, and exiting.
To fix this issue:
1 The DEPLOY_ENV environment variable needs to be set for the payment-processing-worker deployment.
2 You should update the deployment configuration to include this environment variable.
3 After setting the environment variable, the pod should stop crashing and enter a running state where it prints "hello" every 10 seconds.
To resolve this, you'll need to update the deployment configuration for the payment-processing-worker. You can do this by editing the deployment and adding the DEPLOY_ENV environment variable with an appropriate value.
Would you like me to show you how to update the deployment with the required environment variable?
Note that it explicitly picks up on the ambiguity in the question, pointing out that there are two payment-related workloads and that one of them is healthy.
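For anyone wondering why the pod shows "Completed" with exit code 0 yet still crash-loops: you can reproduce the container's entrypoint logic locally (the bash one-liner is copied from the pod spec in the transcript above):

```python
import os
import subprocess

# The container's entrypoint, copied verbatim from the pod description above.
SCRIPT = ('if [[ -z "${DEPLOY_ENV}" ]]; then '
          'echo Environment variable DEPLOY_ENV is undefined ; '
          'else while true; do echo hello; sleep 10; done; fi')

# Run it without DEPLOY_ENV set, matching the pod's "Environment: <none>":
env = {k: v for k, v in os.environ.items() if k != "DEPLOY_ENV"}
result = subprocess.run(["bash", "-c", SCRIPT], env=env,
                        capture_output=True, text=True)

print(result.stdout.strip())  # Environment variable DEPLOY_ENV is undefined
print(result.returncode)      # 0 -- a "successful" exit, so Kubernetes sees the
                              # container terminate, restarts it, and you get
                              # CrashLoopBackOff despite no error status
```

The exit code of 0 is exactly why the symptom is confusing: nothing "fails", the process just ends immediately, and the restart policy does the rest.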
GPT-4o:
$ holmes ask "what is wrong with the payment processor?"
Running `kubectl get -A --show-labels pod | grep payment-processor`
Running `kubectl get -A --show-labels deployment | grep payment-processor`
Running `kubectl get -A --show-labels service | grep payment-processor`
Running `kubectl get -A --show-labels pod | grep payment`
Running `kubectl describe pod payment-processing-worker-5bb5bd75cd-x4g99 -n default`
Running `kubectl logs payment-processing-worker-5bb5bd75cd-x4g99 -n default --previous`
AI: payment-processing-worker-5bb5bd75cd-x4g99 in namespace default crashes due to missing DEPLOY_ENV environment variable.
Fix: Define DEPLOY_ENV in the deployment spec.
Example patch:
env:
- name: DEPLOY_ENV
  value: "production"
Most of the CLI tools just wrap an LLM but don't give it access to the data it needs to be useful. Aider is an exception of course - it gives great results because it feeds the LLM your source files.
We built http://github.com/robusta-dev/holmesgpt/ to investigate Prometheus/Jira/PagerDuty issues. We're able to get pretty good results (we benchmark extensively) because we use function-calling to give the LLM read access to relevant data. I think we're the only open source AIOps tool, and possibly the only AIOps tool period, that does something more complex than RAG + summarization.
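The pattern looks roughly like this (a minimal sketch, not HolmesGPT's actual code - the tool name and schema here are illustrative, in the OpenAI-style function-calling format): you expose read-only commands as tools, the model requests them, and you execute and feed the output back.

```python
import json
import subprocess

# Illustrative tool definition; HolmesGPT's real tool set differs, this just
# shows the shape of the function-calling pattern.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "kubectl_find_resource",
        "description": "Search pods across all namespaces for a keyword",
        "parameters": {
            "type": "object",
            "properties": {"keyword": {"type": "string"}},
            "required": ["keyword"],
        },
    },
}]

def run_tool(name: str, arguments_json: str) -> str:
    """Dispatch a model-requested tool call to a read-only kubectl command."""
    args = json.loads(arguments_json)
    if name == "kubectl_find_resource":
        cmd = f"kubectl get -A --show-labels pod | grep {args['keyword']}"
        out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        return out.stdout or out.stderr  # returned to the model as the tool result
    raise ValueError(f"unknown tool: {name}")

# In the investigation loop, TOOLS is passed on each chat-completion call; every
# tool_call the model emits is routed through run_tool and the output is appended
# to the conversation, so the model decides what data to look at next.
```

The important part is that the tools are read-only: the model can pull in whatever observability data it needs, but can't mutate the cluster.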
If you're looking for something open source: https://github.com/robusta-dev/holmesgpt/