/r/CouchDB
What's the best ZFS recordsize for CouchDB? For example, Postgres accesses data in 8k blocks. MySQL InnoDB is 16k. What would be the block size for CouchDB?
I'm running a Python script that connects to a CouchDB database and subscribes to an MQTT topic. The script works fine on my laptop (Ubuntu 24.04.1 LTS), but when I try to run it on a Jetstream2 Access server (Ubuntu 20.04 LTS), I get a connection refused error when trying to access/create a database. Here's the relevant part of my code:
import paho.mqtt.client as mqtt
import couchdb
import json
from datetime import datetime
from urllib.parse import quote

# CouchDB setup
username = "admin"
password = ""  # Your actual password
host = "localhost"
port = "5984"

# URL encode the username and password
encoded_username = quote(username, safe='')
encoded_password = quote(password, safe='')
couch_url = f"http://{encoded_username}:{encoded_password}@{host}:{port}"

try:
    couch = couchdb.Server(couch_url)
    print("Successfully connected to CouchDB")
except Exception as e:
    print(f"Error connecting to CouchDB: {e}")
    exit(1)

db_name = 'weather_data'
try:
    if db_name not in couch:
        db = couch.create(db_name)
    else:
        db = couch[db_name]
    print(f"Successfully accessed/created database: {db_name}")
except Exception as e:
    print(f"Error accessing/creating database: {e}")
    exit(1)

# MQTT setup
mqtt_broker = "iotwx.ucar.edu"
mqtt_port = 1883
# mqtt_topic = "ncar/iotwx/hi/maui/mauiNNN"
mqtt_topic = "ncar/iotwx/co/rmnp/neon000"

def on_connect(client, userdata, flags, rc):
    print(f"Connected with result code {rc}")
    client.subscribe(mqtt_topic)

def on_message(client, userdata, msg):
    print(f"Received message on topic: {msg.topic}")
    payload = msg.payload.decode()
    print(f"Raw payload: {payload}")
    lines = payload.split('\n')
    data = {}
    for line in lines:
        if ':' in line:
            key, value = line.split(':', 1)
            data[key.strip()] = value.strip()
    print(f"Parsed data: {data}")
    if 'sensor' in data and 'm' in data and 't' in data:
        doc = {
            'device': data.get('device', ''),
            'sensor': data['sensor'],
            'measurement': float(data['m']),
            'timestamp': int(data['t']),
            'received_at': datetime.now().isoformat()
        }
        print(f"Prepared document for CouchDB: {doc}")
        try:
            doc_id, doc_rev = db.save(doc)
            print(f"Saved document to CouchDB. ID: {doc_id}, Rev: {doc_rev}")
        except Exception as e:
            print(f"Error saving document to CouchDB: {e}")
    else:
        print("Received message does not contain all required fields (sensor, m, t)")

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
try:
    client.connect(mqtt_broker, mqtt_port, 60)
    print(f"Connected to MQTT broker: {mqtt_broker}")
except Exception as e:
    print(f"Error connecting to MQTT broker: {e}")
    exit(1)
print(f"Listening for messages on topic: {mqtt_topic}")
client.loop_forever()
The error message I get is:
Successfully connected to CouchDB
Error accessing/creating database: [Errno 111] Connection refused
I'm not sure what's causing the connection refused error, especially since the script works fine on my laptop. Has anyone else encountered this issue on Jetstream2 Access servers or Ubuntu 20.04 LTS? Any help would be appreciated!
Environment:
Jetstream2 Access server (Ubuntu 20.04 LTS)
CouchDB 3.2.2
Python 3.8.10
couchdb library version 1.1.1
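One thing the output above hints at: as far as I can tell, in couchdb-python 1.x `couchdb.Server(url)` only builds a client object and sends no request, so "Successfully connected to CouchDB" can print even when nothing is listening on the port. A minimal sketch, using only the standard library, for checking whether anything accepts TCP connections on the host and port from the script before touching a database. The helper name `port_is_open` is made up for illustration.

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (Errno 111), timeouts and DNS failures.
        return False

if __name__ == "__main__":
    # 5984 is the port used in the script above.
    print("CouchDB port reachable:", port_is_open("localhost", 5984))
```

If this reports False on the server, CouchDB is probably either not running there or bound to a different address or port, though that is a guess from the symptoms rather than a confirmed diagnosis.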
Hi guys! I need some help deploying my code to Railway, especially with my CouchDB database.
TL;DR: for CouchDB, what should I set as the variable name and the reference in my empty project on Railway?
My project backend uses Spring Boot, and for databases I have SQL, MongoDB, and CouchDB.
So I created an empty project to deploy my backend project. Inside, I have to create 3 environment variables. When deploying on Railway, under my project variables, there will be 3 variables: one each for SQL, MongoDB and CouchDB. On Railway, after adding SQL and MongoDB, the environment variables for their URLs are added automatically, but this is not the case for CouchDB!
So for my empty project, under its environment variables, I put SPRING_DATASOURCE_URL as the variable name, and for the reference I put jdbc:${{MySQL.MYSQL_URL}}. This is for SQL.
I put SPRING_DATA_MONGODB_URI as the variable name, and for the reference I put ${{MongoDB.MONGO_URL}}/products?authSource=admin. This is for MongoDB.
Then for CouchDB, what should I set as the variable name and the reference?
Thank you very much!
Hey all,
I am adding offline capabilities to my app and decided to move away from Firebase even though Firestore offers some kind of persistence. I found PouchDB which seems perfect for what I need, I just needed some advice for the current database structure I have in Firestore.
Basically there are 2 main collections: "users" & "projects". A user can have multiple projects. So far so good. My issue is that each project document has 4 more sub-collections, which as I understand PouchDB doesn't support? I don't expect these sub-collections to have more than a few dozen documents each, so perhaps I could just add them as keys to each project but on the other hand I don't always need this data when I'm fetching a project.
I'm not a database expert so I'm wondering if there's a better approach? Any help is appreciated.
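One common way to model sub-collections in a flat document store is to encode the parent in each document's `_id` and fetch the children with a key range. The sketch below only imitates that idea in plain Python (no PouchDB involved, and plain code-point ordering rather than CouchDB's collation). The ID scheme `project:<id>:task:<id>` and the helper `ids_with_prefix` are assumptions for illustration. In PouchDB the same range is typically expressed with the `startkey` and `endkey` options of `allDocs`.

```python
from bisect import bisect_left, bisect_right

HIGH = "\ufff0"  # sorts after ordinary characters; used as a range sentinel

def ids_with_prefix(sorted_ids, prefix):
    """Return all IDs in the sorted list that start with the prefix."""
    lo = bisect_left(sorted_ids, prefix)
    hi = bisect_right(sorted_ids, prefix + HIGH)
    return sorted_ids[lo:hi]

# Hypothetical IDs: projects plus two kinds of child documents.
doc_ids = sorted([
    "project:p1",
    "project:p1:note:n1",
    "project:p1:task:t1",
    "project:p1:task:t2",
    "project:p2",
    "project:p2:task:t1",
])

# Children of one kind for one project, without loading the project itself.
tasks_p1 = ids_with_prefix(doc_ids, "project:p1:task:")
print(tasks_p1)
```

With this layout the project document stays small, and each former sub-collection is loaded only when it is actually needed.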
I am trying to configure CouchDB to be accessible from the internet using cloudflared DNS.
I started the cloudflared tunnel using the following config:
- hostname: example.org
  path: /couchdb
  service: http://192.168.0.103:5984
so that it is accessible via https://example.org/couchdb
But upon visiting the URL I get the following error on the webpage, and a 404 error in the CouchDB Docker logs.
{"error":"not_found","reason":"Database does not exist."}
A CouchDB like DB, which runs in your browser. Access your saved attachments locally with REST API:
<img src=`/recliner/dbname/doc_id/attachment_name`>
In your service worker add this:
import {Recliner} from 'recliner-js';
const recliner = new Recliner();//create instance
self.addEventListener("fetch",(e)=>{
    const url_path = Recliner.getPathFromUrl(e.request);
    //mounts recliner
    if(url_path.startsWith("/recliner")){
        e.respondWith(recliner.process(e.request));
    }else{
        // do whatever else
    }
});
Now you can access docs and attachments saved in your recliner DB via URL
<img src=`/recliner/dbname/doc_id/attachment_name`>
There are two ways to interact with the DB:
Use fetch in your JS code, with a REST API similar to CouchDB's*.
const getADoc = await fetch(`/recliner/dbname/docid`);
if(getADoc.status === 200){
    return await getADoc.json();//{_id,_rev,ok: true}
}
See the complete list of REST APIs.
*Many, but not all, of the CouchDB REST APIs are supported. See the Difference from CouchDB section.
Or use a client instead: UsageDAO
import {UsageDAO} from 'recliner-js';
//Create
await UsageDAO.postADoc(dbname,{name:"person1",age:30});
//Retrieve
const doc = await UsageDAO.readADoc(dbname,docid);
//Update
await UsageDAO.updateADoc(dbname,doc._id,{name:"person1",age:34});
//Delete
await UsageDAO.deleteADoc(dbname,doc._id);
//Query
const findResult = await UsageDAO.findByPagination({
    dbname,
    selector:{
        age:{$lt: 40},
        income: {$within:[10000,20000]},
        loc: {$isinpolygon:[]}//has some GIS capability
    }
});
//Save attachments
await UsageDAO.addAttachmentsToDocID(dbname,doc._id,{
    "my_doc.pdf":DOC_BLOB,
    "my_video.webm":VIDEO_BLOB
});
//Save an attachment with a cloud URL.
//This way, when such docs get replicated, the attachments are not sent, as they can be downloaded at the end system using the cloud URL.
await UsageDAO.addAnAttachmentToExistingDoc(dbname,doc,attachment_name,blob,new_edits,cloud_url,content_type);
//CRUD with attachments on DOC is available
//Replication: say, fetch the last 10 posts
await UsageDAO.replicate({
    selector:{
        post_id,
        type:"post",
        time:{$lt: now}
    },
    limit: 10,
    target:{
        url:"/recliner/my_db"
    },
    source:{
        url: "/proxy/couchdb/dbname",
        headers:{
            token:"some_token"
        }
    }
});
Save the document with a cloud_url in an _attachments property.
await fetch("/recliner/dbname/docid",{
    method:"PUT",
    //fetch needs a string body, so the document is serialized first
    body:JSON.stringify({
        _id:docid,
        name:"person1",
        _attachments:{
            "my_video.webm":{
                cloud_url:"some_valid_cloud_url_which_supports_partial_content",
            }
        }
    })
});
Now this can be streamed using:
<video src="/recliner/dbname/docid/my_video.webm">
The video player will automatically stream the video via recliner. Using the cloud_url, the docs will be partially downloaded and saved for offline use, and then streamed to the video element. So the next time the user streams the same video, it is pulled from the local cache.
However, for all this to work, you need to configure recliner with the MIME types you want to support for streaming.
import {Recliner} from 'recliner-js';
const recliner = new Recliner(24,{
    "video/webm":1000_000,//1MB of partial content size for streaming
    "audio/mp3":1000_00//0.1MB of partial content size for streaming
});
When configured this way, whenever attachments of type webm and mp3 are requested, they are automatically streamed. If the partial content of a doc is not present locally, then using the cloud_url its partial content is first pulled from the cloud, saved in IndexedDB, and then streamed to the requesting GUI components, such as video and audio tags. Next time, the same partial content will be streamed from the local DB instead of being fetched from the cloud_url.
"/recliner/:db";
"/recliner/:db/_design/:ddoc";
"/recliner/:db/_find";
"/recliner/:db/_index";
"/recliner/_replicate";
"/recliner/:db/_changes";
"/recliner/:db/_bulk_get";
"/recliner/:db/_revs_diff";
"/recliner/:db/_local/:docid";
"/recliner/:db/:docid";
"/recliner/:db/:docid/:attachment";
"/recliner/:db/_db_design";
"/recliner/:db/_run_update_function";
"/recliner/_delete_recliner";
_m
$lt,$lte,$eq,$ne,$gte,$gt,$exists,$within,$nin,$regex,$in,$isinpolygon,$isnotinpolygon,$nwithin
UsageDAO.putADBDesign can be used to configure various functions at the DB level:
export interface DBDesignDoc{
    //name of the DB
    for_db:string;
    //before insertion docs are validated using this
    doc_validation_function?:string;//this is stringified JS function
    //used for map reduce
    map_functions?: Record<string,string>;
    reduce_functions?:Record<string,string>;
    //can be used to mass modify docs using a selector
    update_functions?:Record<string,string>;
    /**
     * Used during remote to local replication using view query as source. [one can pass viewQueryUrl to replication info, to start view based replication]
     * These functions are used to filter view results before replicating them to the local DB.
     */
    view_result_filter_functions?:Record<string,string>;
}
UsageDAO.postQueryToDBDesign<D>(dbname:string, query:MapReduceQuery).
3. Design docs are not replicated by default when doing remote to local or local to remote replication. However, for local to local replication, design docs are copied. Local means a database present in the browser. Remote means one located on a DB server and accessed via HTTPS (or via a proxy).
4. A _local attribute can be added to a doc. It remains solely on the local machine and is removed when the doc is replicated. So values you want to keep in a doc, but don't want to send to the server, can be saved here. CouchDB Local Docs, which are never replicated, are also supported.
The most important APIs for dealing with docs, attachments and databases are implemented, though many that were deemed insignificant were dropped.
Type these commands in order:
npm run i: installs dev dependencies
npm run build: builds for the TypeScript system
npm run predemo: adds the ".js" extension to the build for demo purposes
npm run demo: open http://localhost:8000/demo/index.html to view the demo
I’m not finding documentation that coherently explains a lot of stuff in couchdb. Seems to be an ongoing problem throughout its 10+ year existence according to what shows up in search engines, but my question today is: what is a legal hostname in couchdb? Is it just letters, numbers, and period - or are “hyphens” considered legal?
K8s has something called a “statefulset” that (long story short) auto generates hostnames with a hyphen followed by an ordinal. The FQDN would look something like “couchdb-0.couchdb.apache-couchdb.svc.cluster.local”, and scaling would create nodes couchdb-1, couchdb-2, etc… same FQDN pathing. Unfortunately, couchdb@couchdb-0.couchdb.apache-couchdb.svc.cluster.local throws the illegal nodename error.
Other than creating separate deployments with separate non-hyphenated service names (seems kind of silly), there’s no way to set this up without a whole lot of kludgy workarounds - unless you know how to bypass this hyphen issue.
I am currently working on a project that uses CouchDB, and I need to write a Mango query in which two fields need to be compared. I was not able to find much about this online and didn't have any luck figuring it out myself.
Long story short, I have documents which contain the following fields, among many others: "processed" and "modifiedOn". These fields are strings holding dates in ISO-8601 format (yyyy-MM-ddThh:mm:ss).
ex:
{
    ...,
    "processed": "2023-12-12T12:12:12Z",
    "modifiedOn": "2023-12-12T12:13:00Z"
}
So, my question is whether a query can be written so that it returns all of the documents whose "processed" value is less than the "modifiedOn" value.
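As far as I know, Mango selectors compare a field against a literal value rather than against another field, so this kind of comparison usually ends up in a view's map function or in client code. A minimal client-side sketch in Python, assuming both fields always use the same ISO-8601 layout and the same "Z" suffix, so that plain string comparison matches chronological order. `needs_processing` is a hypothetical helper and the second sample document is invented.

```python
def needs_processing(doc: dict) -> bool:
    """True when 'processed' is earlier than 'modifiedOn'.

    Relies on both values being same-format ISO-8601 UTC strings,
    which sort lexicographically in chronological order.
    """
    return doc["processed"] < doc["modifiedOn"]

docs = [
    {"_id": "a", "processed": "2023-12-12T12:12:12Z", "modifiedOn": "2023-12-12T12:13:00Z"},
    {"_id": "b", "processed": "2023-12-12T12:14:00Z", "modifiedOn": "2023-12-12T12:13:00Z"},
]

# Keep only the documents modified after they were last processed.
stale = [d["_id"] for d in docs if needs_processing(d)]
print(stale)
```

The same test, written as a condition inside a JavaScript map function, would let a view index only the documents that still need processing.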
Hello, I have been using couchdb in standalone mode for some time and I am considering setting up a cluster for reasons of redundancy and also load balancing. As I have read, couchdb has its own load balancing system that, although it is quite simple, is sufficient for the test I want to do.
I have created a cluster of 3 nodes in Docker, and the 3 communicate and replicate well. However, no matter how many queries I send to node 0 (millions in 1 minute), I do not see it delegating any to the rest of the nodes. Should I configure something else, or have I misunderstood CouchDB balancing?
Thanks people.
Hi, is it normal for the Docker container's CPU usage to consistently hover around 5%? I've noticed that beam.smp seems to be responsible for this, at least according to what I see in htop. Typically, this wouldn't be a concern, but I'm worried about unnecessary power consumption. I have several other services running, and this behavior is only occurring with CouchDB.
I am new to CouchDB. I inherited a CouchDB server at work. I am looking through it, and at times we are getting 409 errors. I know that is a document conflict. My question is: is there a way to know why that is happening? Is it because I am attempting to update an old revision? Is that the only way that a 409 error is generated?
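To make the usual cause of a 409 concrete, here is a toy in-memory model of revision checking. It is not CouchDB's actual implementation. It assumes the usual rule that an update must carry the current `_rev` of the stored document, and that creating a document whose `_id` already exists without a `_rev` fails the same way. The class `ToyStore` and the simplified integer revisions are illustrative assumptions.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 response."""

class ToyStore:
    def __init__(self):
        self._docs = {}  # _id -> (rev, body)

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            # Stale or missing revision: someone else updated first,
            # or the document already exists.
            raise Conflict(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, body)
        return new_rev

store = ToyStore()
rev1 = store.put("doc1", {"n": 1})            # create
rev2 = store.put("doc1", {"n": 2}, rev=rev1)  # update with the current rev
try:
    store.put("doc1", {"n": 3}, rev=rev1)     # stale rev, so this conflicts
    outcome = "ok"
except Conflict:
    outcome = "conflict"
print(outcome)
```

In this model the third write fails because another write landed between reading `rev1` and writing again, which is the typical pattern behind intermittent 409s when several clients or processes update the same document.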
How well does CouchDB perform with 100,000 per-user databases supporting "live" users? Are there any readily available and repeatable online tests to gauge its performance?
I find the implementation of CouchDB to be laborious, time-consuming, and not enjoyable. I'm concerned that the advantages of replication might not justify the effort and inconvenience involved in using CouchDB.
Hello,
I'm pretty new to the CouchDB world. I just use it to synchronize Obsidian (with the LiveSync plugin), but I wonder what the best practices are for securing a CouchDB instance exposed on the web. I use this configuration in Docker:
[couchdb]
single_node=true
max_document_size = 50000000
[chttpd]
require_valid_user = true
max_http_request_size = 4294967296
[chttpd_auth]
require_valid_user = true
authentication_redirect = /_utils/session.html
[httpd]
WWW-Authenticate = Basic realm="couchdb"
enable_cors = true
[cors]
origins = app://obsidian.md,capacitor://localhost,http://localhost
credentials = true
headers = accept, authorization, content-type, origin, referer
methods = GET, PUT, POST, HEAD, DELETE
max_age = 3600
It's behind an HTTPS reverse proxy (managed by Cloudflare), and the password is secure (32 characters with upper case, lower case and numbers).
But I wonder if that's enough. I read the official documentation, but I found nothing beyond require_valid_user and using a strong password.
Do you have any recommendations?
Thanks in advance
I am just starting up with CouchDB. Say I have an object joeDoe of class Person that has an ArrayList attribute of Car objects. My question is this: Should I store my joeDoe and all his cars in one document, or should I store each object in separate documents and simply provide the ids of the objects wherever they are needed? Which is considered best practice? Is there any benefit or drawback in any of these approaches?
Thank you.
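The two options in the question can be sketched as plain JSON-style documents. The field names (`type`, `owner_id`), the ID scheme and the sample cars are assumptions for illustration, and `cars_of` is only a client-side stand-in for what a view keyed on the owner would return.

```python
import json

# Option 1: one document, cars embedded.
embedded = {
    "_id": "person:joe-doe",
    "type": "person",
    "name": "Joe Doe",
    "cars": [
        {"make": "Saab", "model": "900"},
        {"make": "Volvo", "model": "240"},
    ],
}

# Option 2: separate documents, each car points at its owner.
person = {"_id": "person:joe-doe", "type": "person", "name": "Joe Doe"}
cars = [
    {"_id": "car:1", "type": "car", "owner_id": "person:joe-doe", "make": "Saab", "model": "900"},
    {"_id": "car:2", "type": "car", "owner_id": "person:joe-doe", "make": "Volvo", "model": "240"},
]

def cars_of(owner_id, docs):
    """Client-side stand-in for a view keyed on owner_id."""
    return [d for d in docs if d.get("type") == "car" and d.get("owner_id") == owner_id]

print(json.dumps(embedded, indent=2))
print([c["_id"] for c in cars_of("person:joe-doe", cars + [person])])
```

Roughly, embedding gives a single read but rewrites the whole person document on every car change, so concurrent edits to different cars can conflict. Separate documents avoid that at the cost of a view or a second lookup.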
Dear CouchDB users,
I downloaded CouchDB to a MacBook Pro from "https://neighbourhood.ie/download-apache-couchdb-mac/" and installed it. I was trying to create a view document, but I kept getting an "Error: internal_server_error" message. So I tried to verify the installation and took this screenshot. What seems to be the problem?
Hello, I'm one of the creators of CouchDB and PouchDB, and I have a new database people in this forum might be excited about. It's free and open source, and uses the IPFS protocol. The API feels a lot like CouchDB, except this is written in (not very much) JavaScript and designed to run in pages and apps. I'd love feedback on ease of use and API from folks in this community -- I bet y'all are better qualified than most to understand how to use something like this. There are a bunch of opportunities to contribute listed in the README; it'd be exciting to see some PRs.
Here's the Fireproof GitHub repo, and here's the website. Thanks! Chris
Hi
I know this has been asked and answered a few times, but I'm going to ask again, because I'm still unsure.
(Sorry - this has ended up being quite long - TLDR: one database per user sounds great for offline PouchDB stuff - but how do you make it work when multi-user access to shared documents with fine-grained permissions is needed?)
I've got a V1 app, written in Rails using a relational back-end. I'm now approaching the time to design the V2 version and the client wants it to be able to work offline (which immediately puts Rails out of the question - at least for the client, if not the admin interface). PouchDB and CouchDB seem like the perfect way to do this - but my relational mind is still struggling to figure out how to organise things. Documents and Views I get - but fine-grained security and authorisation less so.
In Rails all client access to the data is through the app-server, so I control who sees, edits and deletes which document. But if the system is to work offline, my PouchDB database needs to sync to the server-side CouchDB database, bypassing any app-server level controls.
Each user only has access to a subset of the data - so I don't want to sync the entire database across. Firstly, it's costly (Gbs to move) and secondly, I don't want people poking around on their client device and seeing other people's stuff inside the database (even if they can't access it in the app - the client has some security-conscious customers).
"One database per user" seems to be the solution - but a lot of this data is shared. For example (and this is just a small subset) - a supervisor creates a work-schedule, it gets approved by a manager, and then the employee views it. When it's time to start working, the employee updates their timesheet. The timesheet gets submitted back to the supervisor and eventually processed by the manager.
The account owner sees/updates everything across all departments. The manager sees/updates everything within their own department. The supervisor only sees/updates the schedules and timesheets for their own team. The employee only sees/updates their own stuff.
My initial thought, then, is to have a primary database, then a database per user. Then, I set up replication filters between all these databases so the correct information goes to the correct place - in both directions. Does that sound like a good idea?
(Even more complex - when not just dealing with timesheets, certain types of document might need to be available to be visible to and edited by employee-1, then visible to and edited by employee-2 - so the filter rules would have to allow updates from employee-1-database to primary to employee-2-database and back again)
Then within each document (schedule, timesheet etc), on the primary I have a list of users who have access to it, so the filter rules can easily figure out who can see it? Although that then potentially publishes a list of all users to the user-databases. So can the filter rule transform the document in some way? Or can the filter rule reference a separate document which describes the authorisation rules for this document?
Finally when they sign up a new employee I have to create a new database (which will be a standard template, with filter rules predefined, so should be pretty simple) and then possibly add in extra filter rules to the replication design document on the primary database (depending on how the permissions are stored)? Likewise, if someone gets promoted, from supervisor to manager, I then need to rewrite the filter rules relating to them, both on their user-database and on the primary?
Or is there another simpler method that I'm missing?
Hi all, I'm playing with PouchDB for an offline-first web app and wondering how best to solve a simple thing.
What I want is to be able to create the database with a couple of default example docs. There doesn't seem to be any obvious response from `new PouchDB('example')` that would tell me 'this is newly created' that I can use to trigger the default document creation.
I could put a flag somewhere, maybe a 'config' document with 'hasInitialised' in it and use that. But it seems a bit of a faff to create a whole additional database with a single document in it to store my config for a single flag. Is there something obvious I'm not seeing?
Thanks!
Assume the db holds bank account transactions, and each transaction has a date and an amount. How can I create a view that gives transactions sorted by date, with the amount and a running total (sum with the previous amounts, incrementally)?
map only works with a single document, no access to the previous document. reduce reduces the grouped items by some aggregation. No idea how to access the previous row.
This is called a Window Function in SQL databases. Any help is appreciated.
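As far as I understand, a map function sees one document at a time and a reduce collapses rows rather than annotating each one, so a per-row running total is often computed by the client over view rows that are already sorted by date. A sketch assuming rows shaped like `(date, amount)` pairs with amounts as integers. That layout and the sample values are assumptions, not something from the post.

```python
from itertools import accumulate

# Rows as they might come back from a view keyed on date.
rows = [
    ("2024-01-03", 250),
    ("2024-01-01", 1000),
    ("2024-01-02", -400),
]

rows.sort(key=lambda r: r[0])  # a view keyed on date would already do this
totals = accumulate(amount for _, amount in rows)
with_running_total = [
    (date, amount, total) for (date, amount), total in zip(rows, totals)
]

for row in with_running_total:
    print(row)
```

For a long history, the client could page through the view and carry the last total forward from one page to the next instead of loading every row at once.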
I installed CouchDB on a cloud server instance w/ 512MB RAM, 20GB disk, and uploaded 200,000 json documents, totaling just under 1GB of documents.
Then I tried to create a simple view (conditional `emit` of 2 fields).
During the view creation I got "OS timeout".
Then trying to use the view I get "OS error 137".
(these are from memory as the error pop-up in Fauxton goes away before I could copy/paste)
Is this normal?
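Exit statuses above 128 conventionally mean "terminated by signal number (status - 128)", so 137 points at signal 9. A small sketch decoding that with Python's standard library on a POSIX system. On a 512MB instance the kernel's out-of-memory killer is a plausible sender, though that is an assumption rather than something the post confirms. `signal_from_exit_status` is a made-up helper name.

```python
import signal

def signal_from_exit_status(status: int) -> str:
    """Map a shell-style exit status above 128 to the signal name."""
    if status <= 128:
        raise ValueError("status does not encode a signal")
    return signal.Signals(status - 128).name

print(signal_from_exit_status(137))
```

If the kernel log on the server shows out-of-memory kills around the time the view was being built, that would support the memory explanation.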
My latest blog entry on this is here. I am mainly comparing it with MongoDB as far as NoSQL systems are concerned.
Those of you who are experienced CouchDB devs writing some very complex views, what does your IDE and test/deploy pipeline look like?
I'm new to CouchDB, and more generally I'm new to "offline-first".
I read somewhere that databases such as CouchDB that make offline-first possible, store a replication of the whole user's dataset on their own side (client-side) as well, is it true?
If that's the case, it doesn't make sense to me... Please tell me what I'm getting wrong here? It means every time the user logs in to their account on a new device, or even on a new browser, (or worse, when Incognito browsing is used), their data must be downloaded fully? Couldn't it be chopped into parts and then -like pagination-, keep only a more recent part on the client-side, then load more on need?
Of course, the new device/browser situation doesn't happen frequently (the Incognito one could happen more often), but even if it only happens once in a while, it can be a UX killer... Let's say it's a simple notes app with only 5000 notes: that's almost 50MB to download the first time, which means a noticeable delay before the initial UI load even on a good Internet connection... Isn't there a way to make this experience smoother?
Can you recommend a simple example of using CouchDB from Flask?
I am a newbie to both of them.
I am learning CouchDB. As I understand it, documents in the database cannot be grouped into categories, such as, for example, all receipt documents can be put into a receipt bucket, invoices can be put into invoice bucket etc.
Are there any free and opensource NoSQL databases that provide this feature of grouping documents according to category?
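A common convention in CouchDB is to keep every document in one database, tag each with a field such as `type`, and build a view keyed on that field. The Python sketch below only imitates that grouping on the client. The field name `type` and the sample documents are assumptions.

```python
from collections import defaultdict

docs = [
    {"_id": "r1", "type": "receipt", "total": 12},
    {"_id": "i1", "type": "invoice", "total": 90},
    {"_id": "r2", "type": "receipt", "total": 7},
]

def group_by_type(documents):
    """Bucket document IDs by their 'type' field."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc.get("type", "unknown")].append(doc["_id"])
    return dict(buckets)

print(group_by_type(docs))
```

On the server, a map function that emits the `type` field as the key would give the same buckets through a view query.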
Anyone know what causes this error when trying to view the settings in Fauxton?
Failed to load the configuration. Unexpected token < in JSON at position 0
I am comparing DB design for a simple "Post and Comment" system using Postgres and CouchDB. With Postgres I can design the following tables:
user_info {email, pass_hash, pass_salt, ...}
post_info {post_id, creator_email, title, text, ...}
comment_info {comment_id, creator_email, post_id, parent_comment_id, text, ...}
But if I use CouchDB, there is a concept of creating per-user tables. So I was thinking of the following design:
user_table {email, table_id}
user_<table_id> {email, pass_hash, pass_salt, ...}
post_<table_id> {post_id, <table_id>_creator_email, title, text, ...}
comment_<table_id> {comment_id, <table_id>_creator_email, <table_id>_post_id, <table_id>_parent_comment_id, text, ...}
I am in no way expert in Postgres and CouchDB, so my question is, is this the correct way to design per-user CouchDB tables? What is the better way? And what is the efficient way to create/use CRUD queries?